CN120449861B - Policy file intelligent rule extraction and change comparison method for electric charge checking - Google Patents
- Publication number
- CN120449861B (application CN202510953695.3A)
- Authority
- CN
- China
- Prior art keywords
- rule
- policy
- policy rule
- text
- content
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Machine Translation (AREA)
Abstract
The invention belongs to the technical field of power-document rule extraction and change comparison, and in particular relates to an intelligent rule extraction and change comparison method for policy files used in electricity charge checking. The method parses, reconstructs and optimizes electricity-charge-related policy files while preserving their original context and logical hierarchy. It designs a core sentence recognition Prompt template and a rule extraction and conversion Prompt template and builds a policy rule extraction agent that guides a large language model to recognize core sentences, extract rule element information and produce standardized policy rule descriptions. It then retrieves similar policy rules from a historical policy rule description knowledge base and builds a policy rule comparison agent that performs deep semantic comparison and change analysis between the current standardized policy rule descriptions and the similar historical ones. The method thereby automatically identifies content that is newly added or modified relative to historical rules, providing key technical support for automating electricity charge checking rule generation and making it more intelligent.
Description
Technical Field
The invention belongs to the technical field of power-document rule extraction and change comparison, and particularly relates to an intelligent policy file rule extraction and change comparison method for electricity charge checking.
Background
In key electric power industry business such as electricity charge checking and preferential policy execution, policy files are the main basis for formulating checking logic and execution rules: checking rules must be defined from each electricity-charge-related policy file to verify that every step of the business meets that file's requirements.
As national and local government policies are continually updated, newly issued policy clauses must continually be translated into executable business rules so that the policies are implemented quickly and systems remain compliant. However, electricity-charge-related policy documents are highly specialized, logically complex and often ambiguously worded. Formulating targeted checking rules therefore depends on front-line business specialists who interpret the documents, extract the key points and write rule descriptions. The resulting rules must then be compared against the checking rules already in the existing information system to determine the scope of the new policy's impact on electricity charge checking. This process is time-consuming, labor-intensive and tedious; it inevitably leads to omitted rules and logical loopholes, introduces a noticeable lag in putting new policies into effect, and reduces both the intelligence of the power system and the responsiveness of the enterprise.
In recent years, the rise of large language models, with their strengths in natural language understanding and complex semantic extraction, has offered a new approach to automatically parsing policy documents and generating rules. However, electricity charge checking policy files are typically long, highly domain-specific and full of implicit business logic. Parsing them directly with a general-purpose large model often lacks deep integration of business-text understanding with business-rule requirements, and the results are unsatisfactory: rule sentences are located inaccurately, rule content is omitted, or rule content is misunderstood and incorrectly extracted, so the output is difficult to use directly in a business system. Moreover, applying a large model directly still lacks a systematic technical pipeline for automatically acquiring and identifying new and old rules and aligning them semantically, so a large amount of manual review and checking is still required and the demand for timely updates is hard to meet.
Disclosure of Invention
In order to overcome the problems in the prior art, the invention provides an intelligent policy file rule extraction and change comparison method for electric charge checking.
The technical scheme for solving the technical problems is as follows:
The invention provides an intelligent policy file rule extraction and change comparison method for electricity charge checking, comprising the following steps:
parsing the electricity-charge-related policy file, preserving the contextual hierarchy and logical structure of the original file during parsing, and optimizing the parsed content;
designing a core sentence recognition Prompt template and constructing a policy rule extraction agent that recognizes core sentences in the optimized content, extracts rule-related element information, converts it into standardized policy rule descriptions, and outputs each standardized description together with its corresponding original text;
constructing a historical policy rule description knowledge base; retrieving similar historical policy rule descriptions from it based on the standardized policy rule descriptions and their original text; and constructing a policy rule comparison agent that performs deep semantic comparison and change analysis between the current standardized policy rule descriptions and the similar historical ones, identifies newly added and modified policy clauses, and outputs structured change details.
Further, analyzing the electric charge related policy file includes:
reading the content of the electricity-charge-related policy file and parsing it with a purpose-designed file content and structure parsing method, which comprises:
for a Word document, a parsing module based on python-docx identifies and decomposes the text content structure, preserves the original heading levels and numbering structure in the recognition result, assigns a type identifier to each piece of text content, and stores the result, in the original document order, as a set of document style attributes and their corresponding content;
for a PDF document, the PaddleOCR tool is used for recognition: layout analysis semantically segments the elements of each page to obtain the spatial position, text content and text type of each text block, and these are likewise stored as a parsed content set in the original document order.
Further, optimizing the parsed content specifically comprises:
for each text paragraph in the parsed content, splicing the heading content and type to which it belongs into the paragraph to form a new text paragraph, and splitting paragraphs longer than the preset length into blocks according to the maximum block length;
dividing each paragraph block into sentences, numbering each sentence, and recombining the sentences into paragraph text, namely the numbered paragraph blocks.
Further, designing the core sentence recognition Prompt template comprises:
the core sentence recognition Prompt template is defined as $P_R$, expressed as:
$$P_R = \{G,\ c_l',\ S,\ I,\ O_R\}$$
where G is the task goal description; $c_l'$ is the input paragraph text block with numbered sentences; S is the recognition-step prompt that guides the large language model's reasoning and judgment; I is the set of important notes, i.e. points requiring special attention; and $O_R$ is the output format of the core sentence recognition task, a JSON object giving, for each sentence, its number, the judgment result and the reasoning process.
Further, constructing the policy rule extraction agent comprises:
based on the core sentence recognition Prompt template, creating a core sentence screening tool T:
$$T = \{P_R,\ M,\ F\}$$
based on the screened core sentences, designing the rule extraction and conversion Prompt template $P_E$:
$$P_E = \{G,\ S,\ I,\ c_f,\ O_E,\ E_X\}$$
combining the core sentence screening tool, the rule extraction and conversion Prompt template and the large language model to construct the policy rule extraction agent $A_E$:
$$A_E = \{P_E,\ T,\ M,\ c_l'\}$$
where M is the large language model, F is the sentence filtering procedure, $c_f$ is the paragraph text retained by the core sentence screening tool, $O_E$ is the output format of the rule extraction and conversion task, $E_X$ is an example rule extraction result provided to the large language model for reference, and $c_l'$ is the numbered paragraph block input to the agent.
Further, extracting the rule-related element information, converting it into standardized policy rule descriptions, and outputting each description together with its corresponding original text comprises:
inputting the numbered paragraph blocks into the policy rule extraction agent, which identifies rule-bearing core sentences via the core sentence screening tool, extracts the specific rule elements from them, and outputs standardized policy rule expressions with clearly defined elements together with the numbers of their source sentences;
retrieving the original text from the paragraph block text according to these sentence numbers, so that each policy rule description is paired with its original text, forming the mapping between rules and original text.
Further, constructing the historical policy rule description knowledge base and retrieving similar historical policy rule descriptions from it based on the standardized policy rule descriptions and their original text comprises:
extracting the descriptions of historical power checking rules and their identifiers (id) from the historical power checking rule table to construct the historical rule description knowledge base;
using a similarity retrieval method to query the knowledge base separately with each policy rule description extracted from the administrative policy file and with its original text, retrieving similar historical rule descriptions for each query, so that differences in how the same content is worded are taken into account.
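The dual-query retrieval described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: character-bigram cosine similarity stands in for the vector semantic retrieval mentioned among the technical effects, and the function names, the `top_k` parameter and the `id -> description` dictionary layout of the knowledge base are hypothetical. Scoring each historical rule by the better of its similarity to the rule description and to the original text is one way to tolerate different wordings of the same content.

```python
import math
from collections import Counter
from typing import Dict, List


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Character n-gram counts; a stand-in for a real sentence-embedding model."""
    s = "".join(text.lower().split())  # ignore case and whitespace
    if len(s) < n:
        return Counter([s]) if s else Counter()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_similar(rule_desc: str, source_text: str,
                     kb: Dict[str, str], top_k: int = 3) -> List[str]:
    """Return ids of the top_k historical rules, scoring each by the better of
    its similarity to the rule description and to the original policy text."""
    queries = [char_ngrams(rule_desc), char_ngrams(source_text)]
    scored = []
    for rule_id, desc in kb.items():
        vec = char_ngrams(desc)
        scored.append((max(cosine(q, vec) for q in queries), rule_id))
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, ties by id
    return [rule_id for score, rule_id in scored[:top_k] if score > 0]


history_kb = {
    "H1": "Residential users are billed under tiered electricity pricing",
    "H2": "Agricultural irrigation uses the agricultural tariff",
    "H3": "Large industrial users pay a capacity charge",
}
matches = retrieve_similar(
    "Residential customers shall be billed at tiered prices",
    "Tiered pricing applies to residential electricity consumption.",
    history_kb,
)
```

In a deployment, `char_ngrams` and `cosine` would be replaced by an embedding model and a vector index; the max-over-two-queries scoring is unchanged by that swap.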
Further, constructing the policy rule comparison agent comprises:
based on the similar historical policy rule descriptions, designing the policy rule comparison and analysis Prompt template $P_C$:
$$P_C = \{G,\ S,\ I,\ R,\ R_S,\ O_C,\ E_C\}$$
based on the policy rule comparison and analysis Prompt template and the large language model, constructing the policy rule comparison agent $A_C$:
$$A_C = \{P_C,\ M\}$$
where G is the task goal description; S is the step prompt that guides the large language model's reasoning and judgment; I is the set of important notes; R is the input current policy rule description; $R_S$ is the set of similar historical policy rules retrieved for R; $O_C$ is the output format of the policy rule comparison and analysis task, comprising an analysis of each historical rule, a similarity judgment for each historical rule, the judgment reasoning, and the change analysis result; and $E_C$ is an example comparison and analysis output.
Further, newly added and modified policy clauses are identified and structured change details are output as follows:
$$(\mathrm{type}_i,\ \mathrm{detail}_i) = A_C\left(R_i,\ R_{S_i}\right)$$
$$CR = \{(\mathrm{type}_1, \mathrm{detail}_1),\ (\mathrm{type}_2, \mathrm{detail}_2),\ \dots,\ (\mathrm{type}_i, \mathrm{detail}_i),\ \dots\}$$
where $R_i$ is the i-th current policy rule description and $R_{S_i}$ is the set of similar historical policy rule descriptions retrieved for it; $\mathrm{type}_i$ and $\mathrm{detail}_i$ are the change type and change detail analysis result output by the policy rule comparison agent for $R_i$, the change type being either modification or new addition; and CR is the comparison and analysis result for all rules.
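The per-rule comparison and its aggregation into CR can be sketched as below. The sketch assumes, hypothetically, that the comparison agent is any callable returning a JSON object with `change_type` and `detail` fields and that retrieval is supplied as a function; neither signature comes from the patent. Verdicts other than the two change types named in the text (new addition, modification), such as an assumed `unchanged` verdict, are left out of CR. The toy retrieval and stub agent at the end only make the example runnable.

```python
import json
from typing import Callable, Dict, List

CHANGE_TYPES = {"new", "modified"}  # the two change types named in the description


def compare_rules(current_rules: List[str],
                  retrieve: Callable[[str], List[str]],
                  agent: Callable[[str], str]) -> List[Dict]:
    """Build CR: for each current rule R_i, retrieve R_Si, ask the agent, keep changes."""
    cr = []
    for i, rule in enumerate(current_rules):
        similar = retrieve(rule)
        request = json.dumps({"current_rule": rule, "similar_history_rules": similar})
        result = json.loads(agent(request))
        if result.get("change_type") in CHANGE_TYPES:
            cr.append({"index": i, "rule": rule, "similar": similar,
                       "change_type": result["change_type"],
                       "detail": result.get("detail", "")})
    return cr


# --- Runnable demo with hypothetical stand-ins (no LLM involved) ---
history = ["Tier 1 limit is 180 kWh", "Tier 2 limit is 400 kWh"]


def toy_retrieve(rule: str) -> List[str]:
    return [h for h in history if h[:6] == rule[:6]]  # match on topic prefix only


def stub_agent(request: str) -> str:
    data = json.loads(request)
    if not data["similar_history_rules"]:
        return json.dumps({"change_type": "new", "detail": "no historical counterpart"})
    if data["current_rule"] in data["similar_history_rules"]:
        return json.dumps({"change_type": "unchanged"})
    return json.dumps({"change_type": "modified", "detail": "threshold differs"})


current = ["Tier 1 limit is 200 kWh", "Tier 2 limit is 400 kWh", "EV charging uses off-peak price"]
change_report = compare_rules(current, toy_retrieve, stub_agent)
```

Keeping `similar` in each record makes every change traceable to the historical rules it was judged against.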
Compared with the prior art, the invention has the following technical effects:
(1) The invention designs a structured document content parsing method for extracting electricity charge checking policy rules. It reads, parses and reconstructs policy files in Word and PDF format, effectively preserving the original context and logical hierarchy and strengthening the representation of the heading structure, which improves the large language model's understanding of document semantics and structural logic. It also optimizes the parsing results through operations such as block splitting and merging, avoiding model misjudgments caused by truncated context or redundant content and laying a corpus foundation for subsequent rule extraction and rule comparison.
(2) The invention designs a core sentence recognition Prompt template and a rule extraction and conversion Prompt template and builds a policy rule extraction agent on a large language model. The agent identifies core rule sentences in the parsed paragraph text, extracts the rule elements and converts them into structured policy rule descriptions, achieving intelligent conversion from administrative policy files to structured rules. This significantly reduces manual intervention and improves the accuracy and efficiency of rule extraction.
(3) The invention introduces a collaborative mechanism combining vector semantic retrieval with a policy comparison agent. After new policy rules are generated, similar historical policy rules are matched automatically, and a policy rule comparison agent built on a purpose-designed comparison and analysis Prompt template analyzes the similarities and differences between new and old rules, accurately identifying change types such as new additions and modifications together with their details. This supports effective inheritance and evolution from old to new rules and significantly improves the responsiveness of rule updates.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 shows the parsing result for an electricity-charge-related policy document in the experiment of the present invention;
FIG. 3 is a graph showing the results of core sentence recognition in the experiment of the present invention;
FIG. 4 is a graph showing the result of policy rule extraction in the experiment of the present invention;
FIG. 5 is a graph of the results of a comparison of the method of the present invention with a generic large language model;
FIG. 6 is a graph showing the results of comparison and analysis of historical policies in the experiment of the present invention;
FIG. 7 shows the content of the core sentence recognition Prompt template;
FIG. 8 shows the content of the rule extraction and conversion Prompt template;
FIG. 9 shows the content of the policy rule comparison and analysis Prompt template.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description is given below of the specific implementation, structure, features and effects of the technical solution according to the present invention with reference to the accompanying drawings and preferred embodiments. The particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The invention provides an intelligent policy file rule extraction and change comparison method for electricity charge checking. The method adopts a multi-stage processing strategy: it designs a file parsing method, integrates the historical policy rule description knowledge base, and constructs a policy rule extraction agent and a policy rule comparison agent, forming a policy interpretation and rule extraction pipeline of "file parsing, rule extraction, rule comparison". This achieves intelligent conversion from unstructured policy text to extracted rules and identified changes, improves the accuracy of policy rule interpretation, and shortens the time needed to put rules into effect.
In one embodiment of the present invention, referring to fig. 1, an intelligent rule extraction and change comparison method for policy documents for checking electric charges includes the following steps:
Step 100, parsing the electricity-charge-related policy file, preserving the contextual hierarchy and logical structure of the original file during parsing, and optimizing the parsed content;
Step 200, designing a core sentence recognition Prompt template and constructing a policy rule extraction agent, recognizing core sentences in the optimized content, extracting rule-related element information, converting it into standardized policy rule descriptions, and outputting each description together with its corresponding original text;
Step 300, constructing a historical policy rule description knowledge base; retrieving similar historical policy rule descriptions from it based on the standardized policy rule descriptions and their original text; and constructing a policy rule comparison agent that performs deep semantic comparison and change analysis between the current standardized descriptions and the similar historical ones, identifies newly added and modified policy clauses, and outputs structured change details.
Each step is described in detail below:
Step 100, parsing the electricity-charge-related policy file, preserving the contextual hierarchy and logical structure of the original file during parsing, and optimizing the parsed content.
Unlike the conventional practice of splitting the input text for a large language model into fixed-length segments, the invention designs a content and structure parsing method that can parse different types of electricity-charge-related policy files, preserves the contextual hierarchy and logical structure of the original file during parsing, and post-processes the parsing result. The resulting text blocks are readable and structurally complete, so that the large language model can understand their content accurately and completely. The specific steps are as follows:
Step 110, reading the content of the electricity-charge-related policy file, designing a file content and structure parsing method, and parsing the file while preserving the contextual hierarchy and logical structure of the original file.
Electricity-charge-related policy files are typically stored in Word or PDF format. This embodiment designs a separate parsing scheme for each format, so that the files can be read and structured accurately, laying the foundation for the subsequent agent-based content analysis and rule extraction.
For a Word document, a parsing module based on python-docx is used. It identifies and decomposes the text content structure by analyzing the document's style attributes, such as heading level, paragraph indentation, text content and numbered lists; preserves the original heading levels and numbering structure in the recognition result; assigns a different type identifier to headings, body paragraphs, numbered items and other content; and stores the result in the original document order, expressed as:
$$C_w = \{(\mathrm{type}_1, \mathrm{text}_1),\ (\mathrm{type}_2, \mathrm{text}_2),\ \dots,\ (\mathrm{type}_m, \mathrm{text}_m),\ \dots\}$$
where $C_w$ denotes all content parsed from the Word document, $\mathrm{type}_m$ is the type identifier of the m-th content item, and $\mathrm{text}_m$ is its text content.
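A minimal sketch of the Word branch follows. It takes `(style name, text)` pairs, which with python-docx could be produced as `[(p.style.name, p.text) for p in docx.Document(path).paragraphs]`, and returns $C_w$ as an ordered list of `(type, text)` tuples. The type labels (`title`, `heading_<level>`, `list_item`, `numbered_clause`, `body`) and the clause-number regular expression are illustrative assumptions; real policy files may use other style names that would need to be added to the mapping.

```python
import re
from typing import Iterable, List, Tuple

# Clause numbers such as "一、", "（三）" or "1." at the start of a paragraph (assumed pattern).
CLAUSE_NUMBER = re.compile(r"^[（(]?[一二三四五六七八九十\d]+[)）、.]")


def classify_paragraph(style_name: str, text: str) -> str:
    """Map a Word paragraph style (plus its text) to a type identifier."""
    style_name = style_name or ""
    heading = re.match(r"Heading (\d+)$", style_name)
    if heading:
        return f"heading_{heading.group(1)}"
    if style_name == "Title":
        return "title"
    if style_name.startswith("List"):
        return "list_item"
    if CLAUSE_NUMBER.match(text):
        return "numbered_clause"
    return "body"


def parse_word_paragraphs(paragraphs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Build C_w = [(type_m, text_m), ...] in document order, skipping empty paragraphs."""
    result = []
    for style_name, text in paragraphs:
        text = text.strip()
        if text:
            result.append((classify_paragraph(style_name, text), text))
    return result


sample = [
    ("Title", "Notice on Residential Electricity Pricing"),
    ("Heading 1", "一、Scope"),
    ("Normal", "This notice applies to residential users."),
    ("Normal", "   "),
    ("List Number", "Tier 1: up to 200 kWh per month"),
    ("Normal", "2. Tier 2 covers 200 to 400 kWh."),
]
c_w = parse_word_paragraphs(sample)
```

Style checks run before the text pattern, so a "Heading 1" paragraph that begins with "一、" is still typed as a heading.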
For a PDF document, the content is laid out for display, usually lacks semantic markup, and may include content embedded as images. The invention therefore uses the PaddleOCR tool for recognition and applies layout analysis to semantically segment the elements of each page, obtaining for each text block its spatial position, text content and text type (for example text, title or table). The recognition result for the whole PDF document is stored in the original document order and expressed as:
$$C_p = \{p_1,\ p_2,\ \dots,\ p_n,\ \dots\}$$
$$p_n = \{(\mathrm{type}_1, \mathrm{text}_1, \mathrm{region}_1),\ \dots,\ (\mathrm{type}_m, \mathrm{text}_m, \mathrm{region}_m),\ \dots\}$$
where $C_p$ denotes all content parsed from the PDF document, $p_n$ is the parsed content of the n-th page, and $\mathrm{region}_m$ is the position of the m-th text item within the page.
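The PDF branch ends with per-page regions that must be put into reading order before they are stored. The sketch below assumes the OCR and layout-analysis stage has already produced, for every region, a page index, a layout label, a bounding box and the recognized text; the `Region` class and its fields are hypothetical and do not reproduce PaddleOCR's own output format. Regions are ordered by page, then by vertical band of `line_tol` pixels, then left to right, which suits single-column pages; multi-column layouts would need column detection first.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1), origin at top-left


@dataclass
class Region:
    page: int    # page index n (1-based)
    kind: str    # layout label, e.g. "title", "text", "table"
    bbox: BBox   # region position on the page
    text: str    # recognized text of the region


def to_reading_order(regions: List[Region],
                     line_tol: float = 10.0) -> Dict[int, List[Tuple[str, str, BBox]]]:
    """Group regions into C_p = {page n: [(type_m, text_m, region_m), ...]} in reading order."""
    c_p: Dict[int, List[Tuple[str, str, BBox]]] = {}
    order = sorted(regions, key=lambda reg: (reg.page, int(reg.bbox[1] // line_tol), reg.bbox[0]))
    for r in order:
        if r.text.strip():  # drop regions where OCR produced no text
            c_p.setdefault(r.page, []).append((r.kind, r.text, r.bbox))
    return c_p


regions = [
    Region(2, "text", (50, 100, 500, 150), "Page two body"),
    Region(1, "text", (50, 200, 280, 260), "Body paragraph"),
    Region(1, "title", (50, 40, 500, 80), "Section 1 Scope"),
    Region(1, "table", (300, 203, 550, 400), "Tariff table"),
    Region(1, "text", (50, 500, 500, 520), "   "),
]
c_p = to_reading_order(regions)
```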
Step 120, optimizing the parsed content through heading splicing, block splitting and merging, and sentence numbering, to obtain the optimized content, namely the numbered paragraph blocks.
To ensure that the large language model understands the content efficiently and accurately when analyzing electricity-charge-related policy files, and to avoid degrading extraction and generation through context-window limits or interference from redundant information, the invention processes and optimizes the document content parsed in step 110.
First, for each text paragraph in $C_p$ or $C_w$, the heading content and type to which it belongs are spliced into the paragraph, for example "first-level title: xx <text paragraph content>", forming a new text paragraph that carries stronger topic information.
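Heading splicing can be sketched as a single pass that remembers the most recent heading and prefixes it to each following non-heading paragraph, in the "first-level title: xx <text paragraph content>" pattern given above. The `heading_<level>` type labels and the exact prefix wording are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def splice_titles(parsed: List[Tuple[str, str]]) -> List[str]:
    """Prefix each non-heading paragraph with the most recent heading and its level."""
    heading: Optional[Tuple[int, str]] = None  # (level, heading text)
    spliced = []
    for kind, text in parsed:
        if kind.startswith("heading_"):
            heading = (int(kind.split("_", 1)[1]), text)
        elif heading is None:
            spliced.append(text)  # content before the first heading keeps no prefix
        else:
            level, title = heading
            spliced.append(f"level-{level} title: {title} <{text}>")
    return spliced


parsed = [
    ("body", "Preamble text."),
    ("heading_1", "Scope"),
    ("body", "Applies to residential users."),
    ("heading_2", "Tier thresholds"),
    ("list_item", "Tier 1: up to 200 kWh"),
]
paragraphs = splice_titles(parsed)
```

Headings themselves are not emitted as separate paragraphs here, since their text already travels with every paragraph beneath them.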
Second, the original policy file is divided according to the new text paragraphs and the length of each paragraph is checked. If adjacent paragraphs are shorter than the preset threshold, the method checks whether their combined length is still below the maximum block length threshold (set to 512 characters in the invention), and if so merges them. This reduces the number of requests during model processing and avoids misunderstandings caused by fragmented information.
The result C after splitting and merging is expressed as:
$$C = \{c_1,\ c_2,\ \dots,\ c_l,\ \dots\}$$
where $c_l$ denotes the l-th merged paragraph block.
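The splitting and merging rule can be sketched as below, using the 512-character maximum block length stated in the text. Splitting over-long paragraphs at fixed character offsets and joining merged paragraphs with a newline are simplifying assumptions; a production version would more likely split at sentence boundaries. Merging is greedy: a piece is appended to the previous block whenever the combined length would not exceed the maximum.

```python
from typing import List


def chunk_and_merge(paragraphs: List[str], max_len: int = 512) -> List[str]:
    """Split paragraphs longer than max_len, then greedily merge adjacent pieces
    while the merged block stays within max_len characters."""
    pieces = []
    for p in paragraphs:
        for start in range(0, len(p), max_len):  # empty paragraphs yield nothing
            pieces.append(p[start:start + max_len])
    blocks: List[str] = []
    for piece in pieces:
        if blocks and len(blocks[-1]) + 1 + len(piece) <= max_len:
            blocks[-1] = blocks[-1] + "\n" + piece  # +1 accounts for the newline
        else:
            blocks.append(piece)
    return blocks


blocks = chunk_and_merge(["abc", "de", "abcdefghijklmno", ""], max_len=10)
```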
Finally, each paragraph block is split into sentences, and each sentence is numbered and recombined into paragraph text, giving the numbered paragraph blocks from which core rule sentences can later be identified. The set of numbered paragraph block texts $C'$ is expressed as:
$$C' = \{c_1',\ c_2',\ \dots,\ c_l',\ \dots\}$$
where $c_l'$ denotes the l-th numbered paragraph block.
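Sentence numbering can be sketched with a regular expression that treats Chinese and Western full stops, question marks, exclamation marks and semicolons as sentence ends, while not splitting decimals such as 0.05. The `[n]` number format, splitting on semicolons and per-block numbering starting at 1 are assumptions. Returning the number-to-sentence map alongside the numbered text lets the original text of each rule be recovered later from its sentence numbers.

```python
import re
from typing import Dict, Tuple

# A sentence ends at 。！？；!?; or at a "." followed by whitespace or the end of text,
# so decimals such as "0.05" are not split (the terminator set is an assumption).
SENTENCE = re.compile(r".+?(?:[。！？；!?;]|\.(?=\s|$)|$)", re.S)


def number_sentences(block: str) -> Tuple[str, Dict[int, str]]:
    """Return the numbered block text c_l' and a map from sentence number to sentence."""
    sentences = [s.strip() for s in SENTENCE.findall(block) if s.strip()]
    numbered = {i: s for i, s in enumerate(sentences, start=1)}
    text = " ".join(f"[{i}]{s}" for i, s in numbered.items())
    return text, numbered


text, numbered = number_sentences(
    "Tier 1 covers 200 kWh. Excess usage costs 0.05 yuan more; rates apply monthly."
)
```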
Step 200, designing a core sentence recognition Prompt template and constructing a policy rule extraction agent, recognizing core sentences in the optimized content, extracting rule-related element information, converting it into standardized policy rule descriptions, and outputting each description together with its corresponding original text.
After the electricity-charge-related policy file has been parsed and optimized, the invention designs a core sentence recognition Prompt template and a rule extraction and conversion Prompt template and constructs a policy rule extraction agent. Combining semantic analysis with the characteristics of rule expressions, the agent guides the large model to understand and screen the content of each paragraph block sentence by sentence and to identify the core sentences, ensuring the accuracy and purity of rule extraction. It then guides the large language model to extract rule-related element information from the identified core sentences and convert it into standardized, highly readable policy rule descriptions, and retrieves the original text by sentence number to form a mapping between each rule and its original text. The specific steps are as follows:
Step 210, designing the core sentence recognition Prompt template, which guides the large language model to accurately identify rule-bearing core sentences in a paragraph block by specifying the task goal, important notes and output format.
Electricity-charge-related policy files contain a large amount of descriptive and background content that reduces extraction accuracy, so a dedicated core sentence recognition Prompt is needed to guide and constrain the large language model to identify rule-bearing core sentences accurately. The guiding steps and reminders in this Prompt help the model quickly lock onto key information, analyze each sentence in turn, judge whether it contains core policy rule content, and exclude irrelevant sentences.
The core sentence recognition Prompt template is defined as $P_R$, expressed as:
$$P_R = \{G,\ c_l',\ S,\ I,\ O_R\}$$
where G is the task goal description; $c_l'$ is the input paragraph text block with numbered sentences; S is the recognition-step prompt that guides the large language model's reasoning and judgment; I is the set of important notes, i.e. points requiring special attention; and $O_R$ is the output format of the core sentence recognition task, a JSON object giving, for each sentence, its number, the judgment result and the reasoning process. The core sentence recognition Prompt template is shown in FIG. 7.
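To make the five components concrete, the sketch below assembles a prompt string from the goal, the numbered block, the steps, the important notes and the output format. All wording is illustrative and hypothetical; the actual template is the one shown in FIG. 7, which is not reproduced here. Asking for a JSON list keyed by sentence number is what lets a downstream filter read one verdict per sentence.

```python
import json

GOAL = "Identify the sentences that state an executable electricity charge checking rule."
STEPS = [
    "Read the numbered sentences in order.",
    "For each sentence, decide whether it states a checkable object, condition or requirement.",
    "Treat background, purpose and purely descriptive sentences as non-core.",
]
NOTES = ["Give a verdict for every sentence number.", "Do not rewrite the sentence text."]
OUTPUT_FORMAT = {"results": [{"id": 1, "is_core": True, "reasoning": "..."}]}


def build_core_sentence_prompt(numbered_block: str) -> str:
    """Assemble the goal, input block, steps, notes and output format into one prompt."""
    parts = [
        "Goal: " + GOAL,
        "Input:\n" + numbered_block,
        "Steps:\n" + "\n".join(f"{i}. {s}" for i, s in enumerate(STEPS, start=1)),
        "Important:\n" + "\n".join("- " + n for n in NOTES),
        "Answer with JSON only, in this format:\n" + json.dumps(OUTPUT_FORMAT),
    ]
    return "\n\n".join(parts)


prompt = build_core_sentence_prompt("[1]This notice promotes fair pricing. [2]Tier 1 covers 200 kWh.")
```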
Step 220, creating a core sentence screening tool based on the core sentence recognition Prompt template.
Building on the core sentence recognition Prompt template constructed in step 210, a core sentence screening tool is constructed to automatically filter out non-rule sentences in the paragraph blocks and accurately retain rule sentences. The tool reads the judgment result for each sentence number from the large language model's output, filters out sentences judged non-core, and keeps only those marked as core sentences, so that the text entering the subsequent extraction stage is closely related to policy content and clearly expressed, ensuring the quality of the text used for rule extraction.
The structure of the core sentence screening tool T can be expressed as:
$$T = \{P_R,\ M,\ F\}$$
where $P_R$ is the core sentence recognition Prompt template constructed in step 210, M is the large language model, and F is the sentence filtering procedure.
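A minimal sketch of the screening tool follows: the prompt builder and model are passed in as callables, and the filter F keeps only sentences whose verdict is `is_core: true`. The JSON field names follow a hypothetical output format rather than FIG. 7, and the stub model at the end stands in for a real large language model. Keeping the original sentence numbers in the filtered text preserves traceability for the later rule-to-original-text mapping.

```python
import json
from typing import Callable, Dict


def filter_core(numbered: Dict[int, str], model_output: str) -> Dict[int, str]:
    """F: keep sentences the model judged core; ids the model invented are ignored."""
    verdicts = json.loads(model_output)["results"]
    keep = {int(v["id"]) for v in verdicts if v.get("is_core") is True}
    return {i: s for i, s in numbered.items() if i in keep}


def render(numbered: Dict[int, str]) -> str:
    return " ".join(f"[{i}]{s}" for i, s in numbered.items())


def make_screening_tool(build_prompt: Callable[[str], str],
                        model: Callable[[str], str]) -> Callable[[Dict[int, str]], str]:
    """T: prompt the model, apply F, and return the retained text as numbered sentences."""
    def tool(numbered: Dict[int, str]) -> str:
        verdict_json = model(build_prompt(render(numbered)))
        return render(filter_core(numbered, verdict_json))
    return tool


def stub_model(prompt: str) -> str:  # hypothetical stand-in for a real LLM call
    return json.dumps({"results": [
        {"id": 1, "is_core": False, "reasoning": "states a purpose"},
        {"id": 2, "is_core": True, "reasoning": "sets a tariff threshold"},
        {"id": 7, "is_core": True, "reasoning": "no such sentence"},
    ]})


screen = make_screening_tool(lambda block: "Classify:\n" + block, stub_model)
c_f = screen({1: "This notice promotes fair pricing.", 2: "Tier 1 covers 200 kWh."})
```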
Step 230, designing the rule extraction and conversion Prompt template based on the screened core sentences.
To extract rule content accurately from the core sentences, a rule extraction and conversion Prompt template is designed to guide the large language model to extract the identified core sentences and convert them into standardized policy rule expressions with clear elements. The template reflects the language characteristics and rule-expression conventions of electricity-charge-related policy files and structures the task around the composition of rule content. Through three guided stages (element extraction, logic reconstruction and standardized output), it leads the model to extract complete rule content and output rules that are easy to understand and clearly structured, together with the numbers of the original sentences each rule comes from, so that the original policy text can be retrieved for comparison and verification.
The rule extraction and conversion Prompt template is defined as $P_E$, expressed as:
$$P_E = \{G,\ S,\ I,\ c_f,\ O_E,\ E_X\}$$
where G, S and I have the same meanings as in the core sentence recognition Prompt of step 210; $c_f$ is the paragraph text retained by the tool constructed in step 220; $O_E$ is the output format of the rule extraction and conversion task, comprising the rule source (the number identifiers of the original sentences the rule belongs to), the rule name and the converted rule content; and $E_X$ is an example rule extraction result provided to the large language model for reference. The rule extraction and conversion Prompt template is shown in FIG. 8.
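Because the model's answer feeds directly into the rule-to-original-text mapping, it is worth validating. The sketch below parses a JSON answer carrying the three fields named above (rule source, rule name, converted rule content); the JSON keys `rules`, `source`, `name` and `content` are assumed, not taken from FIG. 8. Rules without a traceable source or without content are dropped, since they could not be mapped back to the original text.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractedRule:
    source_ids: List[int]  # numbers of the original sentences the rule comes from
    name: str              # rule name
    content: str           # standardized rule description


def parse_extraction_output(raw: str) -> List[ExtractedRule]:
    """Validate an extraction answer and return only well-formed rules."""
    rules = []
    for item in json.loads(raw).get("rules", []):
        source = item.get("source", [])
        ids = [source] if isinstance(source, int) else list(source)
        content = (item.get("content") or "").strip()
        if ids and content:
            rules.append(ExtractedRule([int(i) for i in ids], item.get("name", ""), content))
    return rules


answer = json.dumps({"rules": [
    {"source": [2, 3], "name": "Tier 1 threshold",
     "content": "Residential tier-1 monthly volume is 200 kWh."},
    {"source": 5, "name": "Empty", "content": ""},
    {"source": 4, "name": "Excess surcharge",
     "content": "Usage above tier 1 is charged 0.05 yuan/kWh extra."},
]})
rules = parse_extraction_output(answer)
```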
And 240, constructing a policy rule extraction agent by combining the core sentence screening tool, the rule extraction conversion promt template and the large language model.
The policy rule extraction agent combines the rule extraction and conversion Prompt template $P_E$ constructed in step 230, the tool $T$ constructed in step 220, and the large language model $M$, which automates the rule extraction process. The policy rule extraction agent $A_E$ is structured as:

$$A_E = \{P_E, T, M\}$$

where the input to the agent, $c_i'$, is a paragraph text block carrying numbered sentence identifiers.
Guided by the rule extraction and conversion template, the policy rule extraction agent calls the core sentence screening tool $T$. It then uses the large language model $M$ to screen the core sentences and express them in standardized form, which produces rule expressions that are easy to understand.
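The following minimal sketch shows how $A_E = \{P_E, T, M\}$ could be composed. It assumes the model, the screening tool, and the prompt builder are plain Python callables. The name `make_extraction_agent` and the JSON reply format are hypothetical and are not taken from the patent.

```python
import json
from typing import Callable, List, Tuple

# Hypothetical interfaces: any callable may stand in for the real components.
LLM = Callable[[str], str]            # M: prompt text -> reply text
Screen = Callable[[str], str]         # T: numbered block -> filtered text c_f
PromptBuilder = Callable[[str], str]  # P_E: c_f -> full prompt
Rule = Tuple[str, List[int]]          # (rule description r_k, source sentence numbers N_k)


def make_extraction_agent(build_prompt: PromptBuilder, screen: Screen, llm: LLM) -> Callable[[str], List[Rule]]:
    """Compose A_E = {P_E, T, M} into one function over numbered paragraph blocks."""

    def agent(numbered_block: str) -> List[Rule]:
        filtered = screen(numbered_block)    # T keeps only core, rule-bearing sentences
        if not filtered.strip():
            return []                        # no core sentence, so no model call
        reply = llm(build_prompt(filtered))  # M guided by P_E
        # Assumed O_E: a JSON list of {"source": [...], "name": ..., "content": ...}
        rules: List[Rule] = []
        for item in json.loads(reply):
            description = f"{item['name']}: {item['content']}"
            rules.append((description, [int(n) for n in item["source"]]))
        return rules

    return agent
```

Passing the components in as functions keeps the agent testable with stub models. A real deployment would wrap an LLM client in the `llm` callable and would probably need more tolerant parsing of the model's reply.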
Step 250, extracting the element information related to policy rules from the numbered paragraph blocks, converting it into standardized policy rule descriptions, and retrieving the corresponding original text to form a rule-to-original-text mapping.
First, consider the set of numbered paragraph blocks $C' = \{c_1', c_2', \dots, c_n'\}$ obtained in step 120. Each numbered paragraph block $c_i'$ is input in turn into the policy rule extraction agent. The agent returns the standardized rule descriptions extracted and converted from $c_i'$, together with their original sentence numbers:

$$O_i = A_E(c_i')$$

$$O_i = \{(r_{i,k}, N_{i,k}) \mid k = 1, 2, \dots, K_i\}$$

where $O_i$ is the set of rule descriptions and original sentence numbers obtained after $c_i'$ is input into the policy rule extraction agent; $r_{i,k}$ is the $k$-th rule description extracted from $c_i'$; and $N_{i,k}$ is the set of original sentence numbers corresponding to $r_{i,k}$. Because one rule description may be derived from several sentences, $N_{i,k}$ is a set that may contain one or more numbers.
The original text corresponding to each rule description is then taken from the paragraph block text according to its sentence numbers $N_{i,k}$:

$$t_{i,k} = \operatorname{concat}\left(\{\, s_{i,n} \mid n \in N_{i,k} \,\}\right)$$

$$R_i = \{(r_{i,k}, t_{i,k}) \mid k = 1, 2, \dots, K_i\}$$

where $t_{i,k}$ is the complete original text obtained from the numbered paragraph block $c_i'$ according to $N_{i,k}$. It is formed by concatenating the sentence $s_{i,n}$ numbered $n$ for every $n \in N_{i,k}$. $R_i$ is the set of mappings between policy rule descriptions and original text content for block $c_i'$.
Finally, the outputs for all paragraph blocks are merged. This gives the set $R$ of policy rules and original-text mappings extracted from the entire electric charge related policy file:

$$R = \bigcup_{i=1}^{n} R_i = \{(r_q, t_q) \mid q = 1, 2, \dots, Q\}$$

where $r_q$ is the description of the $q$-th policy rule in the entire electric charge related policy file, and $t_q$ is the original text content corresponding to the $q$-th policy rule.
For example, the content of the $i$-th rule $r_i$ is:
"Electricity price for industrial and commercial users = on-grid electricity price + on-grid link line loss fee + transmission and distribution price +".
The original text $t_i$ of the $i$-th rule is:
The electricity price for industrial and commercial users consists of the on-grid electricity price, the line loss cost of the on-grid link, the transmission and distribution price, and the system operation cost.
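The rule-to-original-text mapping of step 250 can be sketched as follows. The sketch assumes that sentences in a numbered block are marked `[1]`, `[2]`, and so on. The marker format and all function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed numbered-block format (hypothetical): every sentence is prefixed by "[n]".
NUMBERED_SENTENCE = re.compile(r"\[(\d+)\]\s*(.*?)(?=\s*\[\d+\]|$)", re.S)


def sentences_by_number(numbered_block: str) -> Dict[int, str]:
    """Split a numbered paragraph block c_i' into {sentence number: sentence text}."""
    return {int(n): s.strip() for n, s in NUMBERED_SENTENCE.findall(numbered_block)}


def map_rules_to_text(numbered_block: str,
                      extracted: Iterable[Tuple[str, List[int]]],
                      sep: str = "") -> List[Tuple[str, str]]:
    """Build R_i: pair each rule r_ik with t_ik, the concatenation of the sentences in N_ik."""
    sentences = sentences_by_number(numbered_block)
    mapping = []
    for rule, numbers in extracted:
        # Numbers the model cites that are absent from the block are skipped, not raised.
        text = sep.join(sentences[n] for n in sorted(set(numbers)) if n in sentences)
        mapping.append((rule, text))
    return mapping


def merge_mappings(per_block: Iterable[List[Tuple[str, str]]]) -> List[Tuple[str, str]]:
    """R = union of all R_i, kept in document order as (r_q, t_q) pairs."""
    return [pair for block in per_block for pair in block]
```

The default `sep=""` suits Chinese source text, where sentences are joined without spaces. Pass `sep=" "` for English text.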
Step 300, constructing a history policy rule description knowledge base and retrieving similar historical policy rule descriptions from it by similarity retrieval, based on the standardized policy rule descriptions and the original text content. A policy rule comparison agent is then constructed to perform deep semantic comparison and change analysis between the current standardized policy rule descriptions and the similar historical ones. The agent identifies newly added and modified policy clauses and outputs structured change details.
Electric charge checking is carried out according to the electric charge related policy files, which are continuously revised and updated. If the differences between new and old policy files and the resulting rule updates are not identified in time, the electric charge checking work is directly affected. To support effective inheritance and change analysis between new and old policy rules, the method proceeds as follows:

- a history policy rule description knowledge base is first established;
- similar historical policy rule descriptions are retrieved by semantic similarity;
- a policy rule comparison agent matches the current rules against the historical rules and recognizes changes;
- a structured policy change comparison result is output as an important basis for updating the checking rules.
Step 310, constructing a history policy rule description knowledge base.
As an example, this step 310 may include:
Step 311, extracting the description information and corresponding identifiers of the historical electric charge checking rules from the historical checking rule table to obtain a history policy rule record set.
The description information of the historical electric charge checking rules and the corresponding unique identifiers $id$ are extracted from the dedicated historical checking rule table stored in the power business database. Together they form the complete history policy rule record set $R_t$:

$$R_t = \{(id_i, desc_i) \mid i = 1, 2, \dots, N\}$$

where $desc_i$ is the text description of the $i$-th historical policy rule.
In order to improve semantic consistency and text processing efficiency, all description information is subjected to unified standardized processing, including redundant punctuation removal, head and tail blank character removal and the like.
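Here is a minimal sketch of building $R_t$ with the normalization described above. The patent does not list the exact punctuation rules, so those are assumptions, as are the row keys `id` and `description`.

```python
import re
from typing import Dict, Iterable, List, Tuple


def normalize_description(desc: str) -> str:
    """Standardize a historical rule description (the rules below are assumptions)."""
    text = desc.strip()                                        # drop leading/trailing blanks
    text = re.sub(r"\s+", " ", text)                           # collapse whitespace runs
    text = re.sub(r"([，。；：、,.;:!！?？])\1+", r"\1", text)   # "。。" -> "。", ",," -> ","
    text = re.sub(r"^[，。；：、,.;:]+|[，；：、,;:]+$", "", text)  # stray separators at the ends
    return text.strip()


def build_history_records(rows: Iterable[Dict[str, object]]) -> List[Tuple[object, str]]:
    """R_t = {(id_i, desc_i)} from rows of the rule table (hypothetical column names)."""
    records = []
    for row in rows:
        desc = normalize_description(str(row["description"]))
        if desc:                                               # skip rows that end up empty
            records.append((row["id"], desc))
    return records
```

Final full stops are deliberately kept, so that the standardized descriptions still read as complete sentences.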
Step 312, semantically encoding the history policy rule record set with a pre-trained vector embedding model to obtain history policy rule description vectors, and storing these vectors in the history policy rule description knowledge base.
The history policy rule record set is semantically encoded with a pre-trained vector embedding model. Specifically, the text description $desc_i$ of each historical policy rule is input into the model:

$$V_i = \mathrm{BGE}(desc_i)$$

where $V_i$ is the history policy rule description vector and BGE denotes the pre-trained embedding model used.
In this way, the text descriptions of all historical policy rules are converted into multidimensional policy rule description vectors that can be used for text retrieval. These vectors are then stored in a vector database (such as FAISS or Chroma DB), and an index is built to support subsequent similarity retrieval.
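The encoding and storage of step 312 can be sketched without external dependencies. In the sketch below, the `embed` callable stands in for the BGE model; in practice it might wrap a BGE checkpoint loaded with the sentence-transformers library. The in-memory class `HistoryRuleIndex` is a hypothetical substitute for FAISS or Chroma DB.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

Embed = Callable[[str], Sequence[float]]  # assumed: text -> vector, e.g. a wrapped BGE encoder


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else [float(x) for x in vec]


class HistoryRuleIndex:
    """Minimal in-memory stand-in for a vector database such as FAISS or Chroma DB."""

    def __init__(self, embed: Embed) -> None:
        self.embed = embed
        self.ids: List[object] = []
        self.descs: List[str] = []
        self.vectors: List[List[float]] = []  # unit-length V_i, one per historical rule

    def add(self, records: Iterable[Tuple[object, str]]) -> None:
        """Encode each (id_i, desc_i) as V_i = embed(desc_i) and store it with its id."""
        for rule_id, desc in records:
            self.ids.append(rule_id)
            self.descs.append(desc)
            self.vectors.append(l2_normalize(self.embed(desc)))

    def items(self) -> List[Tuple[object, List[float]]]:
        """Return (id, vector) pairs for a brute-force similarity scan."""
        return list(zip(self.ids, self.vectors))

    def __len__(self) -> int:
        return len(self.ids)
```

Normalizing vectors at insertion time means cosine similarity reduces to an inner product. An exact inner-product index such as FAISS `IndexFlatIP` is commonly used for cosine search in the same way.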
Step 320, retrieving similar historical policy rule descriptions from the stored history policy rule description knowledge base by similarity retrieval, based on the standardized policy rule descriptions and the original text content.
To improve the accuracy and semantic coverage of this retrieval, the method queries the history policy rule descriptions with both the extracted policy rule description and its corresponding original text. Using both queries accounts for differences in similarity scores caused by different ways of expressing the same content.
Specifically, each rule and its original text in the policy rule and original text mapping set $R$ obtained in step 250 are first converted into vectors in the same way as in step 312:

$$v_i^{t} = \mathrm{BGE}(t_i)$$

$$v_i^{r} = \mathrm{BGE}(r_i)$$

where $v_i^{t}$ is the embedding vector of the original text $t_i$, and $v_i^{r}$ is the embedding vector of the policy rule description $r_i$.
Next, the cosine similarity $\cos(v, V_j) = \frac{v \cdot V_j}{\lVert v \rVert \, \lVert V_j \rVert}$ is computed between each of $v_i^{t}$ and $v_i^{r}$ and every history policy rule description vector $V_j$ stored in step 312. Based on these similarity scores, the $K$ vectors most similar to $v_i^{t}$ and the $K$ vectors most similar to $v_i^{r}$ are selected.
Finally, the historical rule texts corresponding to the vectors retrieved for $v_i^{t}$ and $v_i^{r}$ are merged into the set of historical rules similar to the policy rule description $r_i$:

$$R_{S_i} = H_i^{t} \cup H_i^{r}$$

where $H_i^{t}$ is the set of most similar historical rules retrieved using the original text $t_i$; $H_i^{r}$ is the set of most similar historical rules retrieved using the policy rule description $r_i$; and $R_{S_i}$ is the set of similar historical rules for $r_i$.
Step 330, designing a policy rule comparison analysis Prompt template based on the similar historical policy rule descriptions.
To identify differences and classify changes between the current policy rules and the similar historical policy rules, a policy rule comparison analysis Prompt template is designed. It guides the large language model to understand the rule semantics, compare the input similar historical policy rules one by one, recognize changes, and output structured comparison results. Three similarity judgments are defined:

- completely consistent, corresponding to the change type "no change";
- highly similar, corresponding to "modification";
- dissimilar, corresponding to "new addition".

For a modification, the specific conditions or content that changed are analyzed. The policy rule comparison analysis Prompt template is defined as $P_C$:

$$P_C = \{G, S, I, R, R_S, O_C, E_C\}$$

where $G$, $S$ and $I$ have the same meanings as in the Prompt template of step 230; $R$ is the input current policy rule description; $R_S$ is the set of similar historical policy rules retrieved for $R$; $O_C$ is the output format of the policy rule comparison analysis task; and $E_C$ is an example output of rule comparison analysis. The output format $O_C$ includes an analysis of each historical rule, the similarity judgment for each historical rule, the reasoning behind the judgment, and the change analysis result. The policy rule comparison analysis Prompt is shown in detail in FIG. 9.
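The next sketch assembles $P_C$ and makes explicit the three similarity judgments and their change types from the paragraph above. The headings and the string labels in the mapping are assumptions. The real template is the one shown in FIG. 9.

```python
from typing import Sequence

# The three similarity judgments and the change types they map to (step 330).
JUDGMENT_TO_CHANGE = {
    "completely consistent": "no change",
    "highly similar": "modification",
    "dissimilar": "new addition",
}


def build_comparison_prompt(goal: str, steps: str, important: str, rule: str,
                            similar_rules: Sequence[str], output_format: str, example: str) -> str:
    """Assemble P_C = {G, S, I, R, R_S, O_C, E_C} into one prompt string (wording is hypothetical)."""
    judgments = "; ".join(f"{j} -> {c}" for j, c in JUDGMENT_TO_CHANGE.items())
    history = "\n".join(f"{n}. {h}" for n, h in enumerate(similar_rules, start=1))  # numbered R_S
    sections = [
        ("Goal", goal),
        ("Steps", steps),
        ("Important", f"{important}\nAllowed judgments: {judgments}"),
        ("Current rule", rule),
        ("Similar historical rules", history or "(none retrieved)"),
        ("Output format", output_format),
        ("Example", example),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)
```

Numbering the historical rules lets the model refer to them one by one in its item-by-item analysis.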
Step 340, constructing a policy rule comparison agent by combining the policy rule comparison analysis Prompt template and the large language model.
The policy rule comparison analysis template $P_C$ constructed in step 330 is combined with the large language model $M$ to construct the policy rule comparison agent, which automates the rule comparison analysis. The policy rule comparison agent $A_C$ is structured as:

$$A_C = \{P_C(R, R_S), M\}$$

where $R$ and $R_S$ have the same meanings as in step 330.
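Following the same pattern as the extraction agent, $A_C$ can be sketched as a function that renders $P_C$ for $(R, R_S)$, calls the model, and parses its reply. The JSON field names in `FIELDS` are hypothetical stand-ins for the elements of $O_C$.

```python
import json
from typing import Callable, Dict, List

FIELDS = ("matched_rule", "analysis", "judgment", "reasoning", "change")  # assumed O_C keys


def make_comparison_agent(build_prompt: Callable[[str, List[str]], str],
                          llm: Callable[[str], str]) -> Callable[[str, List[str]], Dict[str, str]]:
    """Compose A_C = {P_C(R, R_S), M}: prompt the model and parse its structured verdict."""

    def agent(rule: str, similar_rules: List[str]) -> Dict[str, str]:
        reply = llm(build_prompt(rule, similar_rules))
        data = json.loads(reply)  # assumed: the model replies with a single JSON object
        # Missing fields become empty strings so downstream code sees a fixed shape.
        return {key: str(data.get(key, "")) for key in FIELDS}

    return agent
```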
Step 350, inputting each current policy rule description and its similar historical policy rule descriptions into the policy rule comparison agent. The agent performs deep semantic comparison and change analysis, identifies newly added and modified policy clauses, and outputs structured change details.
For the policy rule and original text mapping set $R$ obtained in step 250, the similar historical policy rules for each standardized policy rule description and its original text are retrieved by the method of step 320. Each rule is then input, together with its similar historical rules, into the policy rule comparison agent in turn. This yields the comparison result between the rule and the historical policies, and ultimately the change types and change details of all policy rules. The process is expressed as:

$$(h_i, y_i, d_i) = A_C(r_i, R_{S_i})$$

$$CR = \{(r_i, h_i, y_i, d_i) \mid i = 1, 2, \dots, Q\}$$

In the above formulas, $r_i$ and $h_i$ respectively denote the policy rule description and one of its similar historical policy rule descriptions. $y_i$ and $d_i$ are the policy change type and the change detail analysis result output by the policy rule comparison agent; the change type includes modification and new addition. $CR$ is the comparison analysis result for all rules in the policy file. For example, an output of $y_i = 2$ indicates that the rule modifies the historical rule, and $d_i$ then contains the specific differences, such as the rule adding an "xx" condition or changing an "xx" condition into a "yy" condition.
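The loop that produces $CR$ in step 350 can be sketched with retrieval and comparison passed in as callables. The source encodes change types numerically (2 denotes modification) but does not give the other codes, so this sketch uses text labels instead. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Same judgment-to-change-type mapping as in step 330.
JUDGMENT_TO_CHANGE = {
    "completely consistent": "no change",
    "highly similar": "modification",
    "dissimilar": "new addition",
}


def compare_all(mapping: List[Tuple[str, str]],
                retrieve: Callable[[str, str], List[str]],
                compare: Callable[[str, List[str]], Dict[str, str]]) -> List[Dict[str, str]]:
    """Build CR: for each (r_i, t_i) in R, retrieve R_S_i (step 320) and ask A_C for a verdict."""
    results = []
    for rule, text in mapping:
        similar = retrieve(rule, text)    # R_{S_i}
        verdict = compare(rule, similar)  # A_C(r_i, R_{S_i})
        results.append({
            "rule": rule,                                  # r_i
            "matched": verdict.get("matched_rule", ""),    # h_i
            "change_type": JUDGMENT_TO_CHANGE.get(verdict.get("judgment", ""), "unrecognized"),  # y_i
            "detail": verdict.get("change", ""),           # d_i
        })
    return results


def changed_rules(cr: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only newly added and modified rules, the ones the checking rules must absorb."""
    return [item for item in cr if item["change_type"] in ("modification", "new addition")]
```

Filtering out "no change" results leaves exactly the clauses that the checking rule base would need to update.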
The effects of the method are as follows:
Part of a regional policy file on "transmission and distribution electricity prices and the implementation of agency power purchasing by power grid enterprises" is taken as an example to demonstrate the effects of the invention.
As shown in FIG. 2, after parsing in step 100 the original policy file is divided into multiple paragraph texts of moderate length. Title information and sentence numbers are added to these paragraph texts, so the original hierarchical structure is preserved.
FIG. 3 shows the core sentence recognition result. Guided by the core sentence recognition Prompt, the large language model analyzes the input paragraph text and judges the second sentence to be irrelevant. Only the sentences related to policy rules are retained.
FIG. 4 shows the policy rule extraction result. For the input screened text, two policy rules are successfully extracted using the rule extraction and conversion Prompt, and the original wording is converted into standardized, easy-to-understand rule descriptions.
FIG. 5 compares the rule extraction method of the invention with direct use of a general large language model. In the baseline, the original document is input directly into the model, the model is simply prompted to extract rules, and the extracted rule content is output. As FIG. 5 shows, for the same input range, the general large language model produces more generic rules that essentially copy the original text, with some conditions missing or inaccurate. The results of the invention are finer-grained, clearer, and more accurate.
FIG. 6 shows the result of comparative analysis against historical policy rules, including the input current policy rule, the retrieved set of similar historical policy rules, and the output of the large language model. The comparative analysis method of the invention correctly judges that the current rule modifies the historical rule, with the new rule adding a description of the "capacity range and optional electricity prices".
The foregoing embodiments are merely for illustrating the technical solution of the present invention, but not for limiting the same, and although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that the technical solution described in the foregoing embodiments may be modified or substituted for some of the technical features thereof, and that these modifications or substitutions should not depart from the spirit and scope of the technical solution of the embodiments of the present invention and should be included in the protection scope of the present invention.
Claims (7)
1. The policy file intelligent rule extraction and change comparison method for the electric charge check is characterized by comprising the following steps of:
analyzing the electric charge related policy file, reserving the context level and logic structure information of the original electric charge related policy file in the analysis process, and optimizing the analyzed content;
Designing a core sentence recognition Prompt template and constructing a policy rule extraction agent; identifying core sentences from the optimized content; extracting rule-related element information and converting it into standardized policy rule descriptions; and outputting the standardized policy rule descriptions together with the corresponding original text content;
Constructing a history policy rule description knowledge base, searching for history similar policy rule descriptions from the history policy rule description knowledge base based on the standardized policy rule descriptions and the original text content, and then carrying out deep semantic comparison and change analysis on the current standardized policy rule descriptions and the history similar policy rule descriptions by constructing a policy rule comparison agent to identify newly added and modified policy clauses and output structured change details;
wherein designing the core sentence recognition Prompt template comprises:
defining the core sentence recognition Prompt template as $P_R$:

$$P_R = \{G, c_i', S, I, O_R\}$$

in the above formula, $G$ represents the set target description; $c_i'$ represents the input paragraph text block with numbered identifiers; $S$ represents the recognition step prompts given to guide the large language model's reasoning and judgment; $I$ represents the set important items, i.e., content requiring particular attention; and $O_R$ represents the output format of the core sentence recognition task;
constructing the policy rule extraction agent comprises:
creating, based on the core sentence recognition Prompt template, a core sentence screening tool $T$:

$$T = \{P_R, M, F\}$$
designing, based on the screened core sentences, the rule extraction and conversion Prompt template $P_E$:

$$P_E = \{G, S, I, c_f, O_E, E_X\}$$

combining the core sentence screening tool, the rule extraction and conversion Prompt template, and the large language model to construct the policy rule extraction agent $A_E$:

$$A_E = \{P_E, T, M\}$$
Wherein M represents a large language model, F represents a sentence filtering flow, c f represents paragraph text screened by a core sentence screening tool, O E represents an output format of a rule extraction conversion task, and E X represents a rule extraction result example which is provided for the large language model for reference.
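For illustration, the screening tool $T$ in claim 1 might work as follows. The model, guided by $P_R$, names the core sentences, and the filtering flow $F$ keeps them together with their numbers. The `[n]` sentence markers and the reply format (a list of sentence numbers) are assumptions.

```python
import re
from typing import Callable, Dict

# Assumed numbered-block format (hypothetical): every sentence is prefixed by "[n]".
NUMBERED_SENTENCE = re.compile(r"\[(\d+)\]\s*(.*?)(?=\s*\[\d+\]|$)", re.S)


def make_screening_tool(build_prompt: Callable[[str], str],
                        llm: Callable[[str], str]) -> Callable[[str], str]:
    """T = {P_R, M, F}: ask M (guided by P_R) which sentences are core, then filter them (F)."""

    def screen(numbered_block: str) -> str:
        sentences: Dict[int, str] = {
            int(n): s.strip() for n, s in NUMBERED_SENTENCE.findall(numbered_block)
        }
        reply = llm(build_prompt(numbered_block))
        # Assumed O_R: the reply lists the numbers of the sentences to keep, e.g. "1, 3".
        keep = {int(n) for n in re.findall(r"\d+", reply)}
        # F: keep core sentences in order, with their numbers, so rules can be traced back.
        return " ".join(f"[{n}] {sentences[n]}" for n in sorted(keep) if n in sentences)

    return screen
```

Keeping the `[n]` markers in the filtered text is what later lets each extracted rule cite its source sentences. Because any digits in the reply are read as sentence numbers, a production version would constrain the model to a strict output format.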
2. The method for intelligent rule extraction and change comparison of policy documents for electric charge checking according to claim 1, wherein the analyzing the policy documents related to electric charge comprises:
reading the content of the electric charge related policy file and parsing it with a designed file content and structure parsing method, which comprises:
For a Word format document, a python-docx-based analysis module is adopted to identify and disassemble the text content structure of the Word format document, the original title level and number structure in the Word format document are reserved in the identification result, different types of identifiers are respectively assigned to the text content, and the reading result is stored as a set containing document style attributes and corresponding contents according to the sequence of the original Word format document;
for a PDF format document, performing recognition with the PaddleOCR tool and applying layout analysis to the content of each page to semantically segment the page elements. This yields the spatial position, text content, and text type of each text block on the page, which are stored as a parsed content set in the order of the original PDF format document.
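Here is a minimal sketch of the Word branch of claim 2. It assumes the python-docx package and English built-in heading style names such as `Heading 1`; documents that use localized style names would need a different check. The structuring logic is kept separate from file reading so that it runs without python-docx installed. The record layout is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def structure_paragraphs(paragraphs: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (style name, text) pairs into ordered records that keep the heading hierarchy."""
    records: List[Dict[str, str]] = []
    heading_path: List[str] = []              # current heading chain, outermost first
    for style, text in paragraphs:
        text = text.strip()
        if not text:
            continue                          # skip empty paragraphs
        if style.startswith("Heading"):
            last = style.split()[-1]
            level = int(last) if last.isdigit() else 1
            heading_path = heading_path[: level - 1] + [text]  # replace this level, drop deeper ones
            records.append({"type": "title", "level": str(level), "text": text,
                            "path": " > ".join(heading_path)})
        else:
            records.append({"type": "body", "text": text, "path": " > ".join(heading_path)})
    return records


def read_docx(file_path: str) -> List[Dict[str, str]]:
    """Read a Word file in document order with python-docx (imported lazily)."""
    import docx  # provided by the python-docx package

    document = docx.Document(file_path)
    return structure_paragraphs((p.style.name, p.text) for p in document.paragraphs)
```

Each body record carries its heading path, which is the title context that claim 3 later splices back into the paragraph text.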
3. The method for intelligent rule extraction and change comparison of policy documents for electric charge checking according to claim 1, wherein the method is characterized by optimizing the analyzed content, and specifically comprising:
for each text paragraph in the parsed content, splicing its title content and type into the paragraph to form a new text paragraph, and splitting paragraphs longer than a preset length into blocks according to the maximum block length;
and dividing each paragraph block into sentences, numbering each sentence, and merging the sentences back into paragraph text, i.e., numbered paragraph blocks.
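The optimization steps of claim 3 can be sketched as follows. The claim does not fix the sentence-boundary rule, the `title (type): ` splice format, or the default `max_len`, so all three are assumptions.

```python
import re
from typing import List

# Split right after Chinese terminators, or at whitespace following ASCII ones
# (so decimals such as "0.5" are not split).
SENTENCE_BOUNDARY = re.compile(r"(?<=[。！？；])|(?<=[.!?;])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, dropping empty pieces."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(text) if s.strip()]


def numbered_blocks(title: str, kind: str, paragraph: str, max_len: int = 300) -> List[str]:
    """Splice title and type into the paragraph, chunk it by max_len, and number each sentence.

    A single sentence longer than max_len is kept whole as its own block.
    """
    sentences = split_sentences(f"{title} ({kind}): {paragraph}")
    blocks: List[List[str]] = []
    current: List[str] = []
    for sentence in sentences:
        if current and sum(map(len, current)) + len(sentence) > max_len:
            blocks.append(current)
            current = []
        current.append(sentence)
    if current:
        blocks.append(current)
    # Sentence numbers restart at 1 in each block; they identify sentences within c_i'.
    return [" ".join(f"[{n}] {s}" for n, s in enumerate(block, start=1)) for block in blocks]
```

Chunking at sentence boundaries, rather than at a fixed character offset, keeps every numbered sentence intact for the later rule-to-text mapping.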
4. The method for intelligent rule extraction and change comparison of policy documents for electric charge checking according to claim 1, wherein the method for extracting element information related to rules, converting the element information into standardized policy rule descriptions, and outputting the standardized policy rule descriptions corresponding to original text contents comprises the steps of:
inputting the numbered paragraph blocks into a policy rule extraction agent, identifying a core sentence with rule properties based on a core sentence screening tool, extracting specific rule elements from the core sentence by using the agent, and outputting standardized and element-clear policy rule expression and corresponding text numbers;
and acquiring the original text content from the paragraph block text according to the original text number, obtaining the original text content corresponding to each policy rule description, and forming the mapping relation between the rules and the original text.
5. The method for intelligent rule extraction and change comparison of policy documents for electric charge checking according to claim 1, wherein constructing a history policy rule description knowledge base, and searching for history similar policy rule descriptions from the history policy rule description knowledge base based on the standardized policy rule descriptions and original text content, comprises:
extracting the description information of the historical electric charge checking rules and the corresponding identifiers $id$ from the historical checking rule table, and constructing the history policy rule description knowledge base;
based on the policy rule descriptions extracted from the policy file and their corresponding original text content, retrieving similar historical rule descriptions from the knowledge base by similarity retrieval using each of the two separately, so as to account for similarity differences caused by different modes of expression.
6. The method for intelligent rule extraction and change comparison of policy documents for electricity fee checking according to claim 5, wherein constructing policy rule comparison agent comprises:
designing, based on the similar historical policy rule descriptions, the policy rule comparison analysis Prompt template $P_C$:

$$P_C = \{G, S, I, R, R_S, O_C, E_C\}$$

constructing, based on the policy rule comparison analysis Prompt template and the large language model, the policy rule comparison agent $A_C$:

$$A_C = \{P_C(R, R_S), M\}$$
Wherein G represents a set target description; S represents a recognition step prompt given to guide a large language model to conduct thinking judgment, I represents a set important item, R represents an input current policy rule description, R S is a similar historical policy rule obtained by R retrieval, O C represents an output format of a policy rule comparison analysis task, wherein the output format comprises analysis explanation of the historical rule, a similarity judgment result of the historical rule, a judgment thinking process and change analysis result content, and E C represents a rule comparison analysis result output example.
7. The method for intelligent rule extraction and change comparison of policy documents for electric charge checking according to claim 6, wherein the newly added and modified policy terms are identified and structured change details are output, and the process is expressed as:
$$(h_i, y_i, d_i) = A_C(r_i, R_{S_i})$$

$$CR = \{(r_i, h_i, y_i, d_i) \mid i = 1, 2, \dots, Q\}$$

in the above formulas, $r_i$ and $h_i$ respectively represent a policy rule description and one of its similar historical policy rule descriptions; $y_i$ and $d_i$ represent the policy change type and the change detail analysis result output by the policy rule comparison agent, the change type comprising modification and new addition; $CR$ represents the comparison analysis result of all rules; and $R_{S_i}$ represents the similar historical policy rules retrieved for the current policy rule description.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510953695.3A CN120449861B (en) | 2025-07-11 | 2025-07-11 | Policy file intelligent rule extraction and change comparison method for electric charge checking |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN120449861A (en) | 2025-08-08 |
| CN120449861B (en) | 2025-10-17 |
Family ID: 96609631
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510953695.3A Active CN120449861B (en) | 2025-07-11 | 2025-07-11 | Policy file intelligent rule extraction and change comparison method for electric charge checking |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120449861B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120744073B (en) * | 2025-09-02 | 2025-11-18 | 烟台海颐软件股份有限公司 | Hierarchical knowledge network construction and retrieval method for intelligent electric charge questions and answers |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112182248A (en) * | 2020-10-19 | 2021-01-05 | 深圳供电局有限公司 | A Statistical Method for the Key Policy of Electricity Price |
| CN118820403A (en) * | 2024-07-24 | 2024-10-22 | 江苏风云科技服务有限公司 | Policy text denoising and related matters extraction method and system based on large model |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR102394480B1 (en) * | 2020-09-29 | 2022-05-04 | 인제대학교 산학협력단 | Methods and systems for syntactic and semantic information extraction from plant procedures |
| US20250117863A1 (en) * | 2023-10-06 | 2025-04-10 | Vertex, Inc. | Generation of verbose tax category descriptions using a generative language model |
| CN119166662B (en) * | 2024-11-21 | 2025-05-16 | 烟台海颐软件股份有限公司 | A method for constructing SQL agent in power field based on KMDI chain |
| CN119903904A (en) * | 2024-12-30 | 2025-04-29 | 深圳供电局有限公司 | A system and method for constructing an electricity price policy knowledge base based on artificial intelligence |
| CN119963001A (en) * | 2025-01-20 | 2025-05-09 | 国网四川省电力公司天府新区供电公司 | A policy mining and intelligent interaction platform based on AI big model |
| CN120218079A (en) * | 2025-03-12 | 2025-06-27 | 中国人民解放军军事科学院军事科学信息研究中心 | Intelligence information integration method and device based on multi-agent task chain strategy large model |
Also Published As
| Publication number | Publication date |
|---|---|
| CN120449861A (en) | 2025-08-08 |
Similar Documents
| Publication | Title |
|---|---|
| CN111723215B (en) | Device and method for establishing biotechnological information knowledge graph based on text mining |
| CN113961685A (en) | Information extraction method and device |
| CN113159969B (en) | Financial long text rechecking system |
| US20230028664A1 (en) | System and method for automatically tagging documents |
| CN115292450B (en) | A method for constructing domain knowledge base of data classification and grading based on information extraction |
| CN113196277A (en) | System for retrieving natural language documents |
| CN113168499A (en) | Methods of Searching Patent Documents |
| CN111061882A (en) | Knowledge graph construction method |
| CN113486189A (en) | Open knowledge graph mining method and system |
| CN116821376B (en) | Knowledge graph construction method and system in coal mine safety production field |
| CN111008530A (en) | A complex semantic recognition method based on document word segmentation |
| CN120449861B (en) | Policy file intelligent rule extraction and change comparison method for electric charge checking |
| CN112257442A (en) | Policy document information extraction method based on corpus expansion neural network |
| CN115438195A (en) | A method and device for constructing a knowledge map in the field of financial standardization |
| CN119719386A (en) | Document generation method based on fusion of large model and knowledge graph |
| CN115470319A (en) | Structured document demand rapid identification and entry organization management method |
| CN118942104B (en) | A method and system for extracting structured information |
| CN120197607A (en) | A method for automatically identifying differences between domestic and foreign standard documents |
| CN120492608A (en) | Structured processing method, device, equipment and storage medium for contract text |
| CN118395968A (en) | A method and system for automatic analysis of data classification and grading standard files |
| CN118350371A (en) | A method and system for extracting token pairs from patent texts |
| CN117668234A (en) | Text label dividing method, medium and electronic equipment |
| CN117313721A (en) | Document management method and device based on natural language processing technology |
| CN113961702A (en) | A method for extracting article title hierarchy |
| CN120780916B (en) | Resource recommendation method and system based on hybrid search RAG |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |