CN120407878A - A RAG method and system for compliance analysis of multimodal financial documents - Google Patents

A RAG method and system for compliance analysis of multimodal financial documents

Info

Publication number
CN120407878A
Authority
CN
China
Prior art keywords
text blocks
text
multimodal
sub
financial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510490735.5A
Other languages
Chinese (zh)
Inventor
王新宇
周灵
池纪君
邰正晗
李铸洪
何海林
华雨晨
郭桐深
李木之
卢鹏
王苏羽晨
黄杰瑞
吴益洪
韩博喻
李维恩
李泽宇
马力恒
崔军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengguang Phantom Suzhou Technology Co ltd
Original Assignee
Chengguang Phantom Suzhou Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengguang Phantom Suzhou Technology Co ltd
Priority to CN202510490735.5A
Publication of CN120407878A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9032 Query formulation
    • G06F16/90332 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90348 Query processing by searching ordered data, e.g. alpha-numerically ordered data
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9038 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/041 Abduction
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Technology Law (AREA)
  • Biophysics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Biomedical Technology (AREA)
  • General Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Accounting & Taxation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of large models for finance and provides a RAG method and system for compliance analysis of multimodal financial documents. The method comprises: preprocessing an input multimodal financial document, generating corresponding text blocks from the multimodal data in the document, and constructing a corresponding vector database; responding to a target query request; entering a multi-path retrieval process according to the target query statement to generate bundles for responding to sub-query statements; and, based on the bundles generated by the multi-path retrieval module, calculating the semantic alignment between each bundle and its sub-query statement, calculating a corresponding time reward score according to a time reward mechanism to obtain a matching score, and ranking and optimizing the matching scores through a direct preference optimization mechanism to obtain a preferred text block set from which the corresponding answer is generated. By integrating preprocessing, retrieval, and reordering modules for financial documents, the invention addresses the shortcomings of conventional methods in processing heterogeneous financial data and improves retrieval recall and relevance.

Description

RAG method and system for multimodal financial document compliance analysis
Technical Field
The invention relates to the technical field of large models for finance, and in particular to a RAG method and system for multimodal financial document compliance analysis.
Background
In the financial industry, as regulations evolve, financial institutions face increasingly complex compliance requirements. To address these challenges, more and more financial institutions seek question-answering (QA) systems that can efficiently retrieve and analyze compliance information. These systems typically use external knowledge bases or databases in combination with retrieval-augmented generation (RAG) techniques to enhance the capabilities of the language model, thereby improving the accuracy of decision support. Existing RAG systems mostly depend on a single text retrieval method, such as dense retrieval or sparse lexical matching, to analyze and extract the text content of financial documents. However, financial documents often contain multimodal data, such as unstructured text, semi-structured tables, and images, and existing RAG methods are limited in processing such multimodal information.
At present, conventional RAG systems are deficient in processing multimodal data. In particular, when facing the complex structured and unstructured data in financial documents, they cannot effectively integrate different types of data, so that context information is fragmented or lost. In addition, existing retrieval methods mainly rely on semantic similarity for matching and ignore the implicit regulatory relationships and domain standards specific to the financial field, which affects the accuracy and compliance of the retrieval results. Meanwhile, conventional ranking methods fail to fully consider the importance of compliance, so key legal or financial information is easily missed, which further affects the reliability and effectiveness of retrieval-augmented generation.
To solve the above problems, a technique is needed that can effectively process multimodal financial documents and accurately identify compliance requirements.
Disclosure of Invention
The application provides a RAG method and system for multimodal financial document compliance analysis that overcome the limitations of the prior art. By combining a multimodal preprocessing component, a flexible multi-path retrieval module, and a domain-specific document reordering module, the method can comprehensively process multimodal information such as text, tables, and images in financial documents, thereby improving the accuracy and efficiency of information retrieval. In addition, the domain-specific reordering technique ensures high accuracy in compliance analysis and provides financial institutions with a scalable, practical compliance analysis solution.
The invention provides a RAG method for multimodal financial document compliance analysis, comprising the following steps:
Step S1, preprocessing an input multimodal financial document: generating corresponding text blocks from the multimodal data in the multimodal financial document through a file preprocessing module, and vectorizing and encoding all preprocessed text blocks and their metadata with a BGE-M3 dense encoder to construct a corresponding vector database;
Step S2, responding to a target query request;
Step S3, entering a multi-path retrieval process according to the target query statement in the target query request, wherein the multi-path retrieval process decomposes the target query statement into one or more sub-query statements, retrieves text blocks matching the sub-query statements from the vector database through a multi-path retrieval module, and dynamically bundles, expands, and combines the text blocks according to a similarity threshold to generate bundles for responding to the sub-query statements;
Step S4, based on the bundles generated by the multi-path retrieval module, calculating the semantic alignment between each bundle and its sub-query statement through a domain-specific document reordering module, calculating a corresponding time reward score according to a time reward mechanism to obtain a matching score, and ranking and optimizing the matching scores through a direct preference optimization mechanism to obtain a corresponding preferred text block set, which is used to respond to the target query statement and generate the corresponding answer.
Further, in step S1, preprocessing the input multimodal financial document, generating corresponding text blocks from the multimodal data through the file preprocessing module, and vectorizing and encoding all preprocessed text blocks and their metadata with the BGE-M3 dense encoder to construct the corresponding vector database comprises:
Step S11, acquiring a financial document for processing the target query request;
Step S12, using the open-source PDF tool MinerU to decompose the multimodal data of the financial document into multimodal blocks, including text, image, and table blocks, following the original layout order;
Step S13, converting the multimodal blocks into text blocks through a large language model, including converting image blocks into structured text summaries through a vision-language model to serve as the text blocks of the images, and converting table blocks into structurally consistent text descriptions to serve as the text blocks of the tables;
Step S14, processing the text blocks with semantic enhancement techniques, and outputting a text block set for constructing the vector database.
Further, in step S14, processing the text blocks with semantic enhancement techniques and outputting the text block set for constructing the vector database comprises:
calculating the cosine similarity between different text blocks using SBERT sentence embeddings, and merging text blocks whose cosine similarity exceeds a set threshold, thereby removing redundant and duplicate content;
performing coreference resolution with a large language model, in which pronouns within the same section are identified and replaced using context information through iterative resolution, and different mentions of the same entity are grouped into an equivalence set, so as to resolve referential ambiguity in the text blocks;
adding structured metadata to each text block using a large language model, the metadata including the section title, page position, and data type;
and performing text vectorization encoding on all preprocessed text blocks and their corresponding metadata through the BGE-M3 dense encoder to construct the vector database against which the client's target query statements are run.
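The redundancy-removal part of step S14 can be illustrated with a minimal sketch. It assumes the sentence embeddings have already been computed (toy two-dimensional vectors stand in for SBERT output here), simplifies "merging" to keeping the first block of each near-duplicate group, and uses a hypothetical threshold of 0.9; the function names are illustrative and do not come from the source.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def deduplicate(blocks, embeddings, threshold=0.9):
    """Drop any block whose similarity to an already kept block exceeds the threshold."""
    kept = []  # indices of retained blocks, in original order
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) <= threshold for j in kept):
            kept.append(i)
    return [blocks[i] for i in kept]

blocks = ["Revenue rose 12% in 2024.", "2024 revenue increased by 12%.", "Net debt fell."]
# Toy vectors standing in for SBERT sentence embeddings (assumption).
embeddings = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
unique_blocks = deduplicate(blocks, embeddings)
```

In this toy input the first two blocks are near-duplicates, so only the first of them and the unrelated third block are retained.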
Further, in step S3, entering the multi-path retrieval process according to the target query statement in the target query request comprises:
Step S31, when a client initiates a target query request to query the database, an executor uses a large language model to decompose the target query statement into a plurality of independent sub-query statements, replaces pronouns in the independent sub-query statements with explicit entities through coreference resolution, and automatically associates context information;
Step S32, the multi-path retrieval module calculates similarity scores between the sub-query statements and the text blocks through a plurality of retrievers respectively, the retrievers comprising a BM25 sparse retriever, a FAISS dense retriever, a metadata retriever, and a HyDE retriever;
Step S33, the similarity score of each retriever is fused by weighting it with that retriever's preset weight coefficient, the text blocks are sorted by their fused scores, and the top K text blocks in descending order of score are selected as candidate text blocks, where K is a preset natural number greater than 0;
Step S34, the candidate text blocks are dynamically bundled, expanded, and combined based on the similarity threshold to generate bundles for responding to the sub-query statements.
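The weighted fusion and top-K selection of step S33 can be sketched as follows. This is a minimal illustration that assumes each retriever's scores have already been normalized to a comparable range and that a block missing from a retriever's results contributes zero; the retriever keys, weights, and block identifiers are hypothetical values, not taken from the source.

```python
def fuse_and_select(scores_by_retriever, weights, k):
    """Weighted sum of per-retriever scores, then the top-k blocks by fused score."""
    fused = {}
    for name, scores in scores_by_retriever.items():
        for block_id, score in scores.items():
            fused[block_id] = fused.get(block_id, 0.0) + weights[name] * score
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Hypothetical similarity scores from the four retrievers named in step S32.
scores = {
    "bm25": {"b1": 0.8, "b2": 0.2},
    "dense": {"b1": 0.5, "b2": 0.9, "b3": 0.4},
    "metadata": {"b3": 1.0},
    "hyde": {"b2": 0.6},
}
weights = {"bm25": 0.3, "dense": 0.4, "metadata": 0.1, "hyde": 0.2}  # preset coefficients (assumed)
top = fuse_and_select(scores, weights, k=2)
```

With these values block b2 ranks first and b1 second; b3 falls outside the top two and is not passed on as a candidate.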
Further, in step S34, dynamically bundling, expanding, and combining the candidate text blocks based on the similarity threshold to generate bundles for responding to the sub-query statements comprises:
performing preliminary retrieval with the candidate text blocks as independent units, and calculating corresponding dense embeddings through a pre-trained text embedding model, wherein the dense embeddings are vector representations of the candidate text blocks;
calculating the cosine similarity of adjacent candidate text blocks based on the dense embeddings, the cosine similarity being used to judge the content relevance of adjacent candidate text blocks;
judging whether the cosine similarity of adjacent candidate text blocks reaches the similarity threshold;
and dynamically merging the adjacent candidate text blocks into a bundle when their cosine similarity reaches the similarity threshold, and otherwise leaving them unmerged as independent blocks.
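The bundling rule of step S34 can be sketched as a single pass over adjacent blocks. The sketch assumes the candidate blocks are supplied in document order with precomputed dense embeddings (toy vectors here) and uses a hypothetical threshold of 0.8; `bundle_adjacent` is an illustrative name, not one used in the source.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def bundle_adjacent(blocks, embeddings, threshold=0.8):
    """Merge each block into the current bundle when it is similar enough to its predecessor."""
    if not blocks:
        return []
    bundles = [[blocks[0]]]
    for i in range(1, len(blocks)):
        if cosine(embeddings[i - 1], embeddings[i]) >= threshold:
            bundles[-1].append(blocks[i])  # similarity reaches the threshold: extend the bundle
        else:
            bundles.append([blocks[i]])    # otherwise the block starts a new, independent bundle
    return bundles

blocks = ["A", "B", "C", "D"]
# Toy dense embeddings (assumption): A~B and C~D are similar, B and C are not.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
bundles = bundle_adjacent(blocks, embeddings)
```

Comparing each block only with its immediate predecessor keeps the pass linear in the number of candidates; other grouping strategies are possible and the source does not specify one beyond the adjacency and threshold test.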
Further, in step S4, based on the bundles generated by the multi-path retrieval module, the semantic alignment between each bundle and the sub-query statement is calculated by the domain-specific document reordering module, and the corresponding time reward score is calculated according to the time reward mechanism to obtain the matching score, using the following formula:
$$\mathrm{Score}(q, c) = \sigma\left(W^{\top} \cdot \mathrm{CrossEncoder}(q, c) + R_{t} + b\right)$$
Wherein $\mathrm{Score}(q, c)$ is the matching score between the sub-query statement $q$ and a candidate text block $c$ in the bundle; $\mathrm{CrossEncoder}(q, c)$ is the semantic alignment between the bundle and the sub-query statement, calculated by a cross encoder; $\sigma$ is the Sigmoid function, used to map the result to the interval [0, 1]; $W^{\top}$ is the transpose of the weight vector $W$; $R_{t}$ is the time reward score; and $b$ is a bias term.
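As a numerical illustration of the matching score, the sketch below assumes the score is a Sigmoid applied to the weighted cross-encoder output plus the time reward score and the bias term, with the cross-encoder output treated as a small feature vector. This additive form and all numeric values are assumptions made for illustration only.

```python
import math

def sigmoid(x):
    """Map a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def matching_score(alignment, weights, time_reward, bias):
    """Assumed form: sigmoid(W^T . alignment + time_reward + bias)."""
    linear = sum(w * h for w, h in zip(weights, alignment))
    return sigmoid(linear + time_reward + bias)

# Hypothetical values: a 2-dimensional cross-encoder output and a small time reward.
score = matching_score(alignment=[0.5, -0.2], weights=[1.0, 2.0], time_reward=0.3, bias=-0.4)
newer = matching_score(alignment=[0.5, -0.2], weights=[1.0, 2.0], time_reward=0.8, bias=-0.4)
```

Under this assumed form, a larger time reward raises the score of an otherwise identical block, which is how more recent content would be promoted in the ranking.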
Further, in step S4, ranking and optimizing the matching scores through the direct preference optimization mechanism to obtain the corresponding preferred text block set for responding to the target query statement and generating the corresponding answer comprises:
preliminarily calculating the matching score between each text block and the query statement with the BAAI/bge-reranker-v2-Gemma model of the document reordering module;
constructing positive and negative sample pairs, and using the direct preference optimization mechanism to adjust the weights of the BAAI/bge-reranker-v2-Gemma model by minimizing a cross-entropy loss function, thereby optimizing the ordering of the text blocks and forming the corresponding preferred text block set;
and generating a final answer responding to the target query statement according to the preferred text block set.
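Further, the cross-entropy loss function is calculated by the following formula:
$$\mathcal{L} = -\mathbb{E}\left[\log \sigma\left(s^{+} - s^{-}\right)\right]$$
Wherein $s^{+}$ is the matching score of a positive sample, $s^{-}$ is the matching score of a negative sample, $\mathbb{E}$ is the expectation over the positive and negative sample pairs, and $\sigma$ is the Sigmoid function.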
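The effect of such a loss on positive and negative sample pairs can be sketched numerically. The sketch assumes the common pairwise form, the mean of -log(sigmoid(s_pos - s_neg)) over the pairs; this form and the example scores are assumptions for illustration, and no model weights are updated here.

```python
import math

def pairwise_preference_loss(pairs):
    """Mean of -log(sigmoid(s_pos - s_neg)) over (s_pos, s_neg) score pairs."""
    losses = []
    for s_pos, s_neg in pairs:
        d = s_pos - s_neg
        # -log(sigmoid(d)) == log(1 + exp(-d)); branch on the sign of d to avoid overflow.
        if d >= 0:
            losses.append(math.log1p(math.exp(-d)))
        else:
            losses.append(-d + math.log1p(math.exp(d)))
    return sum(losses) / len(losses)

well_ranked = pairwise_preference_loss([(2.0, 0.0)])   # positive scored above negative
mis_ranked = pairwise_preference_loss([(0.0, 2.0)])    # positive scored below negative
tied = pairwise_preference_loss([(1.0, 1.0)])
```

The loss shrinks as positive samples are scored further above negative ones, so minimizing it pushes compliance-relevant blocks up the ranking.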
Based on the same inventive concept, the present invention provides a RAG system for multimodal financial document compliance analysis, which performs the RAG method for multimodal financial document compliance analysis described above and comprises:
a file preprocessing module, used for preprocessing an input multimodal financial document, generating corresponding text blocks from the multimodal data in the multimodal financial document, and vectorizing and encoding all preprocessed text blocks and their metadata with a BGE-M3 dense encoder to construct a corresponding vector database;
a response module, used for responding to a target query request;
a multi-path retrieval module, used for entering a multi-path retrieval process according to the target query statement in the target query request, wherein the multi-path retrieval process decomposes the target query statement into one or more sub-query statements, retrieves text blocks matching the sub-query statements from the vector database, and dynamically bundles and expands the text blocks according to a similarity threshold to generate bundles for responding to the sub-query statements;
a document reordering module, used for calculating, based on the bundles generated by the multi-path retrieval module, the semantic alignment between each bundle and its sub-query statement through the domain-specific document reordering module, calculating a corresponding time reward score according to a time reward mechanism to obtain a matching score, and ranking and optimizing the matching scores through a direct preference optimization mechanism to obtain a corresponding preferred text block set, which is used to respond to the target query statement and generate the corresponding answer.
Further, the calculation formula of the matching score is:
$$\mathrm{Score}(q, c) = \sigma\left(W^{\top} \cdot \mathrm{CrossEncoder}(q, c) + R_{t} + b\right)$$
Wherein $\mathrm{Score}(q, c)$ is the matching score between the sub-query statement $q$ and a candidate text block $c$ in the bundle; $\mathrm{CrossEncoder}(q, c)$ is the semantic alignment between the bundle and the sub-query statement, calculated by a cross encoder; $\sigma$ is the Sigmoid function, used to map the result to the interval [0, 1]; $W^{\top}$ is the transpose of the weight vector $W$; $R_{t}$ is the time reward score; and $b$ is a bias term.
Compared with the prior art, the invention has at least one of the following beneficial effects:
(1) The invention provides a RAG method and system for multimodal financial document compliance analysis which, by integrating preprocessing, retrieval, and reordering modules for financial documents, address the shortcomings of conventional methods in processing heterogeneous financial data and improve retrieval recall and relevance. The system uses a hybrid retrieval strategy and a domain-specific DPO reordering technique, which not only improves the accuracy of information extraction but also gives priority to compliance-critical content, ensuring high-quality, compliant answer generation. Comprehensive experimental verification shows a marked performance improvement; in particular, the method exceeds existing baseline methods in accuracy and recall on financial compliance tasks.
(2) By integrating the preprocessing, retrieval, and reordering steps for financial documents, the invention provides an efficient and accurate end-to-end financial compliance question-answering method that can process multimodal financial data and ensure accurate extraction of compliance information.
(3) Through the multimodal file preprocessing module, the invention can uniformly process heterogeneous data formats such as text, tables, and images and generate a structured vector database, thereby overcoming the inability of conventional methods to process complex financial document data.
(4) The invention introduces a multi-path retrieval module comprising a dense retriever, a sparse retriever, a metadata retriever, and a HyDE retriever, which improves recall and relevance for complex questions and is particularly advantageous when processing cross-modal and cross-document financial data.
(5) The invention fine-tunes the reordering model through the direct preference optimization mechanism, so that compliance-related key information is presented first and irrelevant content is suppressed, ensuring that the generated answers meet regulatory requirements.
Drawings
FIG. 1 is a flow chart of steps of a RAG method for multimodal financial document compliance analysis of the present invention;
FIG. 2 is a schematic diagram of the operation of the RAG method for compliance analysis of multimodal financial documents of the present invention;
FIG. 3 is a schematic diagram of a query flow in an embodiment of the present invention;
FIG. 4 is a diagram of the working framework of the file preprocessing component in an embodiment of the invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
First embodiment
Applying existing retrieval-augmented generation (RAG) methods to financial document compliance analysis faces several core challenges. First, financial documents often contain multiple forms of data, including unstructured text (e.g., narrative disclosures), semi-structured data (e.g., tables and images), and structured data. Conventional text-based retrieval methods cannot handle these effectively, which fragments information and loses context, thereby affecting the comprehensiveness and accuracy of compliance analysis. Second, existing RAG methods generally rely on dense retrieval or sparse matching techniques. Although these perform well on certain general tasks, they do not capture in depth the implicit regulatory relationships and legal terms specific to the financial field, so that when facing financial documents the system easily overlooks critical compliance information or makes inference errors, reducing its effectiveness and reliability. Furthermore, retrieval ranking in the prior art is usually based on semantic similarity rather than domain-specific compliance priorities, so critical information is not presented promptly and accurately, which may affect the compliance decisions of financial institutions and increase regulatory risk. Finally, because financial regulations are updated frequently, existing systems lack the dynamic adaptability to respond quickly to regulatory changes, so they are not stable enough for long-term use in a regulatory environment.
Specifically, existing retrieval methods mainly match by calculating the semantic similarity between texts, i.e., they find the documents or paragraphs closest in meaning to the query. However, this approach ignores the implicit regulatory relationships and domain standards unique to the financial field. The financial field has many specific rules and requirements (e.g., compliance requirements, legal terms, and financial indicators) that are often not expressed through direct semantic similarity but are instead tied to specific regulatory provisions, regulatory interpretations, or industry practices. These implicit regulatory relationships and domain standards may not be captured by semantic similarity matching alone. Existing retrieval methods may therefore fail to identify and process this critical, domain-specific content, producing incomplete or inaccurate results. In other words, existing methods focus only on surface-level semantic matching and ignore implicit regulations and standards that must be followed in the financial field, which may affect the accuracy and reliability of compliance analysis.
In view of the above problems, the inventors, after careful consideration, propose the FinSage framework to fundamentally address the key difficulties in compliance analysis of financial documents. The FinSage framework of the present application offers a comprehensive and robust solution by combining multimodal preprocessing, domain-aware retrieval strategies, and compliance prioritization mechanisms. For multimodal data processing, FinSage converts heterogeneous data formats such as text, tables, and images into structured data through a file preprocessing component, which solves the fragmentation of information seen in conventional methods and ensures that different types of data are processed efficiently within a unified framework. For retrieval, a multi-path retrieval module combines sparse retrieval, dense retrieval, metadata-aware semantic search, and hypothetical document embedding (HyDE) retrieval to accurately capture the implicit regulatory relationships and legal terms in financial documents, ensuring that key information is not missed. For compliance prioritization, the domain-specific reordering module based on direct preference optimization ensures that compliance-critical content is presented first while irrelevant information is suppressed, improving both document processing efficiency and the accuracy of compliance decisions. In summary, the retrieval-augmented generation (RAG) method provided by the application supports dynamic adaptation and real-time updating and can keep operating efficiently as financial regulatory standards change, thereby meeting the demands of modern financial institutions for compliance, accuracy, and flexibility. The specific implementation is as follows:
As shown in FIG. 1 and FIG. 2, the present invention provides a RAG method for compliance analysis of multimodal financial documents, comprising:
Step S1, preprocessing an input multimodal financial document: generating corresponding text blocks from the multimodal data in the multimodal financial document through a file preprocessing module, and vectorizing and encoding all preprocessed text blocks and their metadata with a BGE-M3 dense encoder to construct a corresponding vector database;
Step S2, responding to a target query request;
Step S3, entering a multi-path retrieval process according to the target query statement in the target query request, wherein the multi-path retrieval process decomposes the target query statement into one or more sub-query statements, retrieves text blocks matching the sub-query statements from the vector database through a multi-path retrieval module, and dynamically bundles, expands, and combines the text blocks according to a similarity threshold to generate bundles for responding to the sub-query statements;
Step S4, based on the bundles generated by the multi-path retrieval module, calculating the semantic alignment between each bundle and its sub-query statement through a domain-specific document reordering module, calculating a corresponding time reward score according to a time reward mechanism to obtain a matching score, and ranking and optimizing the matching scores through a direct preference optimization mechanism to obtain a corresponding preferred text block set, which is used to respond to the target query statement and generate the corresponding answer.
The specific process is shown in FIG. 3 and FIG. 4. The file preprocessing module preprocesses an input financial document or financial file, including text encoding and semantic enhancement. In text encoding, an open-source PDF tool extracts multimodal blocks including text, images, and tables, and a large language model (LLM) converts the multimodal blocks into text representations. Semantic enhancement mainly comprises three parts: (a) removing redundant blocks through similarity comparison, (b) resolving coreferences within each subsection, and (c) generating summaries based on the subsections as metadata. The enhanced blocks are embedded into a vector database for subsequent processing. A target query statement for a database query is then received at the client, for example the user query "How did Lotus Tech perform in 2024, and how does it plan to scale up this year?"
For the target query statement of the user query, the multi-path retrieval module first performs query transformation, including query decomposition, which divides the target query statement into a plurality of independent sub-query statements, together with coreference resolution and context integration. For example, the query is divided into "What is the core marketing strategy for growth at Lotus Technology Inc. in 2025?" and "What is the sales data of Lotus Technology Inc. in 2024?" Based on the divided sub-query statements, the multi-path retrieval module performs matching retrieval through a BM25 sparse retriever, a FAISS dense retriever, a metadata retriever, and a HyDE retriever respectively, dynamically bundles text blocks according to a similarity threshold, and generates bundles for responding to the sub-query statements. The generated bundles are then re-ranked by the document reordering module, and a final answer responding to the target query statement is generated.
Further, in step S1, preprocessing an input multimodal financial document, generating corresponding text blocks from the multimodal data in the multimodal financial document through a file preprocessing module, vectorizing and encoding all the preprocessed text blocks and their metadata using a BGE-M3 dense encoder, and constructing a corresponding vector database, includes:
Step S11, acquiring a financial document for processing a target query request;
Step S12, decomposing the multimodal data of the financial document, in its original layout order, into three kinds of multimodal blocks (text, image, and table) using the open-source PDF tool MinerU;
Step S13, converting the multimodal blocks into text blocks through a large language model, which includes converting each image block, through a visual language model, into a structured text summary used as the text block of the image, and converting each table block into a structurally consistent text description that highlights data trends, used as the text block of the table;
And step S14, processing the text blocks based on a semantic enhancement technology, and outputting a text block set for constructing a structured vector database.
Further, in step S14, the text block is processed based on the semantic enhancement technology, and a text block set for constructing the vector database is output, including:
calculating the cosine similarity between different text blocks using SBERT sentence embeddings, and merging text blocks whose cosine similarity exceeds a set threshold, to complete redundancy removal;
performing coreference resolution using a large language model, in which pronouns in the same section are iteratively resolved, identified from context information, and replaced with explicit entities, and different mentions of the same entity are grouped into one equivalence set, to resolve ambiguous references in the text blocks;
adding structured metadata to each text block using a large language model, the metadata including chapter titles, page locations, data types, and time information;
encoding all the preprocessed text blocks and their corresponding metadata as vectors through the BGE-M3 dense encoder, and constructing a vector database that serves the target query statements of the client.
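The redundancy-removal step above can be sketched as follows. This is a minimal illustration, assuming the sentence embeddings have already been computed (for example by an SBERT model) and are passed in as plain lists of floats; the function names `cosine` and `deduplicate` and the default threshold of 0.95 are illustrative choices, not part of the application's stated interface.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def deduplicate(blocks, embeddings, threshold=0.95):
    """Keep a block only if it is not a near-duplicate of a block already kept.

    Assumption: a block whose similarity to a kept block reaches the
    threshold is dropped, which is the simplest form of "merging".
    """
    kept = []  # indices of the blocks retained so far
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return [blocks[i] for i in kept]
```

The pairwise comparison is quadratic in the number of blocks, which is acceptable for a single document; a larger corpus would likely need an approximate nearest-neighbor index instead.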
It should be specifically noted that financial documents often contain complex and diverse data formats, including unstructured text (e.g., narrative disclosures), semi-structured data (e.g., tables and images), and contextual metadata. These multimodal data tend to be interdependent and closely related, and cannot be handled independently. For example, descriptive text typically covers detailed corporate financial information and market analysis, while tables and images present data trends and financial indicators; only the combination of the two truly reflects the corporate financial situation. However, conventional retrieval systems typically focus on text data and understand and index non-textual information such as images and tables poorly, resulting in fragmented information or lost context at retrieval time.
Because of the heterogeneity of these data types, traditional keyword-based retrieval methods (e.g., BM25-based sparse retrieval) may not fully capture the semantic relationships in documents, especially when the user's query involves complex financial questions. For example, pure keyword matching may ignore numerical trends in images or key data in tables, reducing retrieval accuracy. Modern multimodal methods, by contrast, must process and understand different types of data simultaneously, so that all relevant information is considered comprehensively and extracted efficiently during retrieval.
In order to solve the challenges, the FinSage framework provided by the application uniformly converts information such as texts, tables, images and the like into a structured vector representation by adopting a multi-mode data file preprocessing component, so that different types of data are ensured to be uniformly processed under the same framework. The processing mode avoids the problem of fragmentation of the context information, improves the integration capability of the multi-mode information, and enables the retrieval system to respond to the complex query requirements in the financial field more comprehensively and accurately. The specific implementation process is as follows:
(1) Using the open-source PDF tool MinerU, the document content is parsed in natural reading order into three classes of blocks (chunks): text blocks, image blocks, and table blocks. These sets of blocks are denoted as . Each block represents an independent portion of the document, facilitating subsequent processing and analysis.
(2) Converting the multimodal blocks, including: ① converting the information in an image into a structured text summary using a visual language model. For example, "Revenue increased by 15% in 2023" serves as the description of an image block, as shown in Fig. 1, providing clearer semantic information for subsequent text processing and analysis. ② Converting the data in a table into a structurally consistent text description that highlights the data trend. For example, a table titled "Company operating costs" is converted into the text description "Operating costs decreased by 8% compared with the previous year" for the table block, so that the trends and changes behind the data can be extracted and understood.
(3) Semantic enhancement, including: ① calculating the cosine similarity between different text blocks using SBERT (Sentence-BERT) sentence embeddings. When the similarity of two text blocks reaches or exceeds a preset threshold (e.g., 0.95), they are considered duplicates and are merged into one block, removing redundant content and optimizing information storage and extraction. ② Adding structured metadata to each text block. The metadata includes the chapter title, page location, data type, and time information (e.g., "profit and loss statement, page 5, table, September 1, 2024"), which helps to quickly locate and identify the source of the block and related information during subsequent retrieval. ③ Text normalization, including: (a) expanding abbreviations to their full form, for example expanding "EBITDA" to "earnings before interest, taxes, depreciation, and amortization"; (b) unifying numbers written in different formats, for example converting "$1.2M" to "1.2 million US dollars", to ensure standardized handling of values; (c) replacing legal terms with standardized forms, for example unifying variant wordings of "non-compliance" into a single term, to improve the readability and consistency of legal documents.
(4) Outputting the processed text block set, in which each text block is composed of five components: the enhanced text, the metadata, the text dense embedding, the BM25 sparse vector, and the metadata dense embedding. This ensures that the multidimensional information of the document is represented in the structured vector database.
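The text normalization in step (3) and the five-component record in step (4) can be sketched together. This is only an illustration under stated assumptions: the lookup tables, the `normalize_text` function, the plain-number output format, and the `TextBlock` field names are all hypothetical, chosen to mirror the description rather than taken from an actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical lookup tables; a production system would need far larger ones.
ABBREVIATIONS = {
    "EBITDA": "earnings before interest, taxes, depreciation, and amortization",
}
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def normalize_text(text: str) -> str:
    """Expand known abbreviations and rewrite amounts such as "$1.2M"."""
    for short, full in ABBREVIATIONS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", full, text)

    def spell_out(match: re.Match) -> str:
        value = float(match.group(1)) * MULTIPLIERS[match.group(2)]
        return f"{value:,.0f} US dollars"

    return re.sub(r"\$(\d+(?:\.\d+)?)([KMB])\b", spell_out, text)


@dataclass
class TextBlock:
    """One entry of the vector database, with the five components of step (4)."""
    text: str                        # enhanced (normalized, resolved) text
    metadata: Dict[str, str]         # chapter title, page, data type, time
    text_embedding: List[float]      # dense embedding of the text
    bm25_vector: Dict[str, float]    # sparse term weights for BM25
    metadata_embedding: List[float]  # dense embedding of the metadata
```

Keeping the text embedding and the metadata embedding as separate fields lets the dense retriever and the metadata retriever query the same record through different vectors.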
Further, in step S3, according to the target query statement in the target query request, a multi-path search process is entered, including:
Step S31, when a client initiates a target query request to query the database, an executor uses a large language model with natural language processing techniques to decompose the target query statement into several independent sub-query statements, replaces pronouns in the independent sub-query statements with explicit entities through coreference resolution, and automatically associates context information;
Step S32, the multi-path retrieval module calculates similarity scores between the sub-query statements and the document blocks through several retrievers respectively, wherein the retrievers comprise a BM25 sparse retriever, a FAISS dense retriever, a metadata retriever, and a HyDE retriever;
Step S33, performing weighted fusion of the similarity score of each retriever with that retriever's preset weight coefficient, ranking the text blocks by the fused scores, and selecting the K highest-scoring text blocks as candidate text blocks, wherein K is a preset positive integer;
Step S34, dynamically bundling and expanding the candidate text blocks based on the similarity threshold, and generating a bundle for responding to the sub-query statement.
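The weighted fusion and top-K selection of step S33 can be sketched as below. The sketch assumes each retriever returns a dictionary from block identifier to score, and that the scores have already been brought to a comparable scale; the function name `fuse_scores` and the data layout are hypothetical.

```python
def fuse_scores(scores_by_retriever, weights, k):
    """Weighted sum of per-retriever scores, returning the k best blocks.

    scores_by_retriever: {retriever_name: {block_id: score}}
    weights:             {retriever_name: weight coefficient}
    A block missing from a retriever's results contributes 0 for that path.
    """
    fused = {}
    for name, scores in scores_by_retriever.items():
        weight = weights[name]
        for block_id, score in scores.items():
            fused[block_id] = fused.get(block_id, 0.0) + weight * score
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

For example, with two retrievers weighted 0.5 each, a block scored 0.5 by one and 1.0 by the other outranks a block scored 1.0 by only one of them, which is how fusion rewards agreement between retrieval paths.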
It should be specifically noted that, in general, the conventional RAG system mainly relies on dense search (such as semantic matching based on deep learning) or sparse lexical matching (such as BM25 algorithm), and these methods can provide better effects in general scenarios, but have the following limitations in application in the financial field:
① Conventional RAG systems often do not undergo specialized domain adaptation, and therefore, when processing financial documents, terms, regulatory requirements, or legal relationships of a particular domain may not be effectively captured, resulting in insufficient accuracy in answering complex compliance questions.
② Often, financial documents contain implicit or indirect regulatory requirements, and traditional search methods based on semantic similarity or keyword matching may not accurately capture these implicit relationships, thereby affecting the answers to compliance questions.
In order to solve the above problems, the present application introduces the following four measures:
① Combining sparse retrieval (e.g., BM25) with dense retrieval (e.g., FAISS-based semantic matching), and enhancing the understanding of legal terms, policy regulations, and industry standards in financial documents through domain-specific fine-tuning. For example, the retrieval model is domain-adapted on custom data sets to identify regulatory keywords such as "major defects" and "non-compliance".
② In the retrieval process, the query is weighted by combining metadata (such as chapter titles, form labels, context abstracts and the like) of the financial documents, so that the retrieval can better understand the structural information of the documents, and the explicit and implicit supervision relations in the documents can be effectively captured.
③ Using a large language model to generate hypothetical document paragraphs, which extends the semantic scope of the query. By turning the question into a hypothetical passage, this process helps the retrieval system identify and capture more implicit relationships. For example, for the query "liquidity risk factors", the system generates the hypothetical paragraph "A decline in the liquidity coverage ratio may affect solvency", which guides the retrieval system to mine the relevant regulatory document content.
④ Combining the results of multiple retrieval paths (dense retrieval, sparse retrieval, and metadata retrieval) to improve recall and return more relevant results for complex financial questions. Weighted fusion of the scores from each path improves the retrieval precision for relevant documents, ultimately ensuring that the retrieved document content meets the compliance requirements.
Through the steps, the system can effectively overcome the defects of the traditional RAG system in the financial field, particularly in the aspects of capturing the supervision requirements and the implicit legal relations, and improves the performance of the system in the financial compliance task. The specific implementation process is as follows:
(1) After the user's original target query statement is received, query expansion is first carried out using the HyDE (Hypothetical Document Embeddings) method, which generates a hypothetical document to expand the query semantics and thereby capture a wider range of relevant information. For example, for the query "financial risk factors", the expanded query generated by HyDE may be "The query 'financial risk factors' may involve liquidity ratios or debt terms", which expands the semantic scope of the original query.
(2) Based on the query expansion, the system promotes recall and relevance of the search by:
FAISS dense retriever: calculates the cosine similarity between the query and the text blocks based on text dense embeddings produced by the BGE-M3 model.
BM25 sparse retriever: matches keywords using the BM25 algorithm while weighting metadata fields (e.g., chapter title weight ×2).
Metadata retriever: retrieves chapter summaries related to the query through metadata embeddings, adding context information to the retrieved results. The chapter title and summary of each text block are concatenated into a unified metadata embedding, ensuring that all blocks in the same semantic segment share the same metadata representation. When a single metadata instance is retrieved, all blocks in the same chapter are automatically associated, which strengthens cross-block context association and reduces retrieval ambiguity in multi-document scenarios.
The results are then scored comprehensively through weighted score fusion (with weights optimized by grid search), and the Top-50 candidate text blocks are selected. Finally, the retrieved candidate text blocks are bundled and expanded: each candidate block is concatenated with its adjacent blocks to form a contextually coherent bundle, which solves the fragmentation problem.
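The HyDE query expansion in step (1) of the implementation process above can be sketched as follows. The language model and the embedding model are represented by two caller-supplied functions, `generate` and `embed`; both are hypothetical placeholders, as are the prompt wording and the choice to concatenate the query with the generated passage before embedding.

```python
from typing import Callable, List, Tuple


def hyde_expand(
    query: str,
    generate: Callable[[str], str],
    embed: Callable[[str], List[float]],
) -> Tuple[str, List[float]]:
    """Generate a hypothetical passage for the query and embed the result.

    generate: stands in for a large language model call (prompt -> text).
    embed:    stands in for a dense encoder (text -> vector).
    """
    prompt = (
        "Write a short passage, in the style of a financial filing, "
        f"that would answer the query: {query}"
    )
    hypothetical_doc = generate(prompt)
    # Assumption: keep the original query next to the generated passage so
    # that its exact wording is still part of what gets embedded.
    expanded = f"{query}\n{hypothetical_doc}"
    return expanded, embed(expanded)
```

The returned vector can then be passed to the dense retriever in place of the embedding of the bare query.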
In addition, the multi-path retrieval method further includes a query transformation process, which mainly optimizes the input query using a large language model, and includes:
Query decomposition: splitting a complex query into separate sub-queries. For example, "2023 revenue growth and cost control measures" is decomposed into "2023 revenue growth rate" and "2023 cost control strategy".
Coreference resolution: replacing pronouns with specific entities. For example, "its financial risk" is converted to "XX company's financial risk".
Context integration: if the query involves dialogue history, automatically associating the context information.
Further, in step S34, the dynamic binding extension combining the candidate text blocks based on the similarity threshold, generating the bundle package for responding to the sub-query statement includes:
performing preliminary retrieval with the candidate text blocks as independent units, and calculating the corresponding dense embeddings through a pre-trained text embedding model, wherein a dense embedding is the vector representation of a candidate text block;
calculating, based on the dense embeddings, the cosine similarity of the text blocks adjacent to each candidate text block, which is used to judge the content relevance of those adjacent text blocks;
judging whether the cosine similarity of an adjacent text block reaches the similarity threshold;
when the cosine similarity of an adjacent text block reaches the similarity threshold, adding the adjacent text block to the candidate block set and dynamically combining it with the candidate text block into a bundle; otherwise, not combining them and keeping them independent.
It should be specifically noted that, to solve the problem of cross-block distribution of key information, a dynamic binding strategy is designed, including:
1. Initial retrieval: at the start of the multi-path retrieval phase, the document is split into several independent text blocks, each of which takes part in retrieval as a separate retrieval object. Each text block may be a paragraph, table, image, or other unit of information from the document. The system first evaluates the relevance of the text blocks to the query, and each retrieved text block is assigned a matching score for subsequent processing.
2. Adjacent expansion: the candidate set is dynamically expanded according to the similarity between the text blocks adjacent to each candidate text block and the user query, which solves the problem of key information being scattered across different blocks. For each retrieved text block, the system checks the similarity between its adjacent text blocks (i.e., the immediately preceding and following blocks) and the user query, calculated mainly as the cosine similarity of dense embeddings. When the cosine similarity between an adjacent text block and the user query exceeds a similarity threshold (for example, 0.85), the adjacent text block is considered to have sufficient semantic relevance. The similarity threshold is typically tuned to the document type and domain requirements, so that only text blocks highly relevant to the user query are merged. Through this expansion, semantically related adjacent text blocks are gathered together, avoiding the loss of fragmented information. As a bundling example, while retrieving for the user query "company net profit in 2023", the candidate text block "2023 revenue" is found, and its adjacent "cost analysis" text block also has a high similarity to the user query (e.g., 0.86, exceeding the similarity threshold of 0.85). The two text blocks are combined into a bundle named "2023 financial performance", forming a semantically coherent and complete context unit. Text blocks in a bundle are not necessarily simply concatenated; they are combined according to their semantics and context. This bundling helps provide a more consistent and comprehensive answer to a target query statement, especially when dealing with long text or fragmented information.
Through the two steps, the problem of key information dispersion and fragmentation is effectively solved. Firstly, obtaining text blocks through initial retrieval, then dynamically judging and merging semantically related blocks through adjacent expansion, and finally, generating coherent contexts through binding adjacent blocks, so as to ensure the integrity of information and the accuracy of answers.
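The two steps of the dynamic bundling strategy can be sketched as follows. The sketch assumes blocks are stored in document order, so that the neighbors of block i are blocks i-1 and i+1, and that dense embeddings for the blocks and the query are already available; `build_bundles` and its parameters are hypothetical names, and the comparison uses "reaches the threshold" (greater than or equal).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_bundles(candidate_ids, block_embeddings, query_embedding, threshold=0.85):
    """Expand each retrieved block with neighbors that are relevant to the query.

    candidate_ids:    indices of blocks returned by the initial retrieval
    block_embeddings: dense embeddings of all blocks, in document order
    Returns one sorted list of block indices (a bundle) per candidate.
    """
    bundles = []
    total = len(block_embeddings)
    for i in candidate_ids:
        members = [i]
        for j in (i - 1, i + 1):  # preceding and following block
            if 0 <= j < total and cosine(block_embeddings[j], query_embedding) >= threshold:
                members.append(j)
        bundles.append(sorted(members))
    return bundles
```

A candidate with no sufficiently similar neighbor simply yields a single-block bundle, which matches the "keep independent" branch of the description.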
Further, the document reranking module calculates the semantic alignment between each bundle and the sub-query statement through a cross encoder, and combines it with the corresponding time reward score from the time reward mechanism to obtain the matching score. The calculation formula is as follows:
;
Wherein the terms of the formula are, in order: the matching score between the sub-query statement and a candidate text block in the bundle; the semantic alignment between the bundle and the sub-query statement, calculated by the cross encoder; the Sigmoid function, which maps the calculation result to the interval [0, 1]; the transpose of the weight vector W; the time reward score; and the bias term.
Further, ranking optimization of the matching scores via a direct preference optimization mechanism, generating a corresponding set of preferred text chunks to respond to the target query statement and generating a corresponding answer includes:
Preliminarily calculating a matching score between the text block and the query sentence in a BAAI/bge-reranker-v2-Gemma model of the document reordering module;
constructing positive and negative sample pairs, and using the direct preference optimization mechanism to adjust the weights of the BAAI/bge-reranker-v2-Gemma model by minimizing a cross-entropy loss function, thereby optimizing the ordering of the text blocks and forming the corresponding preferred text block set;
And generating a final answer responding to the target query statement according to the preference text block set.
Further, the cross entropy loss function is calculated by the formula,
;
Wherein the terms of the formula are, in order: the matching score of the positive sample, the matching score of the negative sample, and the expectation over the positive and negative sample pairs.
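As an illustration of how a loss over positive and negative matching scores can be computed, the sketch below uses a common pairwise form: the mean over sample pairs of the negative log-sigmoid of the score difference. This specific form is an assumption chosen to be consistent with the three terms defined above; it should not be read as the exact formula of the application, and the function name is hypothetical.

```python
import math


def pairwise_preference_loss(pairs):
    """Mean of -log(sigmoid(s_pos - s_neg)) over (s_pos, s_neg) pairs.

    Assumed form, not taken from the application: the loss is small when
    the positive sample scores well above the negative one, and large when
    the order is reversed.
    """
    total = 0.0
    for s_pos, s_neg in pairs:
        margin = s_pos - s_neg
        # log(1 + exp(-margin)) is algebraically equal to -log(sigmoid(margin)).
        total += math.log1p(math.exp(-margin))
    return total / len(pairs)
```

Minimizing this quantity pushes each positive score above its paired negative score, which is the ranking behavior the preference optimization step is meant to produce.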
Specifically, the document reordering module uses BAAI/bge-reranker-v2-Gemma model as a base model, which is a multi-language and high-performance reordering model. The model can process text data from different languages and fields, and is suitable for cross-language and cross-field document ordering tasks.
The cross encoder (CrossEncoder) is used to score the query q against a text block. Its purpose is to evaluate the semantic alignment between the query and the candidate text blocks, ensuring that the returned text blocks are highly relevant to the query.
The time reward mechanism introduced by the application adjusts the score of a text block based on its release time. The time reward function adjusts the score dynamically according to the release time, so that recently released text blocks receive a higher reward (e.g., a reward coefficient of ×1.2 for text blocks released within 1 year). This mechanism ensures that the system favors more recent, more relevant content in the ranking.
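The example coefficient above can be turned into a small sketch. It assumes the reward is applied as a multiplier on a block's score, that "within 1 year" means at most 365 days old, and that older blocks keep a neutral coefficient of 1.0; these details and the function names are assumptions made for illustration.

```python
from datetime import date


def time_reward(published: date, today: date,
                recent_days: int = 365, recent_coefficient: float = 1.2) -> float:
    """Return the reward coefficient for a block released on `published`."""
    age_in_days = (today - published).days
    if 0 <= age_in_days <= recent_days:
        return recent_coefficient
    return 1.0  # assumed neutral coefficient for older (or future-dated) blocks


def apply_time_reward(score: float, published: date, today: date) -> float:
    """Scale a relevance score by the time reward coefficient."""
    return score * time_reward(published, today)
```

A step function is the simplest choice; a smoothly decaying coefficient would avoid the sharp change at the one-year boundary, at the cost of an extra parameter to tune.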
The direct preference optimization (DPO) in the application generates positive and negative sample pairs from the multi-path retrieval results. Positive samples are text blocks containing compliance keywords such as "major litigation"; negative samples are text blocks that are semantically related but unrelated to compliance, such as "corporate social responsibility". Through manual annotation and screening, the system learns to correctly distinguish compliance-related content from content unrelated to compliance.
The iterative optimization flow of the DPO comprises the following steps:
Retrieval and annotation: candidate documents are obtained through a vector search engine such as FAISS. These candidate documents are then manually annotated to generate a preference dataset comprising positive and negative sample pairs. This provides high-quality training data for the model, enabling it to accurately distinguish compliant from non-compliant content in subsequent learning.
Model evaluation: the performance of the model on the ranking task is measured with evaluation metrics such as NDCG (normalized discounted cumulative gain) and MRR (mean reciprocal rank). NDCG evaluates how well the model orders documents by relevance, while MRR evaluates how early the first correct document appears in the ranking.
Dynamic adjustment: when the model performs poorly on a new test set, the data is re-annotated and the model parameters are fine-tuned. In this way, the model adapts to new data and scenarios and continually improves its performance.
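The two evaluation metrics named in the model evaluation step can be computed as in the sketch below, which uses their standard definitions. The input format (a list of relevance labels in ranked order, with 0 meaning not relevant) and the function names are illustrative assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance labels given in ranked order."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def mrr(ranked_relevance_lists):
    """Mean over queries of 1 / rank of the first relevant result."""
    total = 0.0
    for relevances in ranked_relevance_lists:
        for rank, rel in enumerate(relevances, start=1):
            if rel > 0:
                total += 1.0 / rank
                break  # only the first relevant result counts
    return total / len(ranked_relevance_lists)
```

NDCG rewards placing every relevant block high in the list, while MRR only looks at the first relevant block per query, so the two can disagree when a ranking has one early hit followed by poorly ordered results.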
Second embodiment
Based on the same inventive concept, the invention also provides a RAG system for multi-modal financial document compliance analysis, which executes the RAG method for multi-modal financial document compliance analysis, comprising:
the file preprocessing module is used for preprocessing an input multi-mode financial document, generating corresponding text blocks from multi-mode data in the multi-mode financial document, and vectorizing and encoding all the preprocessed text blocks and metadata thereof by adopting a BGE-M3 dense encoder to construct a corresponding vector database;
The response module is used for responding to the target query request;
The multi-path retrieval module is used for decomposing the target query statement into one or more sub-query statements, retrieving text blocks matching the sub-query statements from the vector database, and dynamically bundling and expanding the text blocks according to a similarity threshold to generate a bundle for responding to the sub-query statements;
The document reranking module is used for calculating, based on the bundles generated by the multi-path retrieval module, the semantic alignment between each bundle and the sub-query statement, calculating the corresponding time reward score according to the time reward mechanism to obtain matching scores, and ranking and optimizing the matching scores through the direct preference optimization mechanism to obtain the corresponding preferred text block set, which is used to respond to the target query statement and generate the corresponding answer.
Further, the calculation formula of the matching score is:
;
Wherein the terms of the formula are, in order: the matching score between the sub-query statement and a candidate text block in the bundle; the semantic alignment between the bundle and the sub-query statement, calculated by the cross encoder; the Sigmoid function, which maps the calculation result to the interval [0, 1]; the transpose of the weight vector W; the time reward score; and the bias term.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1.一种用于多模态金融文档合规分析的RAG方法,其特征在于,包括:1. A RAG method for compliance analysis of multimodal financial documents, comprising: 步骤S1:对输入的多模态金融文档进行预处理,通过文件预处理模块将所述多模态金融文档中的多模态数据生成对应的文本块,采用BGE-M3密集编码器对所有预处理的所述文本块及其元数据进行向量化编码,构建对应的向量数据库;Step S1: Preprocessing the input multimodal financial document, generating corresponding text blocks from the multimodal data in the multimodal financial document using a file preprocessing module, and vectorizing all the preprocessed text blocks and their metadata using a BGE-M3 dense encoder to construct a corresponding vector database; 步骤S2:响应于目标查询请求;Step S2: responding to the target query request; 步骤S3:根据目标查询请求中的目标查询语句,进入多路径检索过程;所述多路径检索阶段过程为将所述目标查询语句分解为至少一个或多个子查询语句,并通过多路径检索模块从所述向量数据库中检索与所述子查询语句匹配的所述文本块,并根据相似度阈值对所述文本块进行动态捆绑扩展组合,生成用于响应所述子查询语句的捆绑包的过程;Step S3: Entering a multi-path retrieval process based on the target query statement in the target query request; the multi-path retrieval process is a process of decomposing the target query statement into at least one or more sub-query statements, retrieving the text blocks matching the sub-query statements from the vector database through a multi-path retrieval module, and dynamically bundling and expanding the text blocks based on a similarity threshold to generate a bundle package for responding to the sub-query statements; 步骤S4:基于多路径检索模块生成的所述捆绑包,通过领域专用文档重排序模块计算每个所述捆绑包与所述子查询语句的语义对齐度,结合时间奖励机制计算对应的时间奖励分获取匹配得分,通过直接偏好优化机制对匹配得分进行排名优化,用于获取对应的偏好文本块集来响应所述目标查询语句并生成对应答案。Step S4: Based on the bundles generated by the multi-path retrieval module, the semantic alignment between each bundle and the sub-query statement is calculated through the domain-specific document reranking module, and the corresponding time reward score is calculated in combination with the time reward mechanism to obtain the matching score. 
The matching score is ranked and optimized through the direct preference optimization mechanism to obtain the corresponding preference text block set to respond to the target query statement and generate the corresponding answer. 2.根据权利要求1所述的用于多模态金融文档合规分析的RAG方法,其特征在于,在步骤S1中,对输入的多模态金融文档进行预处理,通过文件预处理模块将所述多模态金融文档中的多模态数据生成对应的文本块,采用BGE-M3密集编码器对所有预处理的所述文本块及其元数据进行向量化编码,构建对应的向量数据库,包括:2. The RAG method for compliance analysis of multimodal financial documents according to claim 1, characterized in that, in step S1, the input multimodal financial document is preprocessed, and corresponding text blocks are generated from the multimodal data in the multimodal financial document using a file preprocessing module. All preprocessed text blocks and their metadata are vectorized using a BGE-M3 dense encoder to construct a corresponding vector database, including: 步骤S11:获取用于处理所述目标查询请求的所述金融文档;Step S11: obtaining the financial document for processing the target query request; 步骤S12:使用开源PDF工具MinerU按照所述金融文档的所述多模态数据的原始布局顺序分解包括文本、图像和表格三类的多模态块;Step S12: using the open source PDF tool MinerU to decompose the multimodal blocks including text, images, and tables according to the original layout order of the multimodal data of the financial document; 步骤S13:通过大语言模型将所述多模态块转换为所述文本块,通过视觉语言模型将图像块转换为结构化的文本摘要作为所述图像的文本块,并将表格块转换为结构一致的文本描述作为所述表格的文本块;Step S13: converting the multimodal block into the text block using a large language model, converting the image block into a structured text summary as the text block of the image using a visual language model, and converting the table block into a text description with consistent structure as the text block of the table; 步骤S14:基于语义增强技术对所述文本块进行处理,输出用于构建所述向量数据库的文本块集合。Step S14: Processing the text blocks based on semantic enhancement technology, and outputting a text block set for constructing the vector database. 3.根据权利要求2所述的用于多模态金融文档合规分析的RAG方法,其特征在于,在步骤S14中,基于语义增强技术对所述文本块进行处理,输出用于构建所述向量数据库的文本块集合,包括:3. 
The RAG method for compliance analysis of multimodal financial documents according to claim 2, characterized in that, in step S14, the text blocks are processed based on semantic enhancement technology to output a set of text blocks for constructing the vector database, including: 通过SBERT句嵌入计算不同所述文本块之间的余弦相似度,并且当所述余弦相似度超过设定的阈值时,将所述文本块合并,用于完成冗余去重;Calculate the cosine similarity between different text blocks through SBERT sentence embedding, and when the cosine similarity exceeds a set threshold, merge the text blocks to complete redundancy removal; 利用所述大语言模型进行指代消解,通过迭代解析同一章节中的代词,通过上下文信息识别并替换为明确实体,将代表同一实体的不同指称划分到一个等价集合,用于解决所述文本块中指代不明确的问题;Using the large language model to perform coreference resolution, pronouns in the same section are iteratively parsed, and replaced with clear entities based on contextual information. Different references representing the same entity are grouped into an equivalent set to resolve the problem of ambiguous references in the text block. 利用所述大语言模型为每个所述文本块添加结构化的元数据,所述元数据包括章节标题、页面位置和数据类型;Adding structured metadata to each of the text blocks using the large language model, the metadata including chapter title, page location, and data type; 将所有经过预处理的所述文本块以及所述文本块对应的所述元数据,通过所述BGE-M3密集编码器进行文本向量化编码并构建向量数据库,用于客户端的所述目标查询语句查询。All the pre-processed text blocks and the metadata corresponding to the text blocks are subjected to text vectorization encoding through the BGE-M3 dense encoder and a vector database is constructed for querying the target query statement of the client. 4.根据权利要求3所述的用于多模态金融文档合规分析的RAG方法,其特征在于,在步骤S3中,根据目标查询请求中的目标查询语句,进入多路径检索过程,包括:4. 
The RAG method for compliance analysis of multimodal financial documents according to claim 3, characterized in that in step S3, according to the target query statement in the target query request, a multi-path retrieval process is entered, comprising: 步骤S31:当客户端发起目标查询请求进行数据库查询时,执行器通过所述大语言模型使用自然语言处理技术将所述目标查询语句拆解为多个独立的所述子查询语句,并通过共指消解将所述独立子查询语句中的所述代词替换为所述明确实体,并自动关联所述上下文信息;Step S31: When the client initiates a target query request to query the database, the executor uses the large language model and natural language processing technology to decompose the target query statement into multiple independent sub-query statements, replaces the pronouns in the independent sub-query statements with the explicit entities through coreference resolution, and automatically associates the context information; 步骤S32:所述多路径检索模块通过多个检索器分别计算所述子查询语句与所述文档块之间的相似度得分,其中,所述检索器包括BM25稀疏检索器、FAISS密集检索器、元数据检索器和HyDE检索器;Step S32: the multi-path retrieval module calculates the similarity scores between the sub-query statement and the document block respectively through multiple search engines, wherein the search engines include a BM25 sparse search engine, a FAISS dense search engine, a metadata search engine, and a HyDE search engine; 步骤S33:基于各个所述检索器的所述相似度得分和所述检索器的预设权重系数进行加权融合,并根据所述加权融合后的得分对所述文本块进行排序,选择得分从高到低的前K个所述文本块作为候选文本块,其中,所述K为预设的大于0的自然数值;Step S33: performing weighted fusion based on the similarity scores of each of the retrievers and the preset weight coefficients of the retrievers, sorting the text blocks according to the weighted fusion scores, and selecting the top K text blocks with the highest to lowest scores as candidate text blocks, where K is a preset natural number greater than 0; 步骤S34:基于所述相似度阈值将所述候选文本块进行所述动态捆绑扩展组合,生成用于响应所述子查询语句的所述捆绑包。Step S34: dynamically bundling and expanding the candidate text blocks based on the similarity threshold to generate the bundle package for responding to the sub-query statement. 
5. The RAG method for compliance analysis of multimodal financial documents according to claim 4, characterized in that, in step S34, performing the dynamic bundling and expansion of the candidate text blocks based on the similarity threshold to generate the bundle for responding to the sub-query statement comprises:
performing preliminary retrieval with the candidate text blocks as independent units, and computing the corresponding dense embeddings using a pre-trained text embedding model, wherein a dense embedding is the vector representation of a candidate text block;
computing the cosine similarity of adjacent candidate text blocks from the dense embeddings, in order to judge the content relevance of the adjacent candidate text blocks;
determining whether the cosine similarity of the adjacent candidate text blocks reaches the similarity threshold;
when the cosine similarity of the adjacent candidate text blocks reaches the similarity threshold, dynamically merging the adjacent candidate text blocks into a bundle; otherwise, leaving them unmerged and independent.
6. The RAG method for compliance analysis of multimodal financial documents according to claim 5, characterized in that, in step S4, based on the bundles generated by the multi-path retrieval module, the domain-specific document re-ranking module calculates the semantic alignment between each bundle and the sub-query statement and combines it with the time reward score calculated by the time reward mechanism to obtain the matching score, the calculation formula being: [formula missing from the extracted text];
where the terms of the formula are, in the order defined: the matching score between the sub-query statement and the candidate text block in the bundle; the semantic alignment between the bundle and the sub-query statement, calculated by a cross-encoder; the Sigmoid function, which maps the result to the interval [0, 1]; the transpose of the weight vector W; the time reward score; and the bias term.
7. The RAG method for compliance analysis of multimodal financial documents according to claim 6, characterized in that, in step S4, optimizing the ranking of the matching scores through a direct preference optimization mechanism, so as to obtain the corresponding preferred text block set for responding to the target query and generating the corresponding answer, comprises:
preliminarily calculating the matching score between the text blocks and the query statement with the BAAI/bge-reranker-v2-Gemma model of the document re-ranking module;
constructing positive and negative sample pairs and, using the direct preference optimization mechanism, adjusting the weights of the BAAI/bge-reranker-v2-Gemma model by minimizing a cross-entropy loss function, so as to optimize the ordering of the text blocks and form the corresponding preferred text block set;
generating, from the preferred text block set, the final answer in response to the target query statement.
8. The RAG method for compliance analysis of multimodal financial documents according to claim 7, characterized in that the calculation formula of the cross-entropy loss function is: [formula missing from the extracted text];
where the terms of the formula are, in the order defined: the matching score of the positive sample; the matching score of the negative sample; and the expected value over the positive and negative samples.
9. A RAG system for compliance analysis of multimodal financial documents, executing the RAG method for compliance analysis of multimodal financial documents according to any one of claims 1 to 8, characterized by comprising:
a file preprocessing module, configured to preprocess an input multimodal financial document, generate corresponding text blocks from the multimodal data in the multimodal financial document, vectorize all of the preprocessed text blocks and their metadata using a BGE-M3 dense encoder, and construct a corresponding vector database;
a response module, configured to respond to a target query request;
a multi-path retrieval module, configured to enter a multi-path retrieval process according to the target query statement in the target query request, the multi-path retrieval process being a process of decomposing the target query statement into one or more sub-query statements, retrieving from the vector database the text blocks that match the sub-query statements, and performing dynamic bundling and expansion of the text blocks according to a similarity threshold to generate a bundle for responding to the sub-query statements;
a document re-ranking module, configured to, based on the bundles generated by the multi-path retrieval module, calculate the semantic alignment between each bundle and the sub-query statement through a domain-specific document re-ranking module, combine it with the time reward score calculated by a time reward mechanism to obtain a matching score, and optimize the ranking of the matching scores through a direct preference optimization mechanism, so as to obtain the corresponding preferred text block set for responding to the target query statement and generating the corresponding answer.
10. The RAG system for compliance analysis of multimodal financial documents according to claim 9, characterized in that the calculation formula of the matching score is: [formula missing from the extracted text];
where the terms of the formula are, in the order defined: the matching score between the sub-query statement and the candidate text block in the bundle; the semantic alignment between the bundle and the sub-query statement, calculated by a cross-encoder; the Sigmoid function, which maps the result to the interval [0, 1]; the transpose of the weight vector W; the time reward score; and the bias term.
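The matching score and the pairwise training loss described in the claims can be illustrated with a short Python sketch. The formulas themselves are not available in this text, so both functional forms below are assumptions built only from the listed terms: the score is taken as sigmoid(alignment + W·t + b), with the time reward treated as a feature vector t, and the loss is taken as the mean of -log(sigmoid(s_pos - s_neg)), a plain pairwise logistic loss rather than the full direct preference optimization objective. The names `matching_score` and `pairwise_loss` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any real number into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matching_score(alignment: float,
                   time_reward: Sequence[float],
                   w: Sequence[float],
                   b: float) -> float:
    """ASSUMED form: sigmoid(alignment + w . time_reward + b).

    alignment   -- cross-encoder score for (sub-query, bundle)
    time_reward -- time reward features, e.g. a recency value
    w, b        -- weight vector and bias term
    """
    if len(w) != len(time_reward):
        raise ValueError("w and time_reward must have the same length")
    bonus = sum(wi * ti for wi, ti in zip(w, time_reward))
    return sigmoid(alignment + bonus + b)


def pairwise_loss(pairs: Sequence[Tuple[float, float]]) -> float:
    """ASSUMED form: mean over pairs of -log(sigmoid(s_pos - s_neg)).

    Each pair holds (positive sample score, negative sample score).
    """
    if not pairs:
        raise ValueError("at least one (positive, negative) pair is required")
    total = 0.0
    for s_pos, s_neg in pairs:
        d = s_pos - s_neg
        # -log(sigmoid(d)) == log(1 + exp(-d)), computed without overflow.
        total += max(-d, 0.0) + math.log1p(math.exp(-abs(d)))
    return total / len(pairs)


if __name__ == "__main__":
    recent = matching_score(1.2, [1.0], [0.5], 0.0)
    stale = matching_score(1.2, [0.0], [0.5], 0.0)
    print(recent, stale, pairwise_loss([(recent, stale)]))
```

Under these assumptions, a bundle with a larger time reward scores higher at equal semantic alignment, and the loss shrinks as positive samples are scored further above negative ones.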
CN202510490735.5A 2025-04-18 2025-04-18 A RAG method and system for compliance analysis of multimodal financial documents Pending CN120407878A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510490735.5A CN120407878A (en) 2025-04-18 2025-04-18 A RAG method and system for compliance analysis of multimodal financial documents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510490735.5A CN120407878A (en) 2025-04-18 2025-04-18 A RAG method and system for compliance analysis of multimodal financial documents

Publications (1)

Publication Number Publication Date
CN120407878A true CN120407878A (en) 2025-08-01

Family

ID=96504029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510490735.5A Pending CN120407878A (en) 2025-04-18 2025-04-18 A RAG method and system for compliance analysis of multimodal financial documents

Country Status (1)

Country Link
CN (1) CN120407878A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120610686A (en) * 2025-08-11 2025-09-09 长江证券股份有限公司 Method, device, equipment and medium for realizing intelligent quality valve in demand submission process
CN120873029A (en) * 2025-09-23 2025-10-31 国网浙江省电力有限公司信息通信分公司 Sliding window-based electric power multi-mode corpus construction query method and system

Similar Documents

Publication Publication Date Title
CN119988588A (en) A large model-based multimodal document retrieval enhancement generation method
US9280535B2 (en) Natural language querying with cascaded conditional random fields
Gupta et al. A survey of text question answering techniques
US8639708B2 (en) Fact-based indexing for natural language search
JP7089513B2 (en) Devices and methods for semantic search
US9449081B2 (en) Identification of semantic relationships within reported speech
US10339453B2 (en) Automatically generating test/training questions and answers through pattern based analysis and natural language processing techniques on the given corpus for quick domain adaptation
US9286290B2 (en) Producing insight information from tables using natural language processing
US9715493B2 (en) Method and system for monitoring social media and analyzing text to automate classification of user posts using a facet based relevance assessment model
US8595245B2 (en) Reference resolution for text enrichment and normalization in mining mixed data
RU2488877C2 (en) Identification of semantic relations in indirect speech
US20150227505A1 (en) Word meaning relationship extraction device
CN102253930B (en) A kind of method of text translation and device
US20040117352A1 (en) System for answering natural language questions
US20110078192A1 (en) Inferring lexical answer types of questions from context
CN120407878A (en) A RAG method and system for compliance analysis of multimodal financial documents
CN113076411A (en) Medical query expansion method based on knowledge graph
CN111814485A (en) Semantic analysis method and device based on massive standard document data
CN119415657A (en) A search method and a search device
US20240046039A1 (en) Method for News Mapping and Apparatus for Performing the Method
US20240070175A1 (en) Method for Determining Company Related to News Based on Scoring and Apparatus for Performing the Method
CN120874999B (en) Knowledge base enhancement generation method and system based on mixed retrieval and fact verification
CN111339272A (en) Code defect report retrieval method and device
US20240070387A1 (en) Method for Determining News Ticker Related to News Based on Sentence Ticker and Apparatus for Performing the Method
US20240070396A1 (en) Method for Determining Candidate Company Related to News and Apparatus for Performing the Method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination