Background
With the wide spread of internet multimedia information and the rapid development of artificial intelligence technology, machine reading comprehension and question-answering systems have become research hotspots. The core of such tasks is to understand the question posed by the user, retrieve the information relevant to it, and generate an accurate answer with the help of a knowledge base provided by the user or built into the system. This research direction not only markedly reduces the time users spend searching and reading material, but also improves the flexibility of question-answer interaction, and has important application value in scenarios such as human-machine conversation, intelligent teaching assistance, and medical consultation.
In recent years, large language models (hereinafter, large models) represented by ChatGPT, GPT-4, and the LLaMA series have exhibited excellent generalization and diverse instruction-following capabilities by virtue of their enormous parameter counts and pre-training corpora. To further improve the accuracy of a large model's answers (i.e., reduce its hallucination problem), a knowledge base is searched according to the user's question to provide context for the model; this retrieval-augmented approach has become the mainstream method for current machine reading comprehension and question-answering tasks. However, the method has two limitations. First, its capacity to handle complex problems is insufficient: a user's question often contains several sub-questions, and further inference may be needed to understand the user's intent. In such scenarios a large model may fail to accurately understand and analyze the question, and a retrieval tool cannot directly match the knowledge-base content required to solve the complex problem. Second, in many application scenarios (e.g., education and medicine), the user's question may involve context-bound information whose critical parts must be resolved with the help of images, tables, or other non-textual data; yet most current approaches focus on text-modality retrieval and have difficulty efficiently integrating and processing multimodal information.
In summary, despite the great progress of current large-model-driven machine reading comprehension and question-answering systems, challenges remain in resolving complex questions and in retrieving and integrating multimodal knowledge. Therefore, to answer the questions of what knowledge to retrieve and how to retrieve it, developing a method that takes a large model as its core and is equipped to process complex problems and retrieve multimodal knowledge is key to improving the accuracy and comprehensiveness of a question-answering system's answers.
As the general capability and instruction-following capability of large models continue to strengthen, related question-answering systems have turned to knowledge-retrieval research that takes the large model as the core: different retrieval algorithms retrieve the knowledge corresponding to a question, and the summarization and reasoning capabilities of the large model are then used to answer it.
In short, because the questions people pose are flexible and varied, the background knowledge required to answer them differs from case to case, so traditional machine reading comprehension and question-answering systems struggle to meet users' needs. As the complexity and flexibility of questions grow, it becomes difficult for understanding-class models to answer users' questions correctly, which is why traditional tasks resort to answering in a multiple-choice manner.
A large model, as a generative model, accumulates a large amount of basic knowledge in the pre-training stage and can, to a certain extent, follow the various instructions a user gives. Most importantly, it can generate corresponding answers for different questioning styles and requirements, and is therefore highly practical.
Chinese patent CN202410213818 discloses a large language model question-answering method and device based on knowledge retrieval enhancement, which strengthens the question-answering capability of a large model by retrieving knowledge possibly related to the question from a knowledge base and combining it with the historical questions and answers of a multi-turn conversation.
Chinese patent CN202410343664 discloses a large language model reasoning method and system based on multi-level knowledge retrieval enhancement, which optimizes the accuracy and relevance of the answers generated by a large model through layer-by-layer retrieval of an operations knowledge base and an industry knowledge base. The method first vectorizes the question sentence and compares it with the question-answer pairs of the operations knowledge base; if the match succeeds, the corresponding answer is output, and if it fails, the method turns to the industry knowledge base, acquires highly relevant text segments through segmented vectorization matching, and generates an answer with the large model. The core techniques include vectorization with the BGE algorithm, cosine-similarity calculation, multi-way recall, and precise ranking, aiming to alleviate the 'hallucination' of large models and achieve efficient, accurate industry question answering.
The current related art focuses on either how to better retrieve from the knowledge base or how to rank the retrieved knowledge, but ignores the question of what knowledge should be retrieved. That is, the prior art assumes that a given question (also known as a query) is itself a text that can be used directly to retrieve from the knowledge base: that it is not complex, and that its intent and semantics can be understood directly by the retrieval tool. In practical applications, however, the question a user poses may be complex; the sub-questions and reasoning paths that may lie behind it must be carefully analyzed, understood, and decomposed, so that a retrieval tool can fetch the needed content directly from the knowledge base, or so that the retrieval methods optimized by the prior art can retrieve the corresponding knowledge.
Furthermore, the knowledge bases used in the prior art are essentially text-based. In many cases, however, the knowledge base is given in a multimodal form (in particular, as documents such as PDF or Word files) that contains a great deal of image content, and some of that content is not present in the text at all but must be understood with reference to the accompanying images. Thus, the prior art also lacks a simple and efficient way to retrieve from a multimodal knowledge base.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, against the defects of the prior art, a complex problem decomposition and multimodal knowledge retrieval method combining a large model, for generating accurate and comprehensive answers.
In order to solve the technical problems, the invention adopts the following technical scheme:
The invention provides a complex problem decomposition and multimodal knowledge retrieval method combining a large model, which specifically comprises the following steps:
Step 1, selecting a large model to be used as a basic model;
Step 2, constructing a named entity recognition dataset for training the selected basic model according to the requirements of the specific business scenario, to obtain a trained basic model f_LLM;
Constructing a named entity recognition dataset D_NER = {(S_p; {e_p1, e_p2, …, e_pd}) | p = 1, 2, …, D} for training the basic model, wherein S_p is the p-th text in the dataset of sample size D, and {e_p1, e_p2, …, e_pd} are the d entities contained in the p-th text of the named entity recognition dataset D_NER;
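The structure of D_NER can be sketched as follows (a minimal Python illustration; the record format and the example texts are assumptions, not part of the claimed method):

```python
# Illustrative sketch of the NER dataset D_NER: each record pairs a text S_p
# with the entities {e_p1, ..., e_pd} it contains. The concrete texts and
# entities below are placeholders.
ner_dataset = [
    {"text": "Aspirin inhibits platelet aggregation.",      # S_1
     "entities": ["Aspirin", "platelet aggregation"]},      # {e_11, e_12}
    {"text": "Beijing is the capital of China.",            # S_2
     "entities": ["Beijing", "China"]},                     # {e_21, e_22}
]

def to_training_pairs(dataset):
    """Flatten D_NER into (text, entity) supervision pairs for fine-tuning."""
    return [(rec["text"], e) for rec in dataset for e in rec["entities"]]
```

In practice each record would carry entity spans and labels as well; the flat (text, entity) view above is only the minimal shape the method requires.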
Step 3, establishing a multimodal knowledge base, preprocessing it, and converting it into a plain-text knowledge base; this specifically comprises extracting the semantic information contained in the image-modality knowledge data of the multimodal knowledge base, converting the image-modality knowledge data into text-modality knowledge data, and converting the text-modality knowledge data into text-modality knowledge vectors;
The multimodal knowledge base refers to a knowledge base containing text-image modality knowledge data; the data in the knowledge base are not restricted to be multimodal, and their modality depends on the actual service and usage scenario. If the given knowledge base contains image-modality knowledge data, the image-modality data need to be preprocessed;
The multimodal knowledge base comprises a knowledge base built into the system and a knowledge base provided by the user, wherein the built-in knowledge base refers to an existing professional knowledge base, including but not limited to a knowledge graph, a graph database, and a vector query database;
When the multimodal data contained in the multimodal knowledge base are preprocessed, the knowledge base built into the system only needs to be processed once, offline; afterwards its data can be retrieved and used directly without repeating the preprocessing operation. If no knowledge base is provided when the user asks a question, the real-time processing of a user-given knowledge base is skipped and only the built-in knowledge base is preprocessed;
Step 3.1, extracting the optical character information of each image contained in the image-modality knowledge data of the multimodal knowledge base using an optical character recognition tool f_OCR, and recognizing the semantic information contained in the optical character information of each image using a large model f_MLLM with multimodal processing capability;
Step 3.1.1, extracting the optical character information of each image contained in the image-modality knowledge data of the multimodal knowledge base using the optical character recognition tool f_OCR;
Extracting the optical character information in the image-modality knowledge data using the optical character recognition tool f_OCR; for an image I located at position P of the multimodal knowledge base, the optical character information of the image is expressed as OCR = f_OCR(I); if the image contains no optical character information, OCR is null;
Step 3.1.2, recognizing the semantic information contained in the optical character information of each image using a large model with multimodal processing capability, according to the optical character recognition information, the position information, and the context information of the image;
For the position P of the image I, define the content of the δ_1 lines above it as C_1(δ_1) and the content of the δ_2 lines below it as C_2(δ_2); the semantic information F finally recognized from the image and its optical character information is:
F = f_MLLM(I; C_1(δ_1); OCR; C_2(δ_2)) (1)
Step 3.2, replacing the image at each position in the multimodal knowledge base with the semantic information recognized from its optical character information, thereby converting the multimodal knowledge base into a plain-text knowledge base;
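Step 3 as a whole can be sketched as follows (a toy Python sketch; `ocr_tool` and `describe_image` are hypothetical stand-ins for f_OCR and f_MLLM, and the list-of-lines document representation is an assumption):

```python
# Toy sketch of Step 3: convert a mixed text/image document into a plain-text
# knowledge base. `ocr_tool` and `describe_image` are hypothetical stand-ins
# for the OCR tool f_OCR and the multimodal model f_MLLM.
def ocr_tool(image):
    # placeholder f_OCR: returns the recognized characters, or "" if none
    return image.get("ocr_text", "")

def describe_image(image, context_above, ocr_text, context_below):
    # placeholder f_MLLM following formula (1):
    # F = f_MLLM(I; C_1(delta_1); OCR; C_2(delta_2))
    return f"[IMAGE: {ocr_text or 'no text'} | context: {context_above} / {context_below}]"

def convert_to_text_kb(document, delta1=2, delta2=2):
    """Replace each image (a dict) in `document` (a list of text lines and
    image dicts) with its recovered semantic text."""
    out = []
    for pos, item in enumerate(document):
        if isinstance(item, dict):  # image modality at position P
            above = [x for x in document[max(0, pos - delta1):pos] if isinstance(x, str)]
            below = [x for x in document[pos + 1:pos + 1 + delta2] if isinstance(x, str)]
            out.append(describe_image(item, " ".join(above), ocr_tool(item), " ".join(below)))
        else:
            out.append(item)
    return out
```

The context windows δ_1 and δ_2 above and below the image position P supply the surrounding text that the multimodal model uses to interpret the image.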
Step 4, receiving a complex problem given by a user, decomposing the complex problem based on an entity identification model, and obtaining entity information in the complex problem by utilizing a dependency syntax analysis model;
The complex problem is a problem consisting of two or more entities or a problem consisting of two or more sub-problems;
Step 4.1, obtaining the set of potential entities {e_1, e_2, …, e_E} in the complex problem based on the entity recognition model f_NER, as shown in the following formula:
{e_1, e_2, …, e_E} = f_NER(Q) (2)
wherein Q is the complex problem and {e_1, e_2, …, e_E} are the E potential entities identified from the complex problem Q;
Step 4.2, obtaining the set of key entities {s_1, s_2, …, s_N} in the complex problem through the syntactic analysis model f_DSP, and updating the potential entity set to obtain the updated potential entity set {c_1, c_2, …, c_m};
The key entities are defined as the nouns of all types obtained from the dependency syntax analysis;
The key entity set {s_1, s_2, …, s_N} and the updated potential entity set are given by the following formula:
{c_1, c_2, …, c_m} = {e_1, e_2, …, e_E} ∪ {s_1, s_2, …, s_N} (3)
wherein {s_1, s_2, …, s_N} are the N key entities of the complex problem obtained by the syntactic analysis model f_DSP, and {c_1, c_2, …, c_m} are the m key potential entities in the updated potential entity set;
Step 4.3, based on the dependency syntax structure and the key entity set {s_1, s_2, …, s_N}, forming the associated entity set {(s_i, s_j)_z | z = 1, 2, …, n} from the nouns between which direct, indirect, or clause-form associations exist, wherein (s_i, s_j) is the associated entity pair formed by the i-th and j-th associated entities, (s_i, s_j)_z is the z-th associated entity pair in the associated entity set, and n is the number of associated entity pairs contained in the complex problem Q;
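The entity-extraction stage of Step 4 can be sketched as follows (a toy sketch; `f_ner` and `f_dsp_nouns` are lexicon-matching stand-ins for the trained entity recognition model f_NER and the syntactic analysis model f_DSP, and the lexicons themselves are assumptions):

```python
# Toy sketch of Step 4: extract potential entities, key entities, and
# associated entity pairs. `f_ner` and `f_dsp_nouns` are lexicon-matching
# stand-ins for the trained NER model and the dependency parser.
def f_ner(question, known_entities):
    return [e for e in known_entities if e in question]

def f_dsp_nouns(question, noun_lexicon):
    return [n for n in noun_lexicon if n in question]

def decompose_entities(question, known_entities, noun_lexicon):
    potential = f_ner(question, known_entities)          # Step 4.1
    key = f_dsp_nouns(question, noun_lexicon)            # Step 4.2: key entities
    updated = list(dict.fromkeys(potential + key))       # update = union, deduplicated
    # Step 4.3 (simplified): pair every two co-occurring key entities; a real
    # system would pair only nouns linked in the dependency tree
    pairs = [(key[i], key[j]) for i in range(len(key)) for j in range(i + 1, len(key))]
    return updated, pairs
```

A production system would replace both stand-ins with a fine-tuned NER model and a genuine dependency parser, keeping only pairs that share a subject-object or clause relation.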
Step 5, converting the complex problem into a plurality of sub-problems according to the entity information in the complex problem, and retrieving the direct knowledge and associated knowledge of each sub-problem from the plain-text knowledge base;
Step 5.1, for the updated potential entity set {c_1, c_2, …, c_m}, generating the direct sub-problem set Q_sub based on the direct problem template T_sub;
Based on the direct problem template T_sub, the direct sub-problem set Q_sub is generated as follows:
Q_sub = {q_y = T_sub(c_y) | y = 1, 2, …, m} (4)
wherein q_y is a direct sub-problem in the direct sub-problem set Q_sub and c_y is a key potential entity; for any key potential entity c_y, the corresponding direct sub-problem q_y is generated by filling the direct problem template, until all key potential entities in the updated potential entity set {c_1, c_2, …, c_m} have been filled in;
Step 5.2, for the associated entity set {(s_i, s_j)_z | z = 1, 2, …, n}, generating the associated sub-problem set Q_rel based on the associated problem template T_rel;
Based on the associated problem template T_rel, the associated sub-problem set Q_rel is generated as follows:
Q_rel = {(q_i,j)_w = T_rel((s_i, s_j)_w) | w = 1, 2, …, n} (5)
wherein q_i,j is an associated sub-problem in the associated sub-problem set Q_rel, and (q_i,j)_w is the w-th associated sub-problem in Q_rel;
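Steps 5.1 and 5.2 can be sketched as follows (the template wordings T_sub and T_rel below are illustrative assumptions; the method does not prescribe their phrasing):

```python
# Sketch of Steps 5.1-5.2: fill the direct template T_sub for every key
# potential entity and the associated template T_rel for every entity pair.
# The template wordings below are illustrative assumptions.
T_SUB = "What is {entity}?"
T_REL = "What is the relationship between {a} and {b}?"

def make_direct_questions(updated_entities):
    # one direct sub-problem q_y per key potential entity c_y
    return [T_SUB.format(entity=c) for c in updated_entities]

def make_associated_questions(entity_pairs):
    # one associated sub-problem q_ij per associated entity pair (s_i, s_j)
    return [T_REL.format(a=a, b=b) for a, b in entity_pairs]
```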
Step 5.3, retrieving each direct sub-problem q_y of the direct sub-problem set Q_sub from the plain-text knowledge base to obtain the direct knowledge K_sub of each direct sub-problem q_y;
For a direct sub-problem q_y, related knowledge is retrieved from the plain-text knowledge base using the retrieval tool f_retrieval matched to it. If no direct knowledge is retrieved, the basic model answers the question on its own, and its answer is taken as reference knowledge to generate the direct knowledge K_sub; if one or more pieces of direct knowledge are retrieved, all of them are ranked by the basic model and concatenated in descending order of relevance score to generate the direct knowledge K_sub:
K_sub = f_retrieval(Q_sub) = {k_u | u = 1, 2, …, U} (6)
wherein k_u is the u-th piece of direct knowledge of the direct sub-problem q_y, U is the number of pieces of direct knowledge, and f_retrieval denotes retrieving knowledge from the plain-text knowledge base with the retrieval tool and returning it;
Step 5.4, retrieving each associated sub-problem q_i,j of the associated sub-problem set Q_rel from the plain-text knowledge base to obtain the associated knowledge K_rel of the associated sub-problem q_i,j;
For an associated sub-problem q_i,j, related knowledge is retrieved from the plain-text knowledge base. If no associated knowledge is retrieved, the phrase 'a relation may not exist' is concatenated with the answer obtained by the basic model answering directly, and the result is taken as reference knowledge to generate the associated knowledge K_rel; if one or more pieces of associated knowledge are retrieved, they are ranked by the basic model according to their relevance scores and concatenated in descending order to generate the associated knowledge K_rel:
K_rel = f_retrieval(Q_rel) = {(k_i,j)_v | v = 1, 2, …, V} (7)
wherein (k_i,j)_v is the v-th piece of associated knowledge of the associated sub-problem q_i,j, and V is the number of pieces of associated knowledge;
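The retrieval-and-fallback logic of Steps 5.3 and 5.4 can be sketched as follows (a toy sketch: `retrieve`, `score`, and `llm_answer` are word-overlap and placeholder stand-ins for f_retrieval, the base model's relevance ranking, and the base model's direct answer):

```python
# Toy sketch of Steps 5.3-5.4: retrieve knowledge for a sub-problem, fall back
# to the base model when nothing is found, and concatenate hits from high to
# low relevance. All three helpers are simplified stand-ins.
def retrieve(kb, query):
    return [doc for doc in kb if any(w in doc for w in query.split())]

def score(query, doc):
    return sum(doc.count(w) for w in query.split())

def llm_answer(query):
    return f"(model answer to: {query})"

def gather_knowledge(kb, query, not_found_prefix=""):
    hits = retrieve(kb, query)
    if not hits:  # fallback: the base model answers alone (prefix used in Step 5.4)
        return not_found_prefix + llm_answer(query)
    hits.sort(key=lambda d: score(query, d), reverse=True)
    return " ".join(hits)  # descending-relevance concatenation, formulas (6)-(7)
```

For associated sub-problems, the Step 5.4 fallback would pass a prefix such as "a relation may not exist: " so that the generated reference knowledge records the uncertainty.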
Step 6, constructing the knowledge graph G of the complex problem based on a knowledge graph embedding model f_KGE, from the complex problem, the sub-problem sets, and the direct and associated knowledge of the sub-problems;
Step 6.1, defining the set R of binary logical relations existing among the key entities of the complex problem;
The set R of binary logical relations existing between the key entities of the complex problem comprises three kinds: the symmetric relation, the antisymmetric relation, and the unrelated relation;
For any two entities e_l and e_h in the potential entity set {c_1, c_2, …, c_m}, the symmetric relation is shown in the following formula:
r(e_l, e_h) ⇒ r(e_h, e_l) (8)
wherein r is the relation between the two entities; given that the relation r holds from e_l to e_h, the relation r from e_h to e_l can be deduced directly, so the two entities have a symmetric relation;
The antisymmetric relation is shown in the following formula:
r(e_l, e_h) ⇒ ¬r(e_h, e_l) (9)
In contrast to the symmetric relation, given that the relation r holds from e_l to e_h, it can be deduced that the relation r from e_h to e_l does not hold, and the two entities then have an antisymmetric relation;
The unrelated relation means that no semantic relation exists between the two entities of a pair;
Step 6.2, using the encoder f_emb of the knowledge graph embedding model f_KGE to convert the direct knowledge corresponding to the direct sub-problems and the associated knowledge corresponding to the associated sub-problems, both obtained by decomposing the complex problem, into the embedding vectors required by the knowledge graph embedding model f_KGE;
Using the encoder f_emb of the knowledge graph embedding model f_KGE, the embedding vector E_Q = f_emb(Q) of the complex problem Q is obtained;
Each direct sub-problem q_y is concatenated with its pieces of direct knowledge, so that the entity and its background knowledge are fused together, and the encoder f_emb of the knowledge graph embedding model f_KGE is used to obtain the embedding vector E_y = f_emb(q_y; k_u) of the direct knowledge;
Each associated sub-problem q_i,j is concatenated with its pieces of associated knowledge, so that the associated entity pair and the background knowledge of the possible relation between them are fused together, and the encoder f_emb of the knowledge graph embedding model f_KGE is used to obtain the embedding vector E_i,j = f_emb(q_i,j; (k_i,j)_v) of the associated knowledge;
Step 6.3, inputting the embedding vector of the complex problem Q, the embedding vectors of the direct knowledge, and the embedding vectors of the associated knowledge into the knowledge graph embedding model f_KGE to obtain the knowledge graph G corresponding to the decomposed complex problem;
The logical relation between any two entities e_l and e_h of the potential entity set {c_1, c_2, …, c_m}, obtained using the knowledge graph embedding model f_KGE, is:
r_l,h = f_KGE(E_Q, E_l, E_h, E_l,h), r_l,h ∈ R (10)
The knowledge graph G of the potential entity set {c_1, c_2, …, c_m}, in which any two entities e_l and e_h have the logical relation r_l,h, is further obtained as:
G = {{(e_l, k_l), r_l,h, (e_h, k_h)}_g | g = 1, 2, …, G} (11)
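Step 6 can be sketched as follows (a toy sketch: the bag-of-characters encoder and cosine thresholds are arbitrary stand-ins for the knowledge graph embedding model f_KGE and its encoder f_emb; the threshold values are assumptions):

```python
import math

# Toy sketch of Step 6: assign one of the three relations of the set R
# (symmetric, antisymmetric, unrelated) to every entity pair and assemble
# the triples of the knowledge graph G.
def f_emb(text):
    # stand-in encoder: bag-of-characters frequency vector
    vec = {}
    for ch in text.lower():
        vec[ch] = vec.get(ch, 0) + 1
    return vec

def cosine(u, v):
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def classify_relation(vec_l, vec_h, hi=0.8, lo=0.3):
    # stand-in for formula (10); a trained f_KGE would predict r directly
    s = cosine(vec_l, vec_h)
    return "symmetric" if s >= hi else ("antisymmetric" if s >= lo else "unrelated")

def build_graph(entity_knowledge):
    """entity_knowledge: list of (entity e_l, direct knowledge k_l) pairs."""
    graph = []
    for i in range(len(entity_knowledge)):
        for j in range(i + 1, len(entity_knowledge)):
            (el, kl), (eh, kh) = entity_knowledge[i], entity_knowledge[j]
            r = classify_relation(f_emb(el + " " + kl), f_emb(eh + " " + kh))
            graph.append(((el, kl), r, (eh, kh)))  # one triple of G, formula (11)
    return graph
```

A real implementation would use a trained knowledge graph embedding model (e.g., a TransE- or RotatE-style scorer) in place of the cosine heuristic; the triple structure of G is the part the method prescribes.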
Step 7, integrating the elements contained in the knowledge graph G into a new problem Q*, and having the trained basic model f_LLM answer the new problem Q* to generate the answer A;
Step 7.1, judging whether the length of each entity-direct-knowledge pair (e_l, k_l) contained in the elements of the knowledge graph G exceeds the maximum context length specified by the basic model, and summarizing and abridging any direct knowledge that exceeds it, so that the total length of the direct knowledge does not exceed the maximum context length θ of the basic model;
Define the maximum context length θ that the basic model can handle and the maximum length θ_l of each entity-direct-knowledge pair (e_l, k_l). If the length of some pair (e_l, k_l) exceeds its prescribed maximum length θ_l, its direct knowledge k_l is summarized and abridged by the basic model f_LLM until the length constraint is met, so that the maximum context length θ the basic model can handle satisfies:
Σ_l θ_l ≤ θ (12)
Step 7.2, filling the original complex problem Q and each element contained in the knowledge graph G in sequence into a new problem template T_new to generate the new problem Q*, and generating the reply A from the new problem Q* with the trained basic model f_LLM:
Define the new problem template T_new; each element contained in the knowledge graph G is filled into T_new in sequence to form the new problem Q*, and the trained basic model f_LLM generates the reply A, which is the answer to the new problem Q* and to the original problem Q.
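Step 7 can be sketched as follows (θ, θ_l, the template wording, and the truncating `summarize` stand-in for summarization by the base model are all illustrative assumptions):

```python
# Toy sketch of Step 7: enforce the per-pair length budget, fill a new-question
# template, and obtain Q*. THETA, THETA_L, and the template wording are
# illustrative assumptions.
THETA = 400    # assumed maximum context length of the base model
THETA_L = 80   # assumed maximum length of each entity-knowledge pair

def summarize(text, limit):
    return text[:limit]  # placeholder: a real system would abridge via the base model

def build_new_question(original_q, graph, theta_l=THETA_L):
    parts = [f"Question: {original_q}"]
    for (el, kl), rel, (eh, kh) in graph:
        if len(kl) > theta_l:          # Step 7.1: shrink oversize knowledge
            kl = summarize(kl, theta_l)
        if len(kh) > theta_l:
            kh = summarize(kh, theta_l)
        parts.append(f"{el} ({kl}) --{rel}-- {eh} ({kh})")
    return "\n".join(parts)            # the new question Q* handed to the model
```

The assembled Q* is then passed to the trained base model, whose reply A answers both Q* and the original complex problem Q.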
The technical scheme has the following advantages. The complex problem decomposition and multimodal knowledge retrieval method combining a large model addresses both complex problems and multimodal knowledge. For a multimodal knowledge base, it provides a knowledge retrieval method with image-understanding capability by extracting the semantic information contained in image-modality knowledge data. By decomposing a complex problem, it constructs the semantic and structural relations among the sub-problems the complex problem may contain, so that the large model can answer complex problems more accurately and comprehensively. In addition, by integrating the relevant background knowledge of the complex problem, it extends the retrieval capability of retrieval tools over a multimodal knowledge base.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
The embodiment is a complex problem decomposition and multimodal knowledge retrieval method combining a large model, as shown in fig. 1, comprising the following steps:
Step 1, selecting a large model to be used as a basic model;
In some embodiments, the large model may be, but is not limited to, GPT-4o, Gemini, LLaVA, or LLaMA; if a large language model (LLM) without multimodal processing capability is selected, an additional model with image-content-description capability is required, such as a BLIP model with image-content-understanding capability. In this embodiment, a large model with multimodal processing capability is selected as the basic model f_LLM;
Step 2, constructing a named entity recognition dataset for training the selected basic model according to the requirements of the specific business scenario, to obtain a trained basic model f_LLM;
In this embodiment, the business scenario includes, but is not limited to, real application scenarios in specific fields such as education, medicine, and law;
Named entity recognition (NER) refers to the technique of recognizing all potential named entities (hereinafter, entities) in a given text, including person names, place names, organization names, and the professional terms of a specific business scenario;
A named entity recognition dataset D_NER = {(S_p; {e_p1, e_p2, …, e_pd}) | p = 1, 2, …, D} is constructed for training the basic model, wherein S_p is the p-th text in the dataset of sample size D, typically a natural sentence, and {e_p1, e_p2, …, e_pd} are the d entities contained in the p-th text of the named entity recognition dataset D_NER;
This embodiment does not require that a named entity recognition dataset be built; whether to build one can be decided according to whether there is an optimization requirement for one or more business scenarios. If such a requirement exists, a named entity recognition dataset is built to optimize the basic model f_LLM; otherwise it is unnecessary. The named entity recognition dataset of the required business scenario is constructed through techniques such as prompt engineering, and the basic model f_LLM is trained by fine-tuning;
Step 3, establishing a multimodal knowledge base, preprocessing it, and converting it into a plain-text knowledge base; this specifically comprises extracting the semantic information contained in the image-modality knowledge data of the multimodal knowledge base, converting the image-modality knowledge data into text-modality knowledge data, and converting the text-modality knowledge data into text-modality knowledge vectors;
In this embodiment, the multimodal knowledge base refers to a knowledge base containing text-image modality knowledge data; the data in the knowledge base are not required to be multimodal, and their modality depends on the actual service and usage scenario. If the given knowledge base contains only plain-text knowledge data, the subsequent steps of this embodiment are unaffected; if it contains image-modality knowledge data, the image-modality data must be preprocessed.
The multimodal knowledge base comprises a knowledge base built into the system and a knowledge base provided by the user, wherein the built-in knowledge base refers to an existing professional knowledge base, including but not limited to a knowledge graph, a graph database, a vector query database, and the like;
When the multimodal data contained in the multimodal knowledge base are preprocessed, the knowledge base built into the system only needs to be processed once, offline, after which its data can be retrieved and used directly without repeating the preprocessing operation; a knowledge base given by the user must be processed in real time to meet the user's specific requirements. If a knowledge base is provided when the user asks a question, the preprocessing of the built-in knowledge base is skipped and the user-provided knowledge base is processed in real time;
Step 3.1, extracting the optical character information of each image contained in the image-modality knowledge data of the multimodal knowledge base using an optical character recognition tool f_OCR, and recognizing the semantic information contained in the optical character information of each image using a large model f_MLLM with multimodal processing capability;
Existing knowledge retrieval tools usually retrieve on the basis of text and lack retrieval capability for image data, or they feed image content directly to a large model through a visual encoder, which greatly burdens the large model's context processing. To enhance the retrievability of image-modality knowledge data and reduce its volume, this embodiment converts image-modality information into the text modality through modality conversion, using the optical character recognition tool f_OCR and image-content-description techniques;
Step 3.1.1, extracting the optical character information of each image contained in the image-modality knowledge data of the multimodal knowledge base using the optical character recognition tool f_OCR;
For tables, notes, and other formatted text or numerical knowledge information provided in the image modality, conventional visual encoders and image-content-description tools have difficulty accurately recognizing the text or numbers; therefore, this embodiment employs the optical character recognition tool f_OCR to extract the optical character information from the image-modality knowledge data. For an image I located at position P of the knowledge base, the optical character information of the image is expressed as OCR = f_OCR(I); if the image contains no optical character information, OCR is null;
Step 3.1.2, recognizing the semantic information contained in the optical character information of each image using a large model with multimodal processing capability, according to the optical character recognition information, the position information, and the context information of the image;
To recognize the semantic information in an image, such as its approximate content, a large model with multimodal processing capability is used, and the optical character recognition information, the position information, and the context information of the image are provided to recognize the semantic information in the image-modality knowledge data. For the position P of the image I, the content of the δ_1 lines above it is defined as C_1(δ_1) and the content of the δ_2 lines below it as C_2(δ_2); the semantic information F finally recognized from the image and its optical character information is:
F = f_MLLM(I; C_1(δ_1); OCR; C_2(δ_2)) (1)
Step 3.2, replacing the image at each position in the multimodal knowledge base with the semantic information recognized from its optical character information, thereby converting the multimodal knowledge base into a plain-text knowledge base;
Step 4, receiving a complex problem given by a user, decomposing the complex problem based on an entity identification model, and obtaining entity information in the complex problem by utilizing a dependency syntax analysis model;
A complex problem is defined as a problem composed of two or more entities, or a problem composed of two or more sub-problems. A complex problem cannot be answered directly; it requires an inference path formed from the entities it involves and the relations between those entities. If the resulting inference path is ambiguous, or the basic model used has no knowledge of the entities' background, the problem may not be answered correctly or comprehensively. For example, a question composed of a single entity, such as 'Who is A?', can be considered simple, whereas a question such as 'On what date did A obtain B?' requires knowing not only who A is but also what B is and which date is meant, i.e., the meaning of each entity in the sentence, and is therefore a complex problem;
In addition, whether the selected entity recognition model needs further training on the named entity recognition dataset D_NER can be decided according to the specific business scenario; this embodiment does not restrict the training mode of the entity recognition model;
The dependency syntax analysis model f_DSP (hereinafter, the syntactic analysis model) decomposes the complex problem text input by the user, according to the dependency syntax, into a syntax tree with a dependency structure that includes noun subjects, direct objects, indirect objects, adjective clauses, and copulas.
Step 4.1, obtaining the set of potential entities {e_1, e_2, …, e_E} in the complex problem based on the entity recognition model f_NER, as shown in the following formula:
{e_1, e_2, …, e_E} = f_NER(Q) (2)
wherein Q is the complex problem and {e_1, e_2, …, e_E} are the E potential entities identified from the complex problem Q;
Step 4.2, obtaining the set of key entities in the complex question through the syntax analysis model f_DSP, and updating the potential entity set to obtain an updated potential entity set;
A key entity is defined as any noun obtained by dependency syntax analysis that serves as a nominal subject, direct object, indirect object or clause of some type in the sentence and therefore plays a key syntactic role. Since these nouns are themselves also entities, the potential entity set is updated after the key entities are obtained, so that the potential entities and the key entities do not diverge, as shown in the following formula:
{c_1, c_2, ..., c_m} = {e_1, e_2, ..., e_E} ∪ {s_1, s_2, ..., s_N} (2)
where {s_1, s_2, ..., s_N} are the N key entities in the complex question obtained by the syntax analysis model f_DSP, and {c_1, c_2, ..., c_m} are the m key potential entities in the updated potential entity set;
Step 4.3, forming a set of associated entities based on the dependency syntax structure and the set of key entities, according to whether direct-object, indirect-object or clause-form associations exist between the nouns:
{(s_i, s_j)_z | z = 1, 2, ..., n} (3)
where (s_i, s_j) is the associated entity pair formed by the i-th and the j-th associated entities, (s_i, s_j)_z is the z-th associated entity pair in the associated entity set, and n is the number of associated entity pairs contained in the complex question Q;
An associated entity pair is a pair of entities related in subject, object or clause form in the dependency syntax tree. Specifically, a nominal subject is associated with its direct object, a direct object is associated with its indirect object, and any noun is associated with its clause; if a noun is not associated with any other noun, it is treated as an exception and is paired individually with each of the other nouns;
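The pairing rules above can be sketched over a simplified dependency structure; the edge labels and entities are hypothetical, and the exception rule follows the description:

```python
# Illustrative sketch only: forming associated entity pairs from a simplified
# dependency structure. Each edge maps a head noun to a dependent noun with a
# dependency label; the labels and entities are hypothetical.
def associated_pairs(key_entities, edges):
    """Return associated entity pairs (s_i, s_j) per the subject/object/clause rules."""
    linked_labels = {"subject-object", "direct-indirect", "clause"}
    pairs = [(h, d) for (h, d, label) in edges if label in linked_labels]
    paired = {e for pair in pairs for e in pair}
    # Exception rule: an isolated noun is paired individually with each other noun.
    for e in key_entities:
        if e not in paired:
            pairs.extend((e, other) for other in key_entities if other != e)
    return pairs

edges = [("A", "B", "subject-object")]
pairs = associated_pairs(["A", "B", "C"], edges)
```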
Step 5, converting the complex question into a plurality of sub-questions according to the entity information in the complex question, and retrieving the direct knowledge and the associated knowledge of each sub-question from the plain-text knowledge base;
In this embodiment, the retrieval tool and the retrieval method are not limited; a retrieval tool f_retrieval matched with the plain-text knowledge base is used as an example;
Step 5.1, for the updated potential entity set, generating a direct sub-question set Q_sub based on the direct question template T_sub;
A direct question template is a template that, when filled with an entity, forms a simple question that can be directly retrieved and answered. In this embodiment, "What/who is c_y?" is used as the direct question template to generate the direct sub-question corresponding to each key potential entity; for example, for the entity "A", a direct sub-question of the form "What/who is A?" is generated;
Based on the direct question template T_sub, the direct sub-question set Q_sub is generated as follows:
Q_sub = {q_y | q_y = T_sub(c_y), y = 1, 2, ..., m} (4)
where q_y is a direct sub-question in the direct sub-question set Q_sub and c_y is a key potential entity; for any key potential entity c_y, the corresponding direct sub-question q_y is generated by filling the direct question template, until all key potential entities in the updated potential entity set have been filled;
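Step 5.1 amounts to straightforward template filling, which can be sketched as follows (the template wording follows this embodiment):

```python
# Illustrative sketch only: generating the direct sub-question set Q_sub by
# filling each key potential entity c_y into the direct question template T_sub.
T_sub = "What/who is {}?"  # the direct question template of this embodiment

def direct_subquestions(key_potential_entities):
    return [T_sub.format(c) for c in key_potential_entities]

q_sub = direct_subquestions(["A", "B"])
```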
Step 5.2, for the associated entity set, generating an associated sub-question set Q_rel based on the associated question template T_rel;
In this embodiment, "What is the relationship between s_i and s_j?" is taken as the associated question template; for the associated entity pair (A, B), the associated sub-question "What is the relationship between A and B?" is generated;
Based on the associated question template T_rel, the associated sub-question set Q_rel is as follows:
Q_rel = {(q_i,j)_w | q_i,j = T_rel(s_i, s_j), w = 1, 2, ..., n} (5)
where q_i,j is an associated sub-question in the associated sub-question set Q_rel, and (q_i,j)_w is the w-th associated sub-question in Q_rel;
Step 5.3, retrieving each direct sub-question q_y in the direct sub-question set Q_sub from the plain-text knowledge base to obtain the direct knowledge K_sub for each direct sub-question q_y;
For a direct sub-question q_y, the retrieval tool f_retrieval matched with the plain-text knowledge base is used to retrieve related knowledge from the knowledge base. If no direct knowledge is retrieved, the base model answers the question on its own, and its answer is taken as the reference knowledge to generate the direct knowledge K_sub. If one or more pieces of direct knowledge are retrieved, all of them are ranked by the base model according to relevance score, concatenated from the highest score to the lowest, and the direct knowledge K_sub is generated:
K_sub = f_retrieval(Q_sub) = {k_u | u = 1, 2, ..., U} (6)
where k_u is the u-th piece of direct knowledge for the direct sub-question q_y, U is the number of pieces of direct knowledge, and f_retrieval returns the knowledge retrieved from the plain-text knowledge base by the retrieval tool;
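A minimal sketch of the retrieval behaviour of step 5.3, assuming a hypothetical passage list as the plain-text knowledge base and using word overlap as a stand-in for the base model's relevance scoring:

```python
import re

# Illustrative sketch only: retrieving direct knowledge for one direct
# sub-question with the fallback and ranking behaviour of step 5.3. The
# knowledge base is a hypothetical list of passages, and word overlap stands
# in for the base model's relevance scoring.
def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def retrieve_direct_knowledge(sub_question, knowledge_base, fallback_answer):
    q = tokens(sub_question)
    scored = sorted(((len(q & tokens(p)), p) for p in knowledge_base),
                    reverse=True)
    hits = [p for score, p in scored if score > 0]
    if not hits:  # nothing retrieved: use the base model's own answer instead
        return fallback_answer
    return " ".join(hits)  # concatenate from highest to lowest relevance

k_sub = retrieve_direct_knowledge(
    "What/who is Turing?",
    ["Turing was a mathematician.", "Paris lies in France."],
    fallback_answer="(answer generated by the base model)")
```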
Step 5.4, retrieving each associated sub-question q_i,j in the associated sub-question set Q_rel from the plain-text knowledge base to obtain the associated knowledge K_rel for the associated sub-question q_i,j;
For an associated sub-question q_i,j, related knowledge is retrieved from the plain-text knowledge base. If no associated knowledge is retrieved, the phrase "the relationship may not exist" is concatenated with the answer obtained by the base model answering directly, the result is taken as the reference knowledge, and the associated knowledge K_rel is generated. If one or more pieces of associated knowledge are retrieved, they are ranked by relevance score according to the base model, concatenated from the highest score to the lowest, and the associated knowledge K_rel is generated:
K_rel = f_retrieval(Q_rel) = {(k_i,j)_v | v = 1, 2, ..., V} (7)
where (k_i,j)_v is the v-th piece of associated knowledge for the associated sub-question q_i,j, and V is the number of pieces of associated knowledge;
Step 6, constructing a knowledge graph G of the complex question based on a knowledge graph embedding model f_KGE, from the complex question, the sub-question sets, and the direct knowledge and associated knowledge of the sub-questions;
Knowledge graph embedding (KGE) models are a class of models that capture the semantic relationships between entities by learning vector representations of entities and relations, thereby supporting knowledge graph modelling and reasoning; they include, but are not limited to, TransE, RotatE and ConvE. This embodiment does not limit the choice of knowledge graph embedding model; the knowledge graph with semantic relations is established to help the large model better complete its reasoning over the complex question;
Step 6.1, defining the set R of binary logical relations existing among the key entities in the complex question;
The relation type set refers to the semantic relation types that may exist between the entities in the complex question posed by the user. Such relation types are generally expressed as binary logical relations in discrete mathematics, including symmetric relations, antisymmetric relations, inverse relations, transitive relations, compositional relations and the like. By establishing a knowledge graph with entity semantics as nodes and logical relations as edges, the theory of discrete mathematics helps the large model establish an inference path for the complex question, so that the complex question can be answered more accurately and comprehensively. In this embodiment, the binary logical relation set R existing between the key entities of the complex question includes three kinds of relations: symmetric, antisymmetric and unrelated;
For the updated potential entity set, the symmetric relation between any two entities e_l and e_h is shown in the following formula:
r(e_l, e_h) ⇒ r(e_h, e_l) (8)
where r is the relation between the two entities; given that the relation r holds between e_l and e_h, the relation r between e_h and e_l can be directly inferred, in which case the two entities have a symmetric relation, such as a classmate, friend or lover relation;
The antisymmetric relation is shown in the following formula:
r(e_l, e_h) ⇒ ¬r(e_h, e_l) (9)
In contrast to the symmetric relation, given that the relation r holds between e_l and e_h, it can be inferred that the relation r does not hold between e_h and e_l, in which case the two entities have an antisymmetric relation, such as a parent-child relation;
The unrelated case means that no semantic relation exists between the two entities of an entity pair;
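The three relation types can be illustrated on toy relations, each represented as a set of (head, tail) pairs; the example relations are hypothetical:

```python
# Illustrative sketch only: the binary logical relation types of step 6.1
# checked on toy relations, each represented as a set of (head, tail) pairs.
def is_symmetric(pairs):
    # r(e_l, e_h) implies r(e_h, e_l)
    return all((t, h) in pairs for (h, t) in pairs)

def is_antisymmetric(pairs):
    # r(e_l, e_h) implies that r(e_h, e_l) does not hold (for distinct entities)
    return all((t, h) not in pairs for (h, t) in pairs if h != t)

friend_of = {("A", "B"), ("B", "A")}  # hypothetical symmetric relation
parent_of = {("A", "B")}              # hypothetical antisymmetric relation
```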
Step 6.2, using the encoder f_emb of the knowledge graph embedding model f_KGE to convert the direct knowledge corresponding to the direct sub-questions and the associated knowledge corresponding to the associated sub-questions, both obtained by decomposing the complex question, into the embedding vectors required by the knowledge graph embedding model f_KGE;
The encoder f_emb of the knowledge graph embedding model f_KGE is used to obtain the embedding vector E_Q = f_emb(Q) of the complex question Q;
Each direct sub-question q_y is concatenated with its pieces of direct knowledge, so that the entity and its background knowledge are fused together, and the encoder f_emb of the knowledge graph embedding model f_KGE is used to obtain the embedding vector E_y = f_emb(q_y; k_u) of the direct knowledge;
Each associated sub-question q_i,j is concatenated with its pieces of associated knowledge, so that the associated entity pair and the background knowledge of its possible relation are fused together, and the encoder f_emb of the knowledge graph embedding model f_KGE is used to obtain the embedding vector E_i,j = f_emb(q_i,j; (k_i,j)_v) of the associated knowledge;
Step 6.3, inputting the embedding vector of the complex question Q, the embedding vectors of the direct knowledge and the embedding vectors of the associated knowledge into the knowledge graph embedding model f_KGE to obtain the knowledge graph G corresponding to the decomposed complex question;
Using the knowledge graph embedding model f_KGE, the logical relation between any two entities e_l and e_h of the updated potential entity set is obtained:
r_l,h = f_KGE(E_Q, E_l, E_h, E_l,h), r_l,h ∈ R (10)
Further, the knowledge graph G in which any entities e_l and e_h of the potential entity set have the logical relation r_l,h is obtained:
G = {{(e_l, k_l), r_l,h, (e_h, k_h)}_g | g = 1, 2, ..., G} (11)
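The assembly of the knowledge graph G in step 6.3 can be sketched as follows, with a hypothetical stand-in for the relation classification that f_KGE performs on the embedding vectors:

```python
# Illustrative sketch only: assembling the knowledge graph G of step 6.3 as
# triples {(e_l, k_l), r_l,h, (e_h, k_h)}. The `classify_relation` argument is
# a hypothetical stand-in for the KGE model f_KGE, which would in practice
# score the embedding vectors E_Q, E_l, E_h and E_l,h.
def build_graph(entity_knowledge, entity_pairs, classify_relation):
    graph = []
    for el, eh in entity_pairs:
        r = classify_relation(el, eh)  # r in R = {symmetric, antisymmetric, unrelated}
        graph.append(((el, entity_knowledge[el]), r, (eh, entity_knowledge[eh])))
    return graph

knowledge = {"A": "A is a scientist.", "B": "B is a prize."}
graph = build_graph(knowledge, [("A", "B")],
                    classify_relation=lambda el, eh: "antisymmetric")
```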
Step 7, integrating each element contained in the knowledge graph G into a new question Q*, and generating the answer A to the new question Q* with the trained base model f_LLM;
Step 7.1, judging whether the length of each entity-direct-knowledge pair (e_l, k_l) contained in the elements of the knowledge graph G exceeds the maximum context length specified for the base model, and summarizing and abbreviating any direct knowledge that exceeds it, so that the total length of the direct knowledge does not exceed the maximum context length θ of the base model;
The maximum context length θ that the base model can process is defined, and a maximum length θ_l is defined for each entity-direct-knowledge pair (e_l, k_l). If the length of an entity-direct-knowledge pair (e_l, k_l) exceeds its prescribed maximum length θ_l, its direct knowledge k_l is summarized and abbreviated by the base model f_LLM until the length constraint is met, so that the maximum context length θ that the base model can process satisfies:
θ = θ_1 + θ_2 + ... + θ_G (12)
The maximum context length that the base model can process generally depends on the selected base model itself. In this embodiment, for simplicity of calculation, the maximum length of each entity-knowledge pair is set to be equal, i.e., the maximum context length that the base model can process is divided equally among the number of elements in the knowledge graph;
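The length budgeting of step 7.1 can be sketched as follows; lengths are counted in characters for simplicity, and the summarization performed by f_LLM is replaced by naive truncation as a hypothetical stand-in:

```python
# Illustrative sketch only: the length budgeting of step 7.1. Lengths are
# counted in characters for simplicity; `summarize` is a hypothetical stand-in
# for the base model f_LLM summarizing/abbreviating over-long knowledge.
def fit_to_budget(graph_elements, theta, summarize):
    theta_l = theta // len(graph_elements)  # equal share per entity-knowledge pair
    fitted = []
    for entity, knowledge in graph_elements:
        if len(entity) + len(knowledge) > theta_l:
            knowledge = summarize(knowledge, theta_l - len(entity))
        fitted.append((entity, knowledge))
    return fitted

elements = [("A", "x" * 50), ("B", "y" * 5)]
fitted = fit_to_budget(elements, theta=20,
                       summarize=lambda text, limit: text[:limit])  # naive truncation
```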
Step 7.2, filling the original complex question Q and each element contained in the knowledge graph G into the new question template T_new in sequence to generate the new question Q*, and generating the answer A from the new question Q* with the trained base model f_LLM:
In this embodiment, the new question template T_new is defined as "Please answer the question Q based on the given background knowledge: (e_l, k_l), (e_h, k_h), the logical relation between e_l and e_h is r_l,h". After each element contained in the knowledge graph G is filled into the new question template T_new in turn, the new question Q* is formed, and the answer A generated by the trained base model f_LLM is both the answer to the new question Q* and the answer to the original question Q.
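The template filling of step 7.2 can be sketched as follows; the template wording follows this embodiment, and the example graph is hypothetical:

```python
# Illustrative sketch only: assembling the new question Q* of step 7.2 from
# the knowledge graph triples; the template wording follows the embodiment.
def build_new_question(question, graph):
    parts = []
    for (el, kl), r, (eh, kh) in graph:
        parts.append(f"({el}, {kl}), ({eh}, {kh}), "
                     f"the logical relation between {el} and {eh} is {r}")
    return ("Please answer the question " + question +
            " based on the given background knowledge: " + "; ".join(parts))

graph = [(("A", "A is a scientist."), "antisymmetric", ("B", "B is a prize."))]
q_star = build_new_question("On what date did A obtain B?", graph)
```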
In this way, the relevant background knowledge is integrated, the retrieval capability of the retrieval tool over the multi-modal knowledge base is extended, and the semantic and structural relations among the multiple sub-questions that a complex question may contain are constructed by decomposing the complex question, thereby helping the large model answer complex questions more accurately and comprehensively.
It should be noted that the above embodiments illustrate rather than limit the invention, and those skilled in the art will understand that modifications may be made to the technical solutions described in the above embodiments, or equivalent substitutions may be made for some or all of the technical features, without the essence of the corresponding technical solutions departing from the scope of the invention defined by the claims.