CN120764701A - An intelligent interactive system based on multimodal data fusion - Google Patents
An intelligent interactive system based on multimodal data fusion
- Publication number
- CN120764701A (application number CN202511293564.3A)
- Authority
- CN
- China
- Prior art keywords
- text
- question
- user
- input
- answer
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/041—Abduction
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
- G06F16/33295—Natural language query formulation in dialogue systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an intelligent interaction system based on multimodal data fusion, comprising a user question-answering module, an AI text detection module, and a meeting summary generation module. The user question-answering module acquires a target question input by a user, retrieves the answer to the target question and the source of that answer from a pre-built knowledge base, and returns both to the user. The AI text detection module acquires a text input by the user for checking, uses a pre-built AI text detection model to determine whether the text was generated by AI, and outputs the determination result. The meeting summary generation module acquires a meeting recording input by the user and generates a corresponding meeting summary.
Description
Technical Field
The invention relates to the field of intelligent interaction technology, and in particular to an intelligent interaction system based on multimodal data fusion.
Background
With the continuous development of artificial intelligence, intelligent interactive systems have become increasingly widespread, helping users and enterprises solve problems and obtain information.
Existing intelligent interaction systems still have technical shortcomings. For example, the knowledge base is stored in a single modality (text only or files only), so multi-source data cannot be fused effectively and information retrieval is weak; the ability to detect AI-generated text is generally absent; and the systems lack generality and flexibility, so they cannot meet the interaction needs of different users in different scenarios.
Disclosure of Invention
Embodiments of the invention provide an intelligent interaction system based on multimodal data fusion that improves the practicality and flexibility of intelligent interaction and meets the interaction needs of different users in different scenarios.
The intelligent interaction system based on multimodal data fusion according to an embodiment of the invention comprises:
a user question-answering module, configured to acquire a target question input by a user, retrieve the answer to the target question and the source of that answer from a pre-built knowledge base, and return the answer and its source to the user;
an AI text detection module, configured to acquire a text to be checked that is input by the user, use a pre-built AI text detection model to determine whether the text was generated by AI, and output the determination result;
and a meeting summary generation module, configured to acquire a meeting recording input by the user and generate a meeting summary corresponding to the recording.
Further, the knowledge base is constructed as follows:
collecting historical consultation records and knowledge source files within a preset period, where the historical consultation records contain historical consultation questions and their answers;
uploading the historical consultation records and the knowledge source files to the knowledge base;
extracting file information from the knowledge source files, generating a metadata table for each file type, and storing the file information in the corresponding metadata table;
and associating each metadata table with its knowledge source files by recording each file's unique identifier in the metadata table, so that the file can be located quickly by that identifier during knowledge retrieval.
Further, acquiring the target question input by the user and retrieving its answer and answer source from the pre-built knowledge base includes:
acquiring the user's question input and preprocessing it to generate the target question, where the question input may take forms including, but not limited to, text, pictures, files, and video;
and determining question-answer prompt words for the target question and using them to retrieve the answer to the target question and its source from the pre-built knowledge base.
Further, the user question-answering module is also configured to:
receive the user's feedback mark on the answer and trigger the processing logic corresponding to that feedback mark, where the feedback mark indicates how accurate the answer was;
when the indicated accuracy does not meet a preset accuracy requirement, push the target question to a manual answering module and adjust the retrieval algorithm accordingly;
and when the indicated accuracy meets the preset requirement, take no further action.
Further, the AI text detection module includes:
a model detection unit, configured to input the text to be checked into a pre-trained AI-generated text detection model, which analyzes the text and outputs the probability that it was generated by AI;
and a comprehensive judgment unit, configured to calculate the text entropy anomaly degree and the technical-term density difference of the text to be checked and combine them with the probability value to determine whether the text was generated by AI.
Further, the AI-generated text detection model is trained as follows:
constructing an initial model from a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM);
dividing a number of positive and negative samples into a training set, a validation set, and a test set in a preset ratio, where the positive samples are human-written texts and the negative samples are texts generated with AI tools;
training the initial model on the training set and adjusting its parameters by backpropagation so that the model can accurately distinguish AI-generated text from human-written text;
evaluating and tuning the model on the validation set and selecting the best-performing parameters;
and testing the model on the test set to obtain the final trained AI-generated text detection model.
Further, calculating the text entropy anomaly degree and the technical-term density difference of the text to be checked includes:
dividing the text into paragraphs at a preset text interval and calculating the information entropy of each paragraph;
calculating, from these entropies and a preset reference entropy, the entropy anomaly degree of each paragraph and the entropy anomaly degree of the whole text;
and calculating the measured technical-term density of the text against a technical-term library, then calculating the difference between the measured density and a preset minimum technical-term density threshold.
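The text does not specify the formulas behind these quantities. As a minimal sketch of one plausible reading, the Python below assumes whitespace tokens, Shannon entropy (in bits) over token frequencies, a fixed token count as the "preset text interval", the relative deviation |H_i - H_ref| / H_ref as each paragraph's anomaly degree, the mean of those as the text's anomaly degree, and measured density minus the minimum threshold as the density difference. All function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (bits) of the token frequency distribution."""
    if not tokens:
        return 0.0
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in Counter(tokens).values())


def entropy_anomaly(tokens, interval, reference_entropy):
    """Return (per-paragraph anomaly degrees, text anomaly degree).

    Paragraph anomaly is assumed to be |H_i - H_ref| / H_ref; the text
    anomaly is assumed to be the mean of the paragraph anomalies.
    """
    if reference_entropy <= 0:
        raise ValueError("reference_entropy must be positive")
    # Split into fixed-size segments (the assumed "preset text interval").
    paragraphs = [tokens[i:i + interval] for i in range(0, len(tokens), interval)]
    para = [abs(shannon_entropy(p) - reference_entropy) / reference_entropy
            for p in paragraphs]
    text = sum(para) / len(para) if para else 0.0
    return para, text


def term_density_difference(tokens, term_library, min_density):
    """Measured technical-term density minus the preset minimum threshold."""
    if not tokens:
        return -min_density
    density = sum(1 for t in tokens if t in term_library) / len(tokens)
    return density - min_density
```

In use, tokens would come from something like `text.split()` for whitespace-delimited languages; Chinese text would first need a word segmenter, which this sketch does not include.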
Further, combining the probability value to determine whether the text to be checked was generated by AI includes:
calculating a comprehensive judgment value from the text entropy anomaly degree, the technical-term density difference, and the probability value, weighted respectively by a pre-assigned anomaly weight, density-difference weight, and probability weight;
and determining from the comprehensive judgment value whether the text was generated by AI.
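A weighted linear combination is one straightforward way to realize this step. The sketch assumes that the comprehensive judgment value is the weighted sum of the three indicators and that a text is judged AI-generated when the value reaches a threshold; the default weights, the threshold, and the function name are hypothetical placeholders, not values from the patent.

```python
def comprehensive_judgment(anomaly, density_diff, probability,
                           w_anomaly=0.2, w_density=0.2, w_prob=0.6,
                           threshold=0.5):
    """Combine the three indicators into one judgment value.

    Returns (score, is_ai). Weights and threshold are hypothetical defaults.
    """
    score = (w_anomaly * anomaly
             + w_density * density_diff
             + w_prob * probability)
    return score, score >= threshold
```

Because the text does not say whether a low or a high term density points toward AI generation, the sign and size of `w_density` would have to be calibrated on labeled data, such as the validation set used in training.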
Further, the meeting summary generation module includes:
a text conversion unit, configured to call a speech-to-text model when the input is a meeting recording, convert the recording into a transcript, and proofread and polish the transcript;
a template matching unit, configured to select a target template from the meeting summary template library in the knowledge base according to preset meeting summary template matching rules;
an information extraction unit, configured to extract key meeting information from the processed transcript using natural language processing;
and a summary generation unit, configured to fill the key meeting information into the target template to generate the meeting summary corresponding to the recording.
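The extraction and filling steps can be pictured with the small Python sketch below. It is illustrative only: the patent's information extraction unit uses natural language processing, for which the line-prefix regular expressions here (lines beginning with "Decision:" or "Action:") are a deliberately naive stand-in, and the placeholder and function names are hypothetical. Filling uses the standard library's `string.Template.safe_substitute`, which leaves any placeholder without extracted information visible for later review.

```python
import re
from string import Template


def extract_key_info(transcript):
    """Naive stand-in for NLP extraction: collect 'Decision:' / 'Action:' lines."""
    decisions = re.findall(r"^Decision:[ \t]*(.+)$", transcript, flags=re.MULTILINE)
    actions = re.findall(r"^Action:[ \t]*(.+)$", transcript, flags=re.MULTILINE)
    return {"decisions": "; ".join(decisions), "actions": "; ".join(actions)}


def generate_summary(template_text, key_info):
    """Fill $placeholders; unknown placeholders are left as-is for review."""
    return Template(template_text).safe_substitute(key_info)
```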
Further, the meeting summary template matching rules include:
an attribute matching rule: according to the attribute information of the meeting, templates in the meeting summary template library whose attributes match exactly are preferred; if none match, matching is attempted against successively higher-level (parent) attributes;
a version matching rule: when several templates satisfy the applicable scope, the template with the latest version is preferred;
and a content matching rule: when several templates of the same version are matched, the similarity between the transcript and each template's content is calculated, and the template with the highest similarity is selected.
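The three rules form a cascade, which the following Python sketch illustrates under stated assumptions: each template's attributes are modeled as a hypothetical scope path ordered from general to specific (for example company, department, meeting type), "higher-level attributes" are taken to mean successively shorter prefixes of the meeting's path, versions are integers, and content similarity is Jaccard similarity over lowercase word sets. The class and function names are illustrative, not from the patent.

```python
from dataclasses import dataclass


@dataclass
class SummaryTemplate:
    template_id: str
    scope: tuple      # hypothetical attribute path, general -> specific
    version: int
    content: str


def jaccard(a, b):
    """Jaccard similarity of lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def match_template(templates, meeting_scope, transcript):
    # Attribute rule: exact scope first, then successively higher-level scopes.
    for depth in range(len(meeting_scope), 0, -1):
        candidates = [t for t in templates if t.scope == tuple(meeting_scope[:depth])]
        if candidates:
            break
    else:
        return None  # no template at any level
    # Version rule: keep only the newest version among the candidates.
    newest = max(t.version for t in candidates)
    candidates = [t for t in candidates if t.version == newest]
    # Content rule: among equally new templates, pick the most similar one.
    return max(candidates, key=lambda t: jaccard(t.content, transcript))
```

Jaccard similarity is only a cheap placeholder; a TF-IDF or embedding-based similarity could be substituted in the content rule without changing the cascade.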
Compared with the prior art, the intelligent interaction system based on multimodal data fusion has the following advantages. The user question-answering module acquires the target question input by the user, retrieves its answer and answer source from the pre-built knowledge base, and returns both to the user; because the knowledge base is multimodal, relevant information can be located quickly, greatly shortening the time users need to obtain answers, and providing the source alongside the answer meets users' need for accurate information. The AI text detection module acquires the text input by the user, uses the pre-built AI text detection model to determine whether it was generated by AI, and outputs the result; it can check large volumes of text quickly, flag content that may be AI-generated, and improve detection efficiency. The meeting summary generation module acquires the meeting recording input by the user and generates the corresponding meeting summary; it produces summaries quickly and reduces manual effort. Together, these modules improve the practicality and flexibility of intelligent interaction and meet the interaction needs of different users in different scenarios.
Drawings
To illustrate the technical features of the embodiments of the invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the invention; a person skilled in the art may derive other drawings from them without inventive effort.
Fig. 1 is a schematic structural diagram of an embodiment of the intelligent interaction system based on multimodal data fusion provided by the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.
It should be noted that although functional blocks are divided in the device diagram and a logical order is shown in the flowchart, in some cases the steps shown or described may be performed with a block division different from that of the device, or in an order different from that in the flowchart. The terms "first", "second", and the like in the description, the claims, and the drawings distinguish similar elements and do not necessarily describe a particular sequence or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Referring to Fig. 1, which is a schematic structural diagram of an embodiment of the intelligent interaction system based on multimodal data fusion provided by the invention, the system includes:
the user question-answering module 11, configured to acquire a target question input by a user, retrieve the answer to the target question and its source from a pre-built knowledge base, and return both to the user;
the AI text detection module 12, configured to acquire a text to be checked that is input by the user, use a pre-built AI text detection model to determine whether the text was generated by AI, and output the determination result;
and the meeting summary generation module 13, configured to acquire a meeting recording input by the user and generate the corresponding meeting summary.
Specifically, the user question-answering module acquires the target question input by the user, retrieves the answer and its source from the pre-built knowledge base, and returns both to the user.
The AI text detection module acquires the text to be checked, uses the pre-built AI text detection model to determine whether it was generated by AI, and outputs the determination result.
The meeting summary generation module acquires the meeting recording input by the user and generates the corresponding meeting summary. It produces summaries quickly, reduces manual effort, and avoids the omissions and errors that can occur in manual note-taking. Overall, the system improves the practicality and flexibility of intelligent interaction and meets the interaction needs of different users in different scenarios.
In an alternative embodiment, the knowledge base is constructed as follows:
collecting historical consultation records and knowledge source files within a preset period, where the historical consultation records contain historical consultation questions and their answers;
uploading the historical consultation records and the knowledge source files to the knowledge base;
extracting file information from the knowledge source files, generating a metadata table for each file type, and storing the file information in the corresponding metadata table;
and associating each metadata table with its knowledge source files by recording each file's unique identifier in the metadata table, so that the file can be located quickly by that identifier during knowledge retrieval.
Specifically, historical consultation records and knowledge source files within a preset period are collected. The historical consultation records contain past consultation questions and their answers; they reflect what users have needed and which solutions were provided, and are therefore a valuable source of knowledge. The knowledge source files can be documents of various kinds, such as product manuals, technical R&D documents, and training materials. The preset period can be set according to actual needs and the frequency of knowledge updates, so that the collected knowledge is reasonably current while still covering enough historical information.
A suitable knowledge base storage system is selected, and the historical consultation records and knowledge source files are uploaded to it. For each type of knowledge source file (such as documents, pictures, and videos), key information is extracted as metadata, including the file name, file size, creation time, modification time, author, and keywords. For an enterprise policy document, for example, the policy name, issuing department, release date, and scope of application can be extracted as metadata. A separate metadata table is generated for each file type, each with its own field structure for storing the metadata of that type; a metadata table for documents, for example, might contain fields for file name, file path, document type, and page count. The extracted file information is stored in the corresponding metadata table.
A unique identifier, consisting of digits, letters, or both, is generated for each knowledge source file and must be unique across the whole knowledge base. The identifier is recorded in the metadata table and also in the storage path or attributes of the knowledge source file. During knowledge retrieval, once relevant file information has been found through the metadata, the corresponding knowledge source file can be located quickly by its unique identifier.
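One way to realize per-type metadata tables keyed by a knowledge-base-wide unique identifier is sketched below with Python's built-in `sqlite3` module. The extension-to-type mapping, the table and column names, the use of UUID4 hex strings as identifiers, and the shared column layout across types are all assumptions made for brevity; the text allows each type's table to have its own field structure.

```python
import sqlite3
import uuid
from pathlib import PurePath

# Hypothetical mapping from file extension to file type (one metadata table per type).
TYPE_BY_EXT = {".pdf": "document", ".docx": "document",
               ".png": "image", ".jpg": "image", ".mp4": "video"}
FILE_TYPES = set(TYPE_BY_EXT.values())


def build_kb():
    conn = sqlite3.connect(":memory:")
    for file_type in sorted(FILE_TYPES):
        # Table names come from the fixed mapping above, never from user input.
        conn.execute(f"CREATE TABLE {file_type}_meta (file_id TEXT PRIMARY KEY, "
                     "file_name TEXT, file_path TEXT, author TEXT, keywords TEXT)")
    # Source files stored under their unique identifier.
    conn.execute("CREATE TABLE files (file_id TEXT PRIMARY KEY, content BLOB)")
    return conn


def ingest(conn, path, content, author="", keywords=""):
    file_type = TYPE_BY_EXT[PurePath(path).suffix.lower()]
    file_id = uuid.uuid4().hex  # unique across the whole knowledge base
    conn.execute("INSERT INTO files VALUES (?, ?)", (file_id, content))
    conn.execute(f"INSERT INTO {file_type}_meta VALUES (?, ?, ?, ?, ?)",
                 (file_id, PurePath(path).name, path, author, keywords))
    return file_id


def locate(conn, file_type, keyword):
    """Search one metadata table, then follow file_id to the source file."""
    if file_type not in FILE_TYPES:
        raise ValueError(f"unknown file type: {file_type}")
    rows = conn.execute(f"SELECT file_id, file_name FROM {file_type}_meta "
                        "WHERE keywords LIKE ?", (f"%{keyword}%",)).fetchall()
    return [(name, conn.execute("SELECT content FROM files WHERE file_id = ?",
                                (fid,)).fetchone()[0])
            for fid, name in rows]
```

Values always pass through `?` placeholders; only the table name is interpolated, and it is checked against the fixed set of types first.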
In an alternative embodiment, acquiring the target question input by the user and retrieving its answer and answer source from the pre-built knowledge base includes:
acquiring the user's question input and preprocessing it to generate the target question, where the question input may take forms including, but not limited to, text, pictures, files, and video;
and determining question-answer prompt words for the target question and using them to retrieve the answer to the target question and its source from the pre-built knowledge base.
Specifically, the user's question input is first acquired in whatever form it takes, including but not limited to text, pictures, files, and video, and is then preprocessed. For text input, irrelevant characters, punctuation, special symbols, and the like are removed and the text format is unified. For picture input, OCR is used to recognize the characters in the picture and convert them into editable text. For file input, a parser suited to the file format extracts the text content. For video input, speech recognition first converts the audio into text, which is then preprocessed in the same way as text input. Preprocessing yields a target question in a unified format, which simplifies subsequent retrieval.
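A minimal Python sketch of this preprocessing follows, assuming Unicode NFKC normalization to unify full-width and half-width forms and a regular expression that replaces punctuation and symbols with spaces. OCR, file parsing, and speech recognition are passed in as hypothetical callables (`ocr`, `parse_file`, `asr`) because the patent does not name specific tools.

```python
import re
import unicodedata


def normalize_question(text):
    # Unify full-width/half-width and other compatibility forms.
    text = unicodedata.normalize("NFKC", text)
    # Replace punctuation and special symbols with spaces (letters, digits, CJK kept).
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace and lowercase for a uniform format.
    return re.sub(r"\s+", " ", text).strip().lower()


def preprocess(question_input, kind, ocr=None, parse_file=None, asr=None):
    """Route each input form to a text extractor, then normalize.

    ocr, parse_file and asr are hypothetical callables returning plain text.
    """
    extractors = {
        "text": lambda x: x,
        "picture": ocr,
        "file": parse_file,
        "video": asr,
    }
    extract = extractors.get(kind)
    if extract is None:
        raise ValueError(f"unsupported or unconfigured input form: {kind}")
    return normalize_question(extract(question_input))
```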
Core words that capture the subject and key points of the question are extracted from the target question, and natural language processing is used to analyze its semantics and understand its intent and context. Based on this analysis, more targeted question-answer prompt words are generated and used as search conditions against the pre-built knowledge base, which stores historical consultation records, knowledge source files, and other knowledge. Related knowledge content is found by matching against the prompt words, the answer that best matches the target question is extracted from it, and the source of the answer (for example, which knowledge source file and which part of it the answer came from) is recorded so that the user can judge the answer's reliability and authority.
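The retrieval step can be illustrated with a deliberately simple keyword-overlap ranker. In the sketch below, prompt words are assumed to be the normalized question's tokens minus a small hypothetical stopword list, each knowledge entry carries a keyword set together with its answer and source, and the entry with the largest keyword overlap wins. It stands in for the semantic analysis described above, but like the module it returns the answer together with its source.

```python
from dataclasses import dataclass, field

# Hypothetical stopword list; a real system would use a language-specific one.
STOPWORDS = {"how", "do", "i", "the", "a", "an", "to", "is", "what", "of", "my"}


@dataclass
class KnowledgeEntry:
    answer: str
    source: str                       # e.g. file name plus section
    keywords: set = field(default_factory=set)


def prompt_words(question):
    """Keep the informative tokens of a normalized question."""
    return {w for w in question.split() if w not in STOPWORDS}


def retrieve(question, kb):
    """Return (answer, source) of the best keyword-overlap match, or None."""
    words = prompt_words(question)
    best = max(kb, key=lambda e: len(words & e.keywords), default=None)
    if best is None or not words & best.keywords:
        return None
    return best.answer, best.source
```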
The system accepts questions in many forms, such as text, pictures, files, and video, which suits the habits and scenarios of different users: a user can, for example, photograph a question with a mobile phone and upload the picture, or upload a document containing the question. Preprocessing converts every input form into a unified target question, reducing the effect of input-form differences on retrieval, and targeted question-answer prompt words allow relevant information in the knowledge base to be located more precisely. The retrieval result includes not only the answer but also its source, so users know where the answer came from, which meets their need for accurate knowledge.
In an alternative embodiment, the user question-answering module is also configured to:
receive the user's feedback mark on the answer and trigger the processing logic corresponding to that feedback mark, where the feedback mark indicates how accurate the answer was;
when the indicated accuracy does not meet a preset accuracy requirement, push the target question to a manual answering module and adjust the retrieval algorithm accordingly;
and when the indicated accuracy meets the preset requirement, take no further action.
Specifically, after the answer has been retrieved and presented to the user, the user question-answering module receives the user's feedback mark on the accuracy of the answer and triggers different processing logic according to the accuracy it indicates. The feedback mark can take various forms: a simple binary mark such as "accurate" / "inaccurate", or a graded mark such as "very accurate", "generally accurate", "inaccurate", and "very inaccurate", which captures the user's assessment in finer detail. A prominent feedback button or interaction area can be placed on the answer display interface so that users can give feedback quickly and easily.
When the feedback indicates that the answer's accuracy meets the preset standard, the system takes no further action. When it does not, the system forwards the target question to the manual answering module, which can assign professional customer service staff, domain experts, and the like to provide a more accurate and detailed answer. The system can also optimize the retrieval algorithm based on the user feedback and the manual answers: it analyzes the causes of the inaccuracy, such as poor keyword extraction, deviations in semantic understanding, or gaps in the knowledge base, and improves the algorithm accordingly, for example by optimizing keyword weighting, strengthening the semantic analysis model, or expanding the knowledge base, to improve the accuracy of later answers.
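The feedback branch can be sketched as follows. The four-level grading scale, the acceptance threshold, the list standing in for the manual answering module, and the rule of lowering the weight of keywords that led to an inaccurate answer are all assumptions chosen to illustrate "adjusting the retrieval algorithm"; the patent leaves the concrete adjustment open.

```python
from collections import defaultdict

# Hypothetical graded feedback scale: higher means more accurate.
FEEDBACK_SCORES = {"very accurate": 3, "generally accurate": 2,
                   "inaccurate": 1, "very inaccurate": 0}


class FeedbackHandler:
    def __init__(self, min_score=2, penalty=0.1):
        self.min_score = min_score    # assumed preset accuracy requirement
        self.penalty = penalty        # assumed weight reduction per bad answer
        self.manual_queue = []        # stand-in for the manual answering module
        self.keyword_weights = defaultdict(lambda: 1.0)

    def handle(self, question, used_keywords, mark):
        if FEEDBACK_SCORES[mark] >= self.min_score:
            return "no action"
        # Accuracy requirement not met: escalate to a human ...
        self.manual_queue.append(question)
        # ... and down-weight the keywords that produced the bad answer.
        for kw in used_keywords:
            self.keyword_weights[kw] = max(0.0, self.keyword_weights[kw] - self.penalty)
        return "escalated"
```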
Through user feedback, embodiments of the invention can detect problems in answers promptly, route inaccurate cases to manual handling, and adjust the retrieval algorithm. With this continuous optimization, the quality of the system's answers improves over time and better meets user needs, while the retrieval algorithm, the knowledge base, and other components continue to be refined.
In an alternative embodiment, the AI text detection module includes:
a model detection unit, configured to input the text to be checked into a pre-trained AI-generated text detection model, which analyzes the text and outputs the probability that it was generated by AI;
and a comprehensive judgment unit, configured to calculate the text entropy anomaly degree and the technical-term density difference of the text to be checked and combine them with the probability value to determine whether the text was generated by AI.
Specifically, the AI text detection module consists mainly of the model detection unit and the comprehensive judgment unit, which work together to determine whether the text to be checked was generated by AI. The text is input into the pre-trained AI-generated text detection model, which analyzes it using the features its neural network has learned from a large amount of training data and outputs the probability that the text was generated by AI. This probability is a value between 0 and 1; the larger the value, the more likely the text is AI-generated.
Text entropy measures the uncertainty of the information in a text. AI-generated text may differ from human writing in vocabulary distribution, information density, and similar properties, which can show up as an anomalous entropy value. Technical-term density is the ratio of the number of technical terms in a text to its total word count. The comprehensive judgment unit calculates the text entropy anomaly degree and the technical-term density difference of the text to be checked, then analyzes them together with the probability value output by the model detection unit to determine whether the text was generated by AI.
By combining the model's probability value with the text entropy anomaly degree and the technical-term density difference, embodiments of the invention analyze the text from several angles, which improves the accuracy of the detection result.
In an alternative embodiment, the training process of the AI-generation detection model includes:
Constructing an initial model from a convolutional neural network and a bidirectional long short-term memory network;
Dividing a certain number of positive and negative samples into a training set, a validation set and a test set according to a preset ratio, wherein the positive samples are human-written texts and the negative samples are texts generated with an AI tool;
Training the initial model on the training set and adjusting its parameters through the back-propagation algorithm, so that the model can accurately distinguish AI-generated texts from human-written texts;
Evaluating and optimizing the initial model on the validation set, and selecting the model parameters with the best performance;
And testing the model on the test set to finally obtain the trained AI-generation detection model.
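The data-division step above can be sketched as follows. This is a minimal illustration, not the patent's implementation. Several details are assumptions: the per-class (stratified) split, the helper name `stratified_split`, the fixed random seed, and the labelling convention. Label 1 marks AI-generated text and 0 marks human-written text, so that the label matches the probability the detector outputs. The 7:1:2 default ratio is one of the examples given in the description.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, int]  # (text, label); label 1 = AI-generated, 0 = human-written (assumed)


def stratified_split(samples: Sequence[Sample], ratios: Tuple[int, int, int] = (7, 1, 2),
                     seed: int = 42) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Split samples into training/validation/test sets by integer ratios, class by class."""
    by_label: Dict[int, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    total = sum(ratios)
    train: List[Sample] = []
    val: List[Sample] = []
    test: List[Sample] = []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = len(group) * ratios[0] // total
        n_val = len(group) * ratios[1] // total
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # any rounding remainder goes to the test set
    return train, val, test


human = [(f"human text {i}", 0) for i in range(100)]
ai = [(f"ai text {i}", 1) for i in range(100)]
train, val, test = stratified_split(human + ai)
print(len(train), len(val), len(test))  # 140 20 40
```

Splitting each class separately keeps the proportion of AI-generated and human-written texts the same in all three sets. This is one reasonable way to honour the "preset ratio" in the description.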
Specifically, the initial model is constructed from a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM). The CNN is good at capturing local features of the text, such as specific word combinations and phrase patterns, and can quickly identify salient features. The BiLSTM processes the text sequence in both directions. It therefore takes context into account and better captures the semantic and grammatical structure of the text. Combining the two exploits the strengths of each and improves the model's ability to extract text features.
A certain number of positive samples (human-written texts) and negative samples (texts generated with an AI tool) are collected. They are divided into a training set, a validation set and a test set according to a preset ratio, for example 7:1:2 or 8:1:1. The training set is used to train the model. The texts in the training set are input into the initial model, which outputs, for each text, the probability that it is AI-generated. The loss function of the model is then calculated. The back-propagation algorithm propagates the error from the output layer back to the input layer to obtain the gradients of the loss, and the model parameters are adjusted according to these gradients. In this way the parameters are continually optimized and the model's ability to classify texts improves. Training usually runs for several epochs, each of which makes one complete pass over all samples in the training set. Hyperparameters such as the learning rate and batch size can be set to control the learning speed and stability of the model.
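The training loop described above can be illustrated in miniature: forward pass, loss, gradient computation and parameter update, repeated over several epochs with a chosen learning rate and batch size. As a deliberate simplification, a single logistic unit over pre-computed numeric feature vectors stands in for the CNN-BiLSTM network. For this one-layer model, back-propagation reduces to the closed-form gradient of the binary cross-entropy loss. The function names, the toy features and the hyperparameter values are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, label: 1 = AI-generated)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability that the text represented by feature vector x is AI-generated."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def mean_bce(w: Sequence[float], b: float, data: Sequence[Example]) -> float:
    """Mean binary cross-entropy loss over a data set."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x, y in data:
        p = predict_proba(w, b, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train_detector(data: Sequence[Example], epochs: int = 100, lr: float = 0.5,
                   batch_size: int = 2, seed: int = 0) -> Tuple[List[float], float]:
    """Mini-batch gradient descent; one epoch is one full pass over the training set."""
    rng = random.Random(seed)
    rows = list(data)
    w, b = [0.0] * len(rows[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(rows)
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            grad_w, grad_b = [0.0] * len(w), 0.0
            for x, y in batch:
                err = predict_proba(w, b, x) - y  # dLoss/dz for sigmoid + cross-entropy
                grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
                grad_b += err
            w = [wi - lr * g / len(batch) for wi, g in zip(w, grad_w)]
            b -= lr * grad_b / len(batch)
    return w, b


toy = [([2.0, 0.1], 1), ([1.5, 0.3], 1), ([-1.0, 0.2], 0), ([-2.0, 0.4], 0)]
w, b = train_detector(toy)
print(round(mean_bce([0.0, 0.0], 0.0, toy), 3), "->", round(mean_bce(w, b, toy), 3))
```

In a real CNN-BiLSTM, a deep-learning framework would compute the same kind of gradient automatically through every layer. The outer epoch/batch structure would stay the same.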
The model is evaluated on the validation set during training, and its parameters are adjusted according to the validation results. For example, if the model's accuracy on the validation set is low, one can try adjusting the learning rate or increasing the model's complexity. The parameters are adjusted repeatedly until the best-performing model is found, which is then tested on the test set. The test set is independent of the training and validation sets, so it reflects the model's generalization ability more faithfully. Evaluation metrics are calculated on the test set to determine the model's final performance. If the performance meets the requirements, the model becomes the trained AI-generation detection model. Otherwise, the model structure or training parameters are readjusted, and training and testing are repeated until a model that meets the requirements is obtained.
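The evaluation step can be made concrete with standard binary classification metrics computed from the detector's probabilities. A small helper then selects the best configuration by a validation metric. The 0.5 decision threshold, the use of F1 as the selection criterion and the helper names are assumptions for illustration. The description does not say which metrics are used.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating label 1 (AI-generated) as the target class."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total if total else 0.0,
            "precision": precision, "recall": recall, "f1": f1}


def select_best(validation_scores: Dict[str, Dict[str, float]], metric: str = "f1") -> str:
    """Name of the configuration with the highest score on the validation set."""
    return max(validation_scores, key=lambda name: validation_scores[name][metric])


print(classification_metrics([1, 1, 0, 0], [0.9, 0.4, 0.2, 0.6]))
# {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

The same function can score both the validation set (for tuning) and the held-out test set (for the final check). Only validation scores should drive `select_best`, so the test set stays independent.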
According to the embodiment of the invention, model construction, data set division, training, evaluation and tuning, and testing form a systematic training process that keeps model training sound and effective. The trained AI-generation detection model combines the advantages of CNN and BiLSTM and extracts text features more effectively. As a result, AI-generated and human-written texts are distinguished more accurately.
In an alternative embodiment, calculating the text entropy anomaly degree and the technical term density difference of the text to be detected includes:
Dividing the text to be detected into a plurality of paragraphs according to a preset text interval, and calculating the information entropy of each paragraph;
Calculating, based on the information entropy and a preset reference entropy value, the paragraph entropy anomaly degree of each paragraph and the text entropy anomaly degree of the text to be detected;
And calculating a measured technical term density of the text to be detected according to a technical term library, and calculating the technical term density difference between the measured density and a preset minimum technical term density threshold.
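The first step, segmenting the text and computing each paragraph's information entropy, might look like the sketch below. It assumes Shannon entropy in bits over the character distribution of each paragraph. A character-level unit also works for Chinese text without a word segmenter. The function names and the two segmentation modes are illustrative assumptions: natural paragraphs separated by line breaks, or fixed-length character windows.

```python
import math
from collections import Counter
from typing import List


def split_paragraphs(text: str, interval: int = 0) -> List[str]:
    """Natural paragraphs (interval <= 0) or fixed-length character windows of size `interval`."""
    if interval <= 0:
        return [p.strip() for p in text.split("\n") if p.strip()]
    return [text[i:i + interval] for i in range(0, len(text), interval)]


def shannon_entropy(segment: str) -> float:
    """Character-level Shannon entropy in bits: H = -sum_c p(c) * log2 p(c)."""
    n = len(segment)
    if n == 0:
        return 0.0
    counts = Counter(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


paragraphs = split_paragraphs("abab\n\nabcd\n")
print([shannon_entropy(p) for p in paragraphs])  # [1.0, 2.0]
```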
Specifically, the text to be detected is divided into a plurality of paragraphs according to a preset text interval. The interval can be set according to actual requirements, for example by natural paragraphs or by a fixed number of words. A reasonable division helps calculate the information entropy of each paragraph more accurately, so that it reflects the information characteristics of different parts of the text. The information entropy of each paragraph is calculated and compared with a preset reference entropy value to obtain that paragraph's entropy anomaly degree. The anomaly degrees of all paragraphs are then combined into the text entropy anomaly degree of the text to be detected, with reference to the following formulas:
Wherein, the quantities in these formulas are: the text entropy anomaly degree; the total number of paragraphs $N$ after text segmentation; the paragraph entropy anomaly degree of the $i$-th paragraph; the $i$-th paragraph; the information entropy of the paragraph; and the reference entropy value of human-written text.
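The sketch below assumes one plausible form that is consistent with the quantities listed above. It is an illustration, not the patent's formula. Each paragraph's anomaly degree is taken as its relative deviation from the human-text reference entropy, $d_i = |H(P_i) - H_0| / H_0$. The text entropy anomaly degree is taken as the mean over the $N$ paragraphs, $D = \frac{1}{N}\sum_{i=1}^{N} d_i$. The symbols $d_i$, $D$, $H(P_i)$, $H_0$ and the function names are hypothetical.

```python
from typing import List, Sequence


def paragraph_anomalies(entropies: Sequence[float], h_ref: float) -> List[float]:
    """Assumed form d_i = |H(P_i) - H_ref| / H_ref for each paragraph."""
    if h_ref <= 0:
        raise ValueError("reference entropy must be positive")
    return [abs(h - h_ref) / h_ref for h in entropies]


def text_anomaly(entropies: Sequence[float], h_ref: float) -> float:
    """Assumed aggregation: mean of the paragraph anomaly degrees over N paragraphs."""
    d = paragraph_anomalies(entropies, h_ref)
    return sum(d) / len(d) if d else 0.0


print(paragraph_anomalies([4.0, 5.0, 3.0], 4.0))  # [0.0, 0.25, 0.25]
```

Averaging makes the value comparable across texts with different numbers of paragraphs. Taking the maximum instead would emphasize a single outlying paragraph.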
The number of technical terms in the text to be detected is counted using the technical term library. The measured technical term density is then calculated as: measured technical term density = number of technical terms / total number of words in the text. The measured density is compared with the preset minimum technical term density threshold to obtain the technical term density difference.
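The density formula above can be sketched as follows. Two points are assumptions. First, the text is assumed to be already tokenized; Chinese text would first require a word segmenter. Second, the description does not state the sign of the difference, so the sketch takes threshold minus measured value, which is positive when a text falls short of the expected term density. The function names and the sample term library are hypothetical.

```python
from typing import AbstractSet, Sequence


def term_density(tokens: Sequence[str], term_library: AbstractSet[str]) -> float:
    """Measured density = number of technical-term tokens / total number of tokens."""
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in term_library) / len(tokens)


def density_difference(measured: float, min_threshold: float) -> float:
    """Assumed sign convention: positive when the text falls below the minimum threshold."""
    return min_threshold - measured


library = {"transformer", "attention", "softmax"}
tokens = "the transformer uses attention and softmax".split()
rho = term_density(tokens, library)
print(rho, round(density_difference(rho, 0.6), 2))  # 0.5 0.1
```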
In an optional implementation, combining the probability value to decide whether the text to be detected is AI-generated includes:
Calculating a comprehensive judgment value based on the text entropy anomaly degree, the technical term density difference, the probability value, and the pre-allocated anomaly degree weight, density difference weight and probability value weight;
And determining, according to the comprehensive judgment value, whether the text to be detected is AI-generated.
Specifically, each of the three indexes (the text entropy anomaly degree, the technical term density difference and the probability value) is assigned a weight according to its importance in judging whether a text is AI-generated. These are the anomaly degree weight, the density difference weight and the probability value weight. The three indexes are then weighted and summed with these weights to obtain the comprehensive judgment value, with reference to the following formula:
Wherein, the quantities in this formula are: the comprehensive judgment value; the probability value output by the AI-generation detection model; the text entropy anomaly degree; an adjustment parameter; the measured technical term density; the minimum technical term density threshold; the probability value weight; the anomaly degree weight; and the density difference weight.
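A sketch of the weighted sum follows. It assumes that the density term enters the sum directly as the technical term density difference. It omits the adjustment parameter, whose role is not recoverable from the text above. The weight values, the 0.5 decision threshold and all names are hypothetical placeholders, not values from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Weights:
    # Hypothetical values; the patent only says the weights are pre-allocated by importance.
    probability: float = 0.6
    anomaly: float = 0.2
    density: float = 0.2


def composite_score(prob: float, anomaly: float, density_diff: float,
                    weights: Optional[Weights] = None) -> float:
    """Weighted sum of model probability, entropy anomaly degree and term density difference."""
    w = weights or Weights()
    return w.probability * prob + w.anomaly * anomaly + w.density * density_diff


def is_ai_generated(score: float, threshold: float = 0.5) -> bool:
    """Assumed decision rule: flag the text when the composite value reaches the threshold."""
    return score >= threshold


score = composite_score(prob=0.9, anomaly=0.5, density_diff=0.1)
print(round(score, 2), is_ai_generated(score))  # 0.66 True
```

Keeping the weights in a frozen dataclass makes each weighting scheme an explicit, immutable value. Different schemes can then be compared on the validation set without side effects.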
In an alternative embodiment, the meeting summary generation module includes:
the text conversion unit, used for invoking a speech-to-text model when the question input is a meeting recording, converting the recording into a text transcript, and proofreading and polishing the transcript;
the template matching unit, used for matching a target template from the meeting summary template library in the knowledge base according to preset meeting summary template matching rules;
the information extraction unit, used for extracting key meeting information from the processed transcript by natural language processing techniques;
And the summary generation unit, used for filling the key meeting information into the target template to generate the meeting summary corresponding to the meeting recording.
Specifically, when the question input is detected to be a meeting recording, the speech-to-text model is invoked automatically to convert the recording into a text transcript. The transcript is then proofread to correct possible speech recognition errors, such as misrecognized words and semantically incoherent passages. It is also polished so that the wording is fluent, accurate and professional. A target template suited to the current meeting is selected from the meeting summary template library in the knowledge base according to the preset meeting summary template matching rules. The transcript is analyzed in depth with natural language processing techniques such as lexical analysis, syntactic analysis and semantic understanding. This extracts the key information of the meeting, such as the topic, participants, time, discussion points and decisions. The key information is placed in the corresponding positions according to the format and requirements of the target template. The result is a meeting summary with a complete structure and accurate content.
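The final filling step can be illustrated with Python's standard `string.Template`. The template text, the field names and the shape of the extracted-information dictionary are assumptions. The extraction itself (lexical, syntactic and semantic analysis of the transcript) is not shown. `safe_substitute` is used so that a field the extractor did not find remains as a visible `$placeholder` instead of raising an error.

```python
from string import Template
from typing import Any, Dict

# Hypothetical target template; a real library would hold one per meeting type and attribution.
SUMMARY_TEMPLATE = Template(
    "Meeting summary\n"
    "Topic: $topic\n"
    "Time: $time\n"
    "Participants: $participants\n"
    "Discussion points:\n$points\n"
    "Decisions:\n$decisions"
)


def fill_summary(template: Template, info: Dict[str, Any]) -> str:
    """Fill extracted key meeting information into the target template."""
    fields: Dict[str, str] = {}
    for key, value in info.items():
        if isinstance(value, (list, tuple)) and key == "participants":
            fields[key] = ", ".join(str(v) for v in value)
        elif isinstance(value, (list, tuple)):
            fields[key] = "\n".join(f"- {v}" for v in value)  # one bullet per item
        else:
            fields[key] = str(value)
    return template.safe_substitute(fields)  # missing fields stay as $placeholders


summary = fill_summary(SUMMARY_TEMPLATE, {
    "topic": "Q3 budget review",
    "time": "2025-09-11 10:00",
    "participants": ["Zhang", "Li", "Wang"],
    "points": ["Travel costs exceed plan", "New hires deferred"],
})
print(summary)
```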
The embodiment of the invention ensures that the generated meeting summary conforms to preset specifications and requirements. Its structure and content are consistent and standardized, which makes it easy to read, archive and consult later. The whole process is automated, from converting the meeting recording to extracting key information and generating the summary. Little manual work is required, which greatly improves efficiency and saves labor costs. Different template matching rules and template libraries can be preset for different meeting types and requirements, giving the system strong generality and flexibility.
In an alternative embodiment, the meeting summary template matching rules include:
An attribution matching rule, used for preferentially matching a template in the meeting summary template library whose attribution exactly matches the meeting's attribution information (such as its affiliated area). If no such template exists, matching proceeds to successively higher-level attributions;
A version matching rule, used for preferentially selecting the template with the latest version when several templates satisfy the attribution requirement;
And a content matching rule, used when several templates share the same version: it calculates the similarity between the text transcript and each template's content and selects the template with the highest similarity.
Specifically, the meeting summary template matching rules aim to select, accurately and efficiently, the template best suited to the current meeting from the meeting summary template library. They consider several factors: the attribution information of the meeting, the template version, and the similarity between the transcript and the template content. Candidates are screened and compared step by step, so that the final template meets the needs of meeting summary generation as closely as possible. The meeting's attribution information is the primary matching basis. The library is first searched for a template whose attribution exactly matches that of the meeting. If none is found, matching proceeds to successively higher-level attributions along the attribution hierarchy. This continues until a suitable template is found or all attribution levels have been traversed. The attribution matching rule makes the generated summary better fit local conditions and requirements, enhancing its applicability and standardization.
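The attribution fallback can be sketched as a walk up an attribution hierarchy. Each template is represented as a dictionary with an `attribution` key, and the hierarchy as a child-to-parent map. This representation and the example levels are hypothetical; the patent does not specify how attributions are stored.

```python
from typing import Dict, List, Optional


def match_by_attribution(templates: List[dict], attribution: str,
                         parent: Dict[str, Optional[str]]) -> List[dict]:
    """Templates with exactly the meeting's attribution, else those of the nearest higher level."""
    level: Optional[str] = attribution
    visited = set()
    while level is not None and level not in visited:  # visited guards against cyclic maps
        visited.add(level)
        hits = [t for t in templates if t["attribution"] == level]
        if hits:
            return hits
        level = parent.get(level)
    return []  # every level traversed without a match


hierarchy = {"district_a": "city_x", "city_x": "province_p", "province_p": "national", "national": None}
template_library = [
    {"name": "national-standard", "attribution": "national", "version": "2.0"},
    {"name": "city-x-minutes", "attribution": "city_x", "version": "1.3"},
]
print([t["name"] for t in match_by_attribution(template_library, "district_a", hierarchy)])
# ['city-x-minutes']
```

The function returns every template at the first matching level, not just one. When there are several, the version and content rules can break the tie.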
When the attribution matching rule finds several templates that satisfy the attribution requirement, the version matching rule preferentially selects the template with the latest version. This ensures that the meeting summary follows the latest standards and specifications, and its use of the latest functions and formats improves the summary's quality and readability.
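Version selection can be sketched as keeping every candidate that carries the highest version, so that ties pass through to content matching. Dotted numeric version strings such as `1.10` are an assumption. They are compared as integer tuples so that `1.10` ranks above `1.9`, which plain string comparison would get wrong.

```python
from typing import List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version such as '1.10' into (1, 10)."""
    return tuple(int(part) for part in version.split("."))


def latest_versions(templates: List[dict]) -> List[dict]:
    """All templates sharing the highest version; more than one means a tie."""
    if not templates:
        return []
    best = max(version_key(t["version"]) for t in templates)
    return [t for t in templates if version_key(t["version"]) == best]


candidates = [
    {"name": "a", "version": "1.9"},
    {"name": "b", "version": "1.10"},
    {"name": "c", "version": "1.10"},
]
print([t["name"] for t in latest_versions(candidates)])  # ['b', 'c']
```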
If several templates with the same version are found during matching, the content matching rule calculates the similarity between the transcript and the content of each template. A text similarity algorithm such as cosine similarity or Jaccard similarity is typically used. The template with the highest similarity is then selected as the target template. This ensures that the selected template best matches the actual content of the meeting. The generated summary then accurately reflects key information such as the discussion points and decisions, which improves its accuracy and relevance.
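The two similarity measures named above can be sketched on whitespace-tokenized text. Cosine similarity is computed over term-frequency vectors, and Jaccard similarity over token sets. Whitespace tokenization is an assumption, since Chinese transcripts would need a word segmenter first. The template dictionaries and names are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of a and b."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def jaccard_similarity(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over the sets of tokens."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def most_similar(transcript: str, templates: List[dict],
                 similarity: Callable[[str, str], float] = cosine_similarity) -> dict:
    """Template whose content is most similar to the transcript."""
    return max(templates, key=lambda t: similarity(transcript, t["content"]))


tied = [
    {"name": "tech-review", "content": "architecture design review action items"},
    {"name": "finance-review", "content": "budget review decision approval"},
]
print(most_similar("budget review and final decision", tied)["name"])  # finance-review
```

Cosine similarity accounts for how often each term occurs, while Jaccard only considers whether a term occurs. Passing the measure as a parameter lets either one be used, matching the "such as" wording of the description.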
While the invention has been described with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. An intelligent interactive system for multi-modal data fusion, comprising:
The user question-answering module, used for acquiring a target question input by a user, accurately retrieving the answer to the target question and the source of that answer from a pre-constructed knowledge base, and feeding the answer and its source back to the user;
The AI text detection module, used for acquiring a text to be detected input by the user, judging with a pre-constructed AI text detection model whether that text is AI-generated, and outputting the judgment result;
The meeting summary generation module, used for acquiring a meeting recording input by the user and generating the meeting summary corresponding to the recording.
2. The intelligent interactive system for multi-modal data fusion according to claim 1, wherein the construction process of the knowledge base comprises:
collecting historical consultation records and source knowledge files within a preset period, wherein the historical consultation records comprise historical consultation questions and the answers to them;
uploading the historical consultation records and the source knowledge files to the knowledge base;
extracting file information from the source knowledge files, generating a metadata table for each file type, and storing the file information in the corresponding metadata table;
and establishing an association between each metadata table and the source knowledge files by recording the unique identifier of each source knowledge file in the metadata table, so that the file can be located quickly through its unique identifier during knowledge retrieval.
3. The intelligent interactive system for multi-modal data fusion according to claim 1, wherein acquiring the target question input by the user and accurately retrieving the answer to the target question and its source from the pre-constructed knowledge base comprises:
acquiring the user's question input and preprocessing it to generate the target question, wherein the forms of question input include, but are not limited to, text, picture, file and video input;
and determining a question-answering prompt for the target question, and accurately retrieving the answer to the target question and its source from the pre-constructed knowledge base in combination with the prompt.
4. The intelligent interactive system for multi-modal data fusion according to claim 1, wherein the user question-answering module is further configured for:
receiving the user's feedback marks on the answer and triggering the corresponding processing logic for each feedback mark, wherein a feedback mark indicates the accuracy of the answer;
when the accuracy does not meet a preset accuracy requirement, pushing the target question to a manual answering module and adjusting the retrieval algorithm accordingly;
and when the accuracy meets the preset accuracy requirement, performing no further processing.
5. The intelligent interactive system for multi-modal data fusion according to claim 1, wherein the AI text detection module comprises:
The model detection unit, used for inputting the text to be detected into a pre-trained AI-generation detection model, so that the model detects the text and outputs the probability that the text is AI-generated;
And the comprehensive judgment unit, used for calculating the text entropy anomaly degree and the technical term density difference of the text to be detected, and combining them with the probability value to decide whether the text is AI-generated.
6. The intelligent interactive system for multi-modal data fusion according to claim 5, wherein the training process of the AI-generation detection model comprises:
Constructing an initial model from a convolutional neural network and a bidirectional long short-term memory network;
Dividing a certain number of positive and negative samples into a training set, a validation set and a test set according to a preset ratio, wherein the positive samples are human-written texts and the negative samples are texts generated with an AI tool;
Training the initial model on the training set and adjusting its parameters through the back-propagation algorithm, so that the model can accurately distinguish AI-generated texts from human-written texts;
evaluating and optimizing the initial model on the validation set, and selecting the model parameters with the best performance;
and testing the model on the test set to finally obtain the trained AI-generation detection model.
7. The intelligent interactive system for multi-modal data fusion according to claim 5, wherein calculating the text entropy anomaly degree and the technical term density difference of the text to be detected comprises:
dividing the text to be detected into a plurality of paragraphs according to a preset text interval, and calculating the information entropy of each paragraph;
Calculating, based on the information entropy and a preset reference entropy value, the paragraph entropy anomaly degree of each paragraph and the text entropy anomaly degree of the text to be detected;
And calculating a measured technical term density of the text to be detected according to a technical term library, and calculating the technical term density difference between the measured density and a preset minimum technical term density threshold.
8. The intelligent interactive system for multi-modal data fusion according to claim 5, wherein combining the probability value to decide whether the text to be detected is AI-generated comprises:
Calculating a comprehensive judgment value based on the text entropy anomaly degree, the technical term density difference, the probability value, and the pre-allocated anomaly degree weight, density difference weight and probability value weight;
And determining, according to the comprehensive judgment value, whether the text to be detected is AI-generated.
9. The intelligent interactive system for multi-modal data fusion according to claim 1, wherein the meeting summary generation module comprises:
the text conversion unit, used for invoking a speech-to-text model when the question input is a meeting recording, converting the recording into a text transcript, and proofreading and polishing the transcript;
the template matching unit, used for matching a target template from the meeting summary template library in the knowledge base according to preset meeting summary template matching rules;
the information extraction unit, used for extracting key meeting information from the processed transcript by natural language processing techniques;
And the summary generation unit, used for filling the key meeting information into the target template to generate the meeting summary corresponding to the meeting recording.
10. The intelligent interactive system for multi-modal data fusion according to claim 9, wherein the meeting summary template matching rules comprise:
An attribution matching rule, used for preferentially matching a template in the meeting summary template library whose attribution exactly matches the meeting's attribution information, and, if no template matches, matching against successively higher-level attributions;
a version matching rule, used for preferentially selecting the template with the latest version when several templates satisfy the attribution requirement;
And a content matching rule, used for calculating the similarity between the text transcript and the template content and selecting the template with the highest similarity when several templates share the same version.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202511293564.3A CN120764701A (en) | 2025-09-11 | 2025-09-11 | An intelligent interactive system based on multimodal data fusion |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202511293564.3A CN120764701A (en) | 2025-09-11 | 2025-09-11 | An intelligent interactive system based on multimodal data fusion |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN120764701A | 2025-10-10 |
Family
ID=97240004
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202511293564.3A Pending CN120764701A (en) | 2025-09-11 | 2025-09-11 | An intelligent interactive system based on multimodal data fusion |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120764701A (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090287678A1 (en) * | 2008-05-14 | 2009-11-19 | International Business Machines Corporation | System and method for providing answers to questions |
| CN117056479A (en) * | 2023-08-07 | 2023-11-14 | 广东电网有限责任公司广州供电局 | Intelligent question-answering interaction system based on semantic analysis engine |
| CN118627628A (en) * | 2024-08-13 | 2024-09-10 | 山东浪潮科学研究院有限公司 | A large language model knowledge question answering method and system integrating multimodal knowledge graph |
| CN119150814A (en) * | 2024-11-13 | 2024-12-17 | 北京奇虎科技有限公司 | Conference summary generation method, device, terminal and computer readable storage medium |
| CN119847389A (en) * | 2024-12-26 | 2025-04-18 | 深圳米唐科技有限公司 | Artificial intelligence automatic response meeting assistant |
| US20250147957A1 (en) * | 2023-11-08 | 2025-05-08 | Data Squared USA Inc. | Accuracy and providing explainability and transparency for query response using machine learning models |
- 2025-09-11: application CN202511293564.3A filed in CN (publication CN120764701A), status: active, pending
Non-Patent Citations (5)
| Title |
|---|
| 三视角的看见: "A Survey of Style-Recognition Methods for Human versus AI-Generated Text (AI Text Detection)", pages 1 - 7, Retrieved from the Internet <URL:https://mp.weixin.qq.com/s?__biz=Mzg4ODU0NzE4NA==&mid=2247485089&idx=1&sn=d6632dd6f6d1340504daaa957df19a5a&poc_token=HF23JmmjG1MZPgcs6N9JPaW56GGTIdQN7sxHXbZH> * |
| 戎蓉;杨行;韩叙;胡仕: "Research on Detecting AI-Generated versus Human-Written Text Based on Information Entropy and the GBDT Algorithm", 信息技术与信息化, 25 July 2024 (2024-07-25), pages 1 - 4 * |
| 柚子科技: "Basic Principles of AIGC Detection: Principles, Methods and Challenges", pages 1 - 6, Retrieved from the Internet <URL:https://schooltools.blog.csdn.net/article/details/147918540?fromshare=blogdetail&sharetype=blogdetail&sharerId=147918540&sharerefer=PC&sharesource=qq_42524862&sharefrom=from_link> * |
| 第五AI: "How to Pass Zhuque Large-Model AI Detection? The Latest 2025 Analysis of Its Working Principle and Accuracy", pages 1 - 11, Retrieved from the Internet <URL:https://www.diwuai.com/news/1302.html> * |
| 董守斌,袁华: "Web Information Retrieval", 30 April 2010, 西安电子科技大学出版社 (Xidian University Press), pages: 85 - 88 * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
|  | PB01 | Publication |  |
|  | SE01 | Entry into force of request for substantive examination |  |