CN119047485A - Semantic feature recognition method, device, equipment and medium based on depth grammar tree - Google Patents
Semantic feature recognition method, device, equipment and medium based on depth grammar tree
- Publication number
- CN119047485A (application number CN202411263093.7A)
- Authority
- CN
- China
- Prior art keywords
- text
- word segmentation
- grammar
- model
- semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/14—Tree-structured documents
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The application provides a semantic feature recognition method, device, equipment and medium based on a deep syntax tree, and relates to the field of natural language processing. In the method, an input text is segmented by a text word segmentation model to obtain text word segmentation features; accurate word segmentation ensures that the vocabulary units in the text are recognized correctly. The input text is also parsed by a syntax tree parsing algorithm to obtain grammar features; the syntax tree parsing algorithm reveals the dependency and syntactic relations among words in the text, which helps the model understand sentence structure more accurately. By combining the text word segmentation features and the grammar features, the relations between words in the text are understood in association, so that the semantic parsing model can perform deeper semantic understanding and capture the intrinsic meaning of the text. The generated reply text therefore better fits the user's query intent and requirements, and the accuracy of the text parsing result is improved.
Description
Technical Field
The present application relates to the field of natural language processing, and in particular, to a semantic feature recognition method, apparatus, device, and medium based on a deep syntax tree.
Background
Existing semantic feature understanding techniques for Chinese text rely mainly on the word embedding features of a pre-trained model. Although such models are trained on large-scale corpora, they cannot analyze and extract vocabulary according to the grammar of the text well when features are actually extracted for downstream tasks. Pre-training builds the model's understanding from general-domain text, so the model lacks understanding of specific downstream tasks in the government domain and often misinterprets specialized knowledge by applying common-sense knowledge.
In government processing scenarios, there are texts whose semantics are typically difficult for natural language processing models to understand. For example, when the phrase "Nanjing City Yangtze River Bridge" (南京市长江大桥) appears in a text, the model struggles to determine whether the user means "Nanjing City / Yangtze River Bridge" or "Nanjing / mayor / Jiang Daqiao" (a person's name). These two interpretations differ significantly, and the prior art has even more difficulty resolving such ambiguity in government documents.
Therefore, how to improve the accuracy of the text parsing result is a technical problem to be solved.
Disclosure of Invention
The application provides a semantic feature recognition method, device, equipment and medium based on a deep grammar tree, aiming at improving the accuracy of a text analysis result.
In a first aspect, the present application provides a semantic feature recognition method based on a depth syntax tree, the semantic feature recognition method based on the depth syntax tree comprising the steps of:
based on a text word segmentation model, word segmentation processing is carried out on an input text, and text word segmentation characteristics are obtained;
Based on a grammar tree analysis algorithm, carrying out grammar analysis on the input text to obtain grammar characteristics of the input text;
And based on a semantic analysis model, carrying out language probability prediction on the text word segmentation characteristics and the grammar characteristics, and outputting a reply text.
In a second aspect, the present application further provides a semantic feature recognition device based on a depth syntax tree, where the semantic feature recognition device based on the depth syntax tree includes:
the text word segmentation module is used for carrying out word segmentation processing on the input text based on the text word segmentation model to obtain text word segmentation characteristics;
The grammar feature recognition module is used for carrying out grammar analysis on the input text based on a grammar tree analysis algorithm to obtain grammar features of the input text;
The semantic analysis module is used for carrying out language probability prediction on the text word segmentation characteristics and the grammar characteristics based on a semantic analysis model and outputting a reply text.
In a third aspect, the present application also provides a computer device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program when executed by the processor implements the steps of the depth syntax tree based semantic feature recognition method as described above.
In a fourth aspect, the present application also provides a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the depth syntax tree based semantic feature identification method as described above.
The application provides a semantic feature recognition method, a semantic feature recognition device, a computer device and a storage medium based on a deep syntax tree. The method includes: performing word segmentation processing on an input text based on a text word segmentation model to obtain text word segmentation features; performing grammar parsing on the input text based on a syntax tree parsing algorithm to obtain grammar features of the input text; and performing language probability prediction on the text word segmentation features and the grammar features based on a semantic parsing model, and outputting a reply text. In this way, the input text is segmented by the text word segmentation model to obtain text word segmentation features, and accurate word segmentation ensures correct recognition of the vocabulary units in the text; the input text is parsed by the syntax tree parsing algorithm to obtain grammar features, and the syntax tree parsing algorithm reveals the dependency and syntactic relations among words in the text, which helps the model understand sentence structure more accurately. By combining the text word segmentation features and the grammar features, the relations between words in the text are understood in association, so that the semantic parsing model can perform deeper semantic understanding, capture the intrinsic meaning of the text, and carry out semantic understanding and analysis in the correct direction; the generated reply text therefore better matches the user's query intent and requirements, and the accuracy of the text parsing result is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a first embodiment of a semantic feature identification method based on a deep syntax tree;
fig. 2 is a schematic structural diagram of a semantic analysis model according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a second embodiment of a semantic feature identification method based on a deep syntax tree according to the present application;
FIG. 4 is a schematic flow chart of a third embodiment of a semantic feature identification method based on a deep syntax tree according to the present application;
FIG. 5 is a schematic structural diagram of a first embodiment of a semantic feature recognition device based on a deep syntax tree according to the present application;
fig. 6 is a schematic block diagram of a computer device according to an embodiment of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a flowchart of a first embodiment of a semantic feature recognition method based on a deep syntax tree according to the present application.
As shown in fig. 1, the semantic feature recognition method based on the depth syntax tree includes steps S101 to S103.
S101, based on a text word segmentation model, word segmentation processing is carried out on an input text, and text word segmentation characteristics are obtained;
In one embodiment, the text word segmentation model of the present application may include at least two word segmentation approaches.
Illustratively, as shown in FIG. 2, the text word segmentation model may segment the input text in two ways: first, a pre-training model segments the input text at the token (character) level; second, a grammar tree parsing module segments the input text and identifies the part of speech of each word segment.
In one embodiment, for the first processing mode, the pre-training model needs to segment the text at the token character level, that is, character by character, before feeding it into the pre-training model for feature extraction. The disadvantage of this approach is that the association information between words cannot be captured, which easily induces misjudgment by the model. Therefore, the text can also be parsed automatically by the grammar tree parsing module to obtain a better word segmentation.
Further, the input text is segmented based on a preset word segmentation mode to obtain second text word segments, wherein the preset word segmentation mode includes segmenting at the token (word element) character level; feature extraction is then performed on the second text word segments based on the text word segmentation model to obtain the text word segmentation features.
In one embodiment, a token refers to a symbol used to represent a word or phrase during natural language processing. A token may be a single character or a sequence of multiple characters.
In one embodiment, the pre-training model segments the input text at the token character level to obtain the second text word segments, i.e., each Chinese character or letter is processed separately rather than being combined into a word or phrase. For example, the Chinese text "Nanjing City Yangtze River Bridge" (南京市长江大桥) would be split into the seven individual characters 南, 京, 市, 长, 江, 大 and 桥.
In one embodiment, after dividing the input text into individual characters, the pre-training model converts the encoding of each character into vector form. The text word segmentation features may be obtained by mapping each character's code to a high-dimensional vector through a pre-trained word embedding (e.g., Word2Vec, GloVe, etc.).
In one embodiment, the vector into which a single character is converted (i.e., the text word segmentation feature) may contain semantic information of the character and capture the usage and meaning of the character in different contexts, so as to provide input for subsequent natural language processing tasks (e.g., feature fusion, feature matching, etc.).
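A minimal sketch of this character-level segmentation and embedding lookup follows; the toy vocabulary, embedding table and vector size are illustrative assumptions standing in for a real pre-trained embedding, not the model described in this application.

```python
import numpy as np

# Hypothetical character vocabulary and embedding table (assumed for illustration;
# a real system would load a pre-trained embedding such as Word2Vec or GloVe).
VOCAB = {"[UNK]": 0, "南": 1, "京": 2, "市": 3, "长": 4, "江": 5, "大": 6, "桥": 7}
EMBED_DIM = 8
rng = np.random.default_rng(0)
EMBEDDINGS = rng.normal(size=(len(VOCAB), EMBED_DIM))  # one vector per character

def tokenize_chars(text: str) -> list[str]:
    """Token (character) level segmentation: every character becomes its own token."""
    return [ch for ch in text if not ch.isspace()]

def char_features(text: str) -> np.ndarray:
    """Map each character to its embedding vector, yielding the text word segmentation features."""
    ids = [VOCAB.get(ch, VOCAB["[UNK]"]) for ch in tokenize_chars(text)]
    return EMBEDDINGS[ids]            # shape: (num_chars, EMBED_DIM)

features = char_features("南京市长江大桥")
print(features.shape)                 # (7, 8): seven characters, one vector each
```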
S102, carrying out grammar analysis on the input text based on a grammar tree analysis algorithm to obtain grammar characteristics of the input text;
In one embodiment, the input text of the pre-training model is also input into the syntax tree parsing module, and the syntax structure of the input text is analyzed through the syntax tree parsing algorithm in the syntax tree parsing module to construct a syntax tree. Grammar features of the input text are then extracted from the constructed grammar tree to assist in understanding the deep structure and semantics of the input text.
Further, text parsing is performed on the input text based on the grammar tree parsing algorithm to generate a grammar tree; the grammar tree is used to identify the syntactic structure of the input text and to obtain first text word segments and the part of speech of each first text word segment; the grammar features are then generated based on the first text word segments and their parts of speech.
In one embodiment, the grammar tree parsing module parses the grammar of the input text to perform word segmentation. In the parse tree module, the text is divided by part of speech. For example, as shown in fig. 2, the grammar tree parsing module divides "Nanjing City Yangtze River Bridge" into two kinds of nouns: a proper noun (NR) and a common noun (NN). Introducing the grammar tree makes it possible to distinguish the contextual relevance of the text effectively.
In one embodiment, a grammar tree is a tree-like structure in which each node represents a word or phrase and the relationships between the nodes represent their grammatical roles in a sentence.
In one embodiment, the grammar parsing algorithm assigns a part-of-speech tag to each word in the text, determining whether it is a noun, verb, adjective, etc.
In one embodiment, the dependency relationships between words in the input text are identified by a parse tree algorithm, which defines how the words are connected to each other to form a meaningful sentence. A complete grammar tree is constructed based on the dependencies, the root of the tree is usually the subject or predicate of the sentence, and the other parts are connected to the corresponding locations of the tree based on the dependencies. Grammar features are extracted from the constructed grammar tree, and may include parts of speech, dependencies, syntactic roles, context information, and the like.
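By way of illustration only, one possible in-memory representation of such a grammar tree and of the grammar features read from it is sketched below; the node fields and the hard-coded parse of the example phrase are assumptions, since the real tree would be produced by the grammar tree parsing algorithm.

```python
from dataclasses import dataclass

@dataclass
class Node:
    word: str       # the word segment
    pos: str        # part-of-speech tag, e.g. NR (proper noun) or NN (common noun)
    head: int       # index of the parent node (-1 for the root)
    relation: str   # dependency relation to the head

# Assumed parse of "南京市 长江大桥" as it might come from a grammar tree parsing algorithm.
tree = [
    Node("南京市",   pos="NR", head=1,  relation="nmod"),   # modifier of the bridge noun
    Node("长江大桥", pos="NN", head=-1, relation="root"),
]

def grammar_features(tree: list[Node]) -> list[dict]:
    """Collect part of speech, dependency relation and head word for each segment."""
    return [
        {
            "word": n.word,
            "pos": n.pos,
            "relation": n.relation,
            "head": tree[n.head].word if n.head >= 0 else "ROOT",
        }
        for n in tree
    ]

for feat in grammar_features(tree):
    print(feat)
```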
It will be appreciated that other natural language processing techniques may also be employed to extract grammatical features of the input text, such as part-of-speech tags, named entity recognition, dependency syntactic analysis, semantic role tags, and the like.
Wherein part-of-speech tagging may be used to identify the part of speech of each word in the text, such as nouns, verbs, and adjectives, to aid in understanding the composition of the sentence. Named entity recognition can identify specific entities in the text, such as person names, places, organizations, and dates, and can be used for information extraction. Dependency syntactic analysis is similar to grammar tree analysis, but focuses on the dependency relations between words. Semantic role labeling can identify the semantic roles played by the various components of a sentence, such as agent and patient, which helps in understanding the semantic structure of the sentence.
In this embodiment, one or more grammar feature extraction algorithms may be selected according to actual scene requirements, and grammar features of the input text may be extracted.
S103, based on a semantic analysis model, carrying out language probability prediction on the text word segmentation feature and the grammar feature, and outputting a reply text.
In an embodiment, the semantic parsing model takes the extracted text word segmentation features and grammar features as input. The text word segmentation features extracted by the pre-training model and the grammar features extracted by the grammar tree parsing module are cascaded (concatenated) and fed into the semantic parsing model; after the model reaches its best fit through training and learning, a softmax or sigmoid function is connected to perform language probability prediction. Finally, in a robot scenario (e.g., a government-affairs robot), the reply text is output.
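A minimal PyTorch sketch of this cascading of the two feature sets followed by a softmax prediction head is given below; the feature dimensions, class count and layer arrangement are illustrative assumptions rather than the actual trained semantic parsing model.

```python
import torch
import torch.nn as nn

class SemanticHead(nn.Module):
    """Concatenate word-segmentation features and grammar features, then predict class probabilities."""
    def __init__(self, word_dim: int, grammar_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Linear(word_dim + grammar_dim, num_classes)

    def forward(self, word_feat: torch.Tensor, grammar_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([word_feat, grammar_feat], dim=-1)   # cascade the two feature vectors
        logits = self.classifier(fused)
        return torch.softmax(logits, dim=-1)                   # language probability prediction

# Example with assumed dimensions: 768-d word features, 64-d grammar features, 10 output classes.
head = SemanticHead(word_dim=768, grammar_dim=64, num_classes=10)
probs = head(torch.randn(1, 768), torch.randn(1, 64))
print(probs.sum().item())  # probabilities sum to 1
```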
In one embodiment, text word segmentation features (e.g., word vectors, parts of speech, word frequencies, etc.) and grammar features (e.g., dependencies, syntactic roles, etc.) are used as inputs. The semantic parsing model is used to understand the deep meaning of the input text including, but not limited to, identifying intent, extracting key information, understanding context, etc.
In one embodiment, a statistical model or a deep learning model may be employed to predict the probability distribution of the language, for example an n-gram model, a recurrent neural network (RNN), a long short-term memory network (LSTM), or a Transformer model.
In one embodiment, the reply text is generated based on the predicted language probability distribution. The most probable word sequence may be selected, or the reply text may be generated using a sequence-to-sequence (Seq2Seq) model.
In one embodiment, the context information of the dialog is considered in generating the reply text, ensuring that the reply text is consistent with the previous dialog content. Finally, the semantic analysis model outputs generated reply texts which can be used for application scenes such as chat robots, automatic customer service systems, language translation and the like.
This embodiment provides a semantic feature recognition method based on a deep syntax tree. The input text is segmented by the text word segmentation model to obtain text word segmentation features, and accurate word segmentation ensures correct recognition of the vocabulary units in the text. The input text is parsed by the syntax tree parsing algorithm to obtain grammar features, and the syntax tree parsing algorithm reveals the dependency and syntactic relations among words in the text, which helps the model understand sentence structure more accurately. By combining the text word segmentation features and the grammar features, the relations between words in the text are understood in association, so that the semantic parsing model can perform deeper semantic understanding, capture the intrinsic meaning of the text, and carry out semantic understanding and analysis in the correct direction; the generated reply text therefore better matches the user's query intent and requirements, and the accuracy of the text parsing result is improved.
Referring to fig. 3, fig. 3 is a flowchart illustrating a second embodiment of a semantic feature recognition method based on a depth syntax tree according to the present application.
In this embodiment, as shown in fig. 3, based on the embodiment shown in fig. 1, step S103 specifically includes steps S201 to S203.
S201, analyzing the text word segmentation characteristics and the grammar characteristics based on the semantic analysis model, and outputting a text analysis result;
In one embodiment, the text parsing result may include word segmentation results, part-of-speech tags, dependencies, syntactic role tags, intent recognition, key information extraction, and the like.
Illustratively, the word segmentation result refers to dividing the input text into separate vocabulary units. For example, the input text "What materials do I need to handle a passport renewal" may be decomposed into ["I", "need", "handle", "passport", "renewal", "which", "materials"].
Illustratively, the part of speech of each vocabulary unit is identified, such as [("I", pronoun), ("need", verb), ("handle", verb), ("passport", noun), ("renewal", verb), ("need", verb), ("which", pronoun), ("materials", noun)].
Illustratively, the dependencies between words are determined, such as ("need", "handle", object) and ("handle", "passport", object).
In one embodiment, the syntactic role of each word in the sentence is identified, such as ("I", subject), ("need", predicate), ("handle passport renewal", object).
Illustratively, the intent or purpose of the text is understood; for example, the intent may be "handle passport renewal".
Illustratively, important information or keywords are extracted from the input text, such as "materials needed to handle passport renewal".
Further, the context information of the input text is identified based on the grammar features and the text word segmentation features; the text word segmentation features and the grammar features are fused based on a feature fusion algorithm to obtain fusion features; and the fusion features and the context information are parsed based on the semantic parsing model to obtain the text parsing result.
In one embodiment, after feature extraction, the text information is fed into a semantic parsing model. This model may be any of various types of neural networks, such as a recurrent neural network (RNN), a long short-term memory network (LSTM), or a Transformer, for further training and understanding of the semantic information of the input text.
In one embodiment, the input text is partitioned into individual vocabulary units, and each vocabulary unit is assigned a part-of-speech tag, such as verb, noun, or adjective; the dependencies between vocabulary units are identified and a dependency tree of the sentence is constructed; the role each vocabulary unit plays in the sentence, such as subject, predicate, or object, is then determined.
In an embodiment, the context information of the input text is analyzed based on the grammar features and the text word segmentation features; it may include dialogue history, surrounding text, scene information, and the like. The feature fusion algorithm is used to combine the text word segmentation features (such as word vectors, parts of speech, and word frequencies) and the grammar features (dependency relations, syntactic roles) to form fusion features. The fusion features and the context information are input into the semantic parsing model, which understands the deep meaning of the text through training and learning.
In one embodiment, text parsing results are generated based on the output of the semantic parsing model, which generally includes, but is not limited to, intent recognition, key information extraction, emotion analysis, language probability prediction, and generating reply text.
Here, intent recognition refers to recognizing the primary intent or purpose of the text. Key information extraction refers to extracting key entities, attributes, or concepts from the text. Emotion analysis refers to determining the emotional tendency of the text. Language probability prediction refers to predicting the probability distribution of words or phrases in the text. Generating the reply text refers to generating an appropriate reply or output according to the parsing result.
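By way of illustration, the text parsing result could be carried as a simple structured record such as the following; the field names and values are assumptions based on the passport-renewal example above.

```python
# Illustrative text parsing result for the passport-renewal example (field names assumed).
parsing_result = {
    "segments": ["I", "need", "handle", "passport", "renewal", "which", "materials"],
    "pos_tags": ["pronoun", "verb", "verb", "noun", "verb", "pronoun", "noun"],
    "dependencies": [("need", "handle", "object"), ("handle", "passport", "object")],
    "intent": "handle passport renewal",
    "key_info": ["materials needed to handle passport renewal"],
    "sentiment": "neutral",
}
```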
S202, carrying out probability prediction on the text recognition result based on a probability prediction algorithm to determine a text intention;
In one embodiment, a classification model can be trained using the annotated data set, which model can identify different intents. And carrying out probability prediction on the input text by using the trained model, wherein the model outputs the probability of each intention category. Then, the intention category with the highest probability is selected as the predicted intention of the text. The predicted intent and corresponding probability are output.
In one embodiment, a threshold may be set to determine the confidence level of the prediction, and the prediction may be considered reliable only if the probability of the prediction exceeds this threshold.
By way of example, suppose the user enters the text "I need help to set up my wireless router". The flow based on the probability prediction algorithm may be as follows (a code sketch follows the list):
Text preprocessing: word segmentation yields ["I", "need", "help", "to", "set up", "my", "wireless", "router"].
Feature extraction: the word segmentation result is converted into vector form using word embeddings.
Context information integration: if the user mentioned a network problem before, this can be used as context information.
Model training: a classification model, such as a support vector machine (SVM), random forest, gradient boosting machine (GBM), or deep neural network, is trained using a dataset with similar intent labels.
Probability prediction: the feature vector is input into the model, and the model predicts the probability of each intent.
Intent recognition: assume the model predicts the "technical support" intent with the highest probability, e.g., 0.9.
Threshold setting: if 0.9 is above the set threshold (e.g., 0.7), the prediction is accepted.
Output: the predicted intent "technical support" is output, with a probability of 0.9.
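A minimal sketch of the thresholded probability prediction in this flow is given below; the probability values and the 0.7 threshold follow the example, while the classifier that would produce them is assumed.

```python
def predict_intent(probabilities: dict[str, float], threshold: float = 0.7):
    """Pick the most probable intent; accept it only if its probability clears the threshold."""
    intent, prob = max(probabilities.items(), key=lambda kv: kv[1])
    if prob < threshold:
        return None, prob          # low confidence: fall back to a clarification reply
    return intent, prob

# Assumed output of a trained classifier (e.g. SVM, random forest, or a neural network)
# for "I need help to set up my wireless router".
probs = {"technical support": 0.9, "billing": 0.06, "small talk": 0.04}
intent, confidence = predict_intent(probs)
print(intent, confidence)          # technical support 0.9
```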
S203, generating the reply text based on the recognition intention.
In one embodiment, an appropriate reply strategy is selected based on the identified intent, and the reply text is generated using templates, rules, or a machine learning model. This may be a simple text template or a more complex generative model, such as a deep learning based sequence-to-sequence (Seq2Seq) model.
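A minimal sketch of template-based reply generation keyed on the recognized intent follows; the template texts and the fallback reply are illustrative assumptions, not the actual reply strategy of this application.

```python
# Assumed reply templates keyed by recognized intent.
REPLY_TEMPLATES = {
    "technical support": "Here are the steps to set up your {item}: ...",
    "replace lost identity card": "To replace a lost identity card, please prepare the required materials and visit the service window.",
}

def generate_reply(intent: str, **slots: str) -> str:
    """Fill the template for the recognized intent; fall back to a clarification request."""
    template = REPLY_TEMPLATES.get(intent)
    if template is None:
        return "Could you describe your request in more detail?"
    return template.format(**slots)

print(generate_reply("technical support", item="wireless router"))
```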
By way of example, assume that the user enters the text "I need to handle the replacement procedure for a lost identity card".
The text word segmentation result is ["I", "need", "handle", "identity card", "lost", "replacement", "procedure"].
Grammar feature parsing:
Identify the parts of speech [PRP, V, V, NN, V, V, NN].
Semantic analysis: semantic features of the text are extracted using the text word segmentation model.
Feature fusion: the semantic features extracted by the text word segmentation model are fused with the grammar features extracted by the grammar tree parsing module to form a comprehensive feature representation, yielding a more complete text representation.
Context modeling: context information, such as dependencies and syntactic structure, is used to enhance the semantic representation.
Probability prediction: the intent of the text is predicted to be "lost identity card replacement".
Intent recognition: the user's intent is determined to be replacing a lost identity card.
Reply generation: a reply text is generated according to the identified intent, e.g., the steps required to replace a lost identity card.
Through such a process flow, the semantic understanding model can accurately recognize the intention of the user and generate a suitable reply to meet the needs of the user.
According to this embodiment, through text word segmentation, part-of-speech tagging, dependency relations, and syntactic role labeling, the model can deeply understand the structure and meaning of the text and accurately identify its intent or purpose, which helps provide more targeted services or responses. By utilizing context information, such as dialogue history and scene information, the model's understanding of the text is enhanced, and the relevance and accuracy of the replies are improved.
Referring to fig. 4, fig. 4 is a flowchart illustrating a third embodiment of a semantic feature recognition method based on a depth syntax tree according to the present application.
In this embodiment, as shown in fig. 4, based on the embodiment shown in fig. 1, the step S101 further includes:
s301, obtaining model training text data;
in one embodiment, government affair dialogue related text data can be collected from data source paths such as web pages, books, news, dialogue texts and the like, and then the collected text data is processed through data preprocessing to obtain model training text data.
In an embodiment, data preprocessing may include data cleansing, data deduplication, data enhancement, data annotation, and the like.
S302, training a pre-training model based on the model training text data to obtain the semantic analysis model;
In one embodiment, a pre-trained model, such as BERT, GPT, ERNIE, may be selected that is appropriate for the semantic parsing task. And preprocessing the text data according to the requirements of the model, including word segmentation, part-of-speech tagging, vocabulary construction and the like.
In one embodiment, training parameters such as the learning rate, batch size, number of training rounds (epochs), and optimizer type are set, and the model is trained using the model training text data.
In one embodiment, loss and performance metrics during training may be monitored. Model performance is evaluated on the validation set and the hyper-parameters are adjusted to optimize the model. The generalization ability of the model was evaluated using a test set.
In one embodiment, the semantic parsing capability of the model may be evaluated using metrics such as accuracy, recall, and F1 score. Tuning the model according to the evaluation result may include changing the model structure, adjusting the training strategy, and the like.
S303, performing iterative optimization on model parameters of the semantic analysis model based on a text prediction result of the semantic analysis model until the text prediction accuracy of the semantic analysis model reaches a preset accuracy threshold.
In one embodiment, a target accuracy threshold may be determined as one of the stopping conditions for model training, and then the text prediction accuracy of the current model is evaluated using a validation set or test set. If the current accuracy rate does not reach the threshold, bottlenecks in the performance of the analysis model, such as analysis of error types, confusion matrices, and the like, are analyzed. Based on the error analysis, the model can be further improved by data enhancement or cleaning, and the model structure can be adjusted as needed, such as increasing or decreasing the number of layers, changing the number of neurons, etc.
Optionally, grid search, random search, or Bayesian optimization can be used to adjust the hyper-parameters; regularization techniques such as dropout and weight decay can be applied to prevent model overfitting; the model can be retrained using the updated model structure and hyper-parameters; and cross-validation can be employed to ensure the stability and generalization capability of the model. Once the model reaches the predetermined accuracy, the model parameters are updated. In this way, the semantic parsing model is systematically optimized until its text prediction accuracy reaches the preset accuracy threshold.
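The iterative optimization described above could be organized roughly as in the following sketch; `train_one_epoch` and `evaluate_accuracy` are hypothetical helpers standing in for the actual training and validation procedures, and the 0.95 target is an assumed threshold.

```python
from typing import Callable, Tuple

def optimize_until_threshold(
    model: object,
    train_one_epoch: Callable[[object], None],     # hypothetical helper: one pass of gradient updates
    evaluate_accuracy: Callable[[object], float],  # hypothetical helper: accuracy on a validation set
    target_accuracy: float = 0.95,
    max_rounds: int = 20,
) -> Tuple[object, float]:
    """Fine-tune the semantic parsing model until validation accuracy reaches the preset threshold."""
    best_accuracy = 0.0
    for _ in range(max_rounds):
        train_one_epoch(model)
        accuracy = evaluate_accuracy(model)
        best_accuracy = max(best_accuracy, accuracy)
        if accuracy >= target_accuracy:
            break   # preset accuracy threshold reached; keep the current parameters
        # Otherwise: analyse error types / confusion matrix, adjust hyper-parameters
        # (e.g. grid search, random search, Bayesian optimization) and continue.
    return model, best_accuracy
```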
Further, the step S103 further comprises the steps of obtaining feedback data of the text input end on the reply text, and carrying out iterative optimization training on the semantic analysis model based on the feedback data and the input text.
In one embodiment, user feedback data for the reply text is collected from the text input, where the feedback data may be a satisfaction score, a flag of whether the problem was solved, error correction or user modified text, and the like.
In one embodiment, the collected feedback data is combined with the original input text to form a labeled dataset for model training. The feedback data is analyzed to identify weaknesses of the model, such as common misunderstandings, error types, or causes of user dissatisfaction. Based on the user's corrections or feedback, the annotations of the original input text are updated for use in training.
In one embodiment, the updated labeled dataset is preprocessed, including word segmentation, part-of-speech labeling, vocabulary construction, etc., and the updated dataset is used to fine tune the existing model. The trimmed model performance is evaluated on the validation set, ensuring that the improvement is based on the validity of the user feedback.
Alternatively, incremental learning methods may be employed to train the model on the new data or feedback data only to maintain the model's original knowledge.
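One possible way to turn the collected feedback into new training examples is sketched below; the feedback record fields (solved flag, user correction) are assumptions about how the feedback data might be stored.

```python
# Assumed feedback records: each pairs an input text with the model reply and the user's reaction.
feedback_log = [
    {"input": "I need to replace my lost identity card",
     "reply": "Here is how to renew a passport ...",
     "solved": False,
     "user_correction": "replace lost identity card"},
]

def build_finetune_examples(feedback_log: list[dict]) -> list[tuple]:
    """Keep only feedback that carries a usable label (a user correction or an unsolved flag)."""
    examples = []
    for record in feedback_log:
        if record.get("user_correction"):
            examples.append((record["input"], record["user_correction"]))
        elif record.get("solved") is False:
            examples.append((record["input"], None))   # flagged for manual re-annotation
    return examples

print(build_finetune_examples(feedback_log))
```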
According to this embodiment, targeted improvement of the model's weaknesses can significantly improve its prediction accuracy. The optimized model can better understand the user's intent and provide satisfactory replies, thereby improving user satisfaction. By learning from new feedback data, the model can adapt to changes in language use and to new expressions. Through user corrections and feedback, the model can learn to avoid common misunderstandings and errors, improving the fluency of the conversation.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a first embodiment of a semantic feature recognition device based on a depth syntax tree according to the present application, where the semantic feature recognition device based on a depth syntax tree is used for executing the foregoing semantic feature recognition method based on a depth syntax tree.
As shown in fig. 5, the semantic feature recognition device 300 based on the deep grammar tree comprises a text word segmentation module 301, a grammar feature recognition module 302 and a semantic parsing module 303.
The text word segmentation module 301 is configured to perform word segmentation processing on an input text based on a text word segmentation model to obtain text word segmentation features;
The grammar feature recognition module 302 is configured to parse the input text based on a grammar tree parsing algorithm to obtain grammar features of the input text;
The semantic analysis module 303 is configured to predict a language probability of the text word segmentation feature and the grammar feature based on a semantic analysis model, and output a reply text.
In one embodiment, the grammar feature recognition module 302 includes:
The grammar tree generating unit is used for carrying out text analysis on the input text based on the grammar tree analysis algorithm to generate a grammar tree;
The grammar structure identification unit is used for identifying the grammar structure of the input text based on the grammar tree and obtaining first text word segmentation and word segmentation part of speech of each first text word segmentation;
and the grammar characteristic generation unit is used for generating the grammar characteristic based on the first text word segmentation and the word segmentation part of speech.
In an embodiment, the semantic parsing module 303 includes:
The text analysis unit is used for analyzing the text word segmentation characteristics and the grammar characteristics based on the semantic analysis model and outputting text analysis results;
The text prediction unit is used for carrying out probability prediction on the text recognition result based on a probability prediction algorithm to determine text intention;
and the reply text generation unit is used for generating the reply text based on the recognition intention.
In an embodiment, the text parsing unit includes:
a context information identifying subunit configured to identify context information of the input text based on the grammatical feature and the text word segmentation feature;
The feature fusion subunit is used for fusing the text word segmentation feature and the grammar feature based on a feature fusion algorithm to obtain a fusion feature;
and the text analysis subunit is used for analyzing the fusion characteristics and the context information based on the semantic analysis model to obtain the text analysis result.
In one embodiment, the text word segmentation module 301 includes:
The text word segmentation unit is used for segmenting the input text based on a preset word segmentation mode to obtain a second text word segmentation, wherein the preset word segmentation mode comprises word segmentation according to the token character level;
And the word segmentation feature extraction unit is used for carrying out feature extraction on the second text word segmentation based on the text word segmentation model to obtain the text word segmentation features.
In an embodiment, the semantic feature recognition device 300 based on the depth syntax tree further includes a model training module, where the model training module specifically includes:
the training data acquisition unit is used for acquiring model training text data;
The model training unit is used for training a pre-training model based on the model training text data to obtain the semantic analysis model;
The model optimization unit is used for carrying out iterative optimization on the model parameters of the semantic analysis model based on the text prediction result of the semantic analysis model until the text prediction accuracy of the semantic analysis model reaches a preset accuracy threshold.
In an embodiment, the semantic feature recognition device 300 based on the depth syntax tree further includes a model optimization module, where the model optimization module specifically includes:
The feedback data acquisition unit is used for acquiring feedback data of the text input end on the reply text;
And the model iteration training unit is used for carrying out iteration optimization training on the semantic analysis model based on the feedback data and the input text.
It should be noted that, for convenience and brevity of description, a person skilled in the art may clearly understand that, for the specific working process of the above-described apparatus and each module, reference may be made to the corresponding process in the foregoing embodiment of the semantic feature recognition method based on the depth syntax tree, which is not described herein again.
The apparatus provided by the above embodiments may be implemented in the form of a computer program which may be run on a computer device as shown in fig. 6.
Referring to fig. 6, fig. 6 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device may be a server.
With reference to FIG. 6, the computer device includes a processor, memory, and a network interface connected by a system bus, where the memory may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program comprises program instructions that, when executed, cause a processor to perform any of a number of semantic feature recognition methods based on a deep syntax tree.
The processor is used to provide computing and control capabilities to support the operation of the entire computer device.
The internal memory provides an environment for the execution of a computer program in a non-volatile storage medium that, when executed by a processor, causes the processor to perform any of a number of semantic feature recognition methods based on a deep syntax tree.
The network interface is used for network communication such as transmitting assigned tasks and the like. It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), or another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Wherein in one embodiment the processor is configured to run a computer program stored in the memory to implement the steps of:
based on a text word segmentation model, word segmentation processing is carried out on an input text, and text word segmentation characteristics are obtained;
Based on a grammar tree analysis algorithm, carrying out grammar analysis on the input text to obtain grammar characteristics of the input text;
And based on a semantic analysis model, carrying out language probability prediction on the text word segmentation characteristics and the grammar characteristics, and outputting a reply text.
In an embodiment, when implementing the syntax tree based parsing algorithm, the processor is configured to parse the input text to obtain syntax features of the input text, to implement:
Based on the grammar tree analysis algorithm, carrying out text analysis on the input text to generate a grammar tree;
Based on the grammar tree, identifying the syntactic structure of the input text, and obtaining first text word segmentation and word segmentation part of speech of each first text word segmentation;
the grammar feature is generated based on the first text word segmentation and the word part of speech of the word segmentation.
In an embodiment, when implementing the semantic parsing model, the processor performs language probability prediction on the text word segmentation feature and the grammar feature, and outputs a reply text, the processor is configured to implement:
analyzing the text word segmentation characteristics and the grammar characteristics based on the semantic analysis model, and outputting a text analysis result;
based on a probability prediction algorithm, carrying out probability prediction on the text recognition result to determine a text intention;
the reply text is generated based on the recognition intent.
In an embodiment, when the processor implements the parsing the text word segmentation feature and the grammar feature based on the semantic parsing model, the processor is configured to implement:
identifying contextual information of the input text based on the grammatical features and the text word segmentation features;
Based on a feature fusion algorithm, fusing the text word segmentation features and the grammar features to obtain fusion features;
And analyzing the fusion characteristics and the context information based on the semantic analysis model to obtain the text analysis result.
In an embodiment, when implementing the text word segmentation model and performing word segmentation processing on the input text, the processor is configured to implement:
Performing word segmentation on the input text based on a preset word segmentation mode to obtain a second text word segmentation, wherein the preset word segmentation mode comprises word segmentation according to a token character level;
And carrying out feature extraction on the second text word segmentation based on the text word segmentation model to obtain the text word segmentation features.
In an embodiment, before implementing the text word segmentation model and performing word segmentation processing on the input text, the processor is further configured to implement:
Obtaining model training text data;
training a pre-training model based on the model training text data to obtain the semantic analysis model;
and carrying out iterative optimization on model parameters of the semantic analysis model based on a text prediction result of the semantic analysis model until the text prediction accuracy of the semantic analysis model reaches a preset accuracy threshold.
In an embodiment, after implementing the semantic parsing model, the processor performs language probability prediction on the text word segmentation feature and the grammar feature, and outputs a reply text, the processor is further configured to implement:
Acquiring feedback data of a text input end for the reply text;
and carrying out iterative optimization training on the semantic analysis model based on the feedback data and the input text.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, the computer program comprises program instructions, and the processor executes the program instructions to realize any semantic feature recognition method based on the deep grammar tree.
The computer readable storage medium may be an internal storage unit of the computer device according to the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk provided on the computer device, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), or the like.
While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.
Claims (10)
1. A semantic feature recognition method based on a deep syntax tree, the method comprising:
based on a text word segmentation model, word segmentation processing is carried out on an input text, and text word segmentation characteristics are obtained;
Based on a grammar tree analysis algorithm, carrying out grammar analysis on the input text to obtain grammar characteristics of the input text;
And based on a semantic analysis model, carrying out language probability prediction on the text word segmentation characteristics and the grammar characteristics, and outputting a reply text.
2. The method for identifying semantic features based on a depth syntax tree according to claim 1, wherein the syntax tree parsing algorithm parses the input text to obtain the syntax features of the input text, and comprises:
Based on the grammar tree analysis algorithm, carrying out text analysis on the input text to generate a grammar tree;
Based on the grammar tree, identifying the syntactic structure of the input text, and obtaining first text word segmentation and word segmentation part of speech of each first text word segmentation;
the grammar feature is generated based on the first text word segmentation and the word part of speech of the word segmentation.
3. The deep grammar tree-based semantic feature recognition method of claim 1, wherein the semantic parsing model-based language probability prediction of the text word segmentation feature and the grammar feature, outputting a reply text, comprises:
analyzing the text word segmentation characteristics and the grammar characteristics based on the semantic analysis model, and outputting a text analysis result;
based on a probability prediction algorithm, carrying out probability prediction on the text recognition result to determine a text intention;
the reply text is generated based on the recognition intent.
4. The semantic feature recognition method based on a deep grammar tree according to claim 3, wherein the parsing the text word segmentation feature and the grammar feature based on the semantic parsing model, outputting a text parsing result, comprises:
identifying contextual information of the input text based on the grammatical features and the text word segmentation features;
Based on a feature fusion algorithm, fusing the text word segmentation features and the grammar features to obtain fusion features;
And analyzing the fusion characteristics and the context information based on the semantic analysis model to obtain the text analysis result.
5. The semantic feature recognition method based on a deep grammar tree according to claim 1, wherein the text word segmentation model is used for word segmentation processing of input text to obtain text word segmentation features, and the method comprises the following steps:
Performing word segmentation on the input text based on a preset word segmentation mode to obtain a second text word segmentation, wherein the preset word segmentation mode comprises word segmentation according to a token character level;
And carrying out feature extraction on the second text word segmentation based on the text word segmentation model to obtain the text word segmentation features.
6. The semantic feature recognition method based on a deep grammar tree according to claim 1, wherein the text word segmentation model is used for word segmentation processing of input text, and before obtaining text word segmentation features, the method further comprises:
Obtaining model training text data;
training a pre-training model based on the model training text data to obtain the semantic analysis model;
and carrying out iterative optimization on model parameters of the semantic analysis model based on a text prediction result of the semantic analysis model until the text prediction accuracy of the semantic analysis model reaches a preset accuracy threshold.
7. The method for identifying semantic features based on a deep grammar tree according to claim 1, wherein the semantic parsing model predicts the language probability of the text word segmentation feature and the grammar feature, and after outputting the reply text, the method further comprises:
Acquiring feedback data of a text input end for the reply text;
and carrying out iterative optimization training on the semantic analysis model based on the feedback data and the input text.
8. A depth syntax tree based semantic feature recognition apparatus, wherein the depth syntax tree based semantic feature recognition apparatus comprises:
the text word segmentation module is used for carrying out word segmentation processing on the input text based on the text word segmentation model to obtain text word segmentation characteristics;
The grammar feature recognition module is used for carrying out grammar analysis on the input text based on a grammar tree analysis algorithm to obtain grammar features of the input text;
The semantic analysis module is used for carrying out language probability prediction on the text word segmentation characteristics and the grammar characteristics based on a semantic analysis model and outputting a reply text.
9. A computer device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program when executed by the processor implements the steps of the depth syntax tree based semantic feature identification method as claimed in any one of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the depth syntax tree based semantic feature recognition method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202411263093.7A | 2024-09-09 | 2024-09-09 | Semantic feature recognition method, device, equipment and medium based on depth grammar tree
Publications (1)
Publication Number | Publication Date
---|---
CN119047485A | 2024-11-29
Family
ID=93587568
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---
CN202411263093.7A (CN119047485A, Pending) | Semantic feature recognition method, device, equipment and medium based on depth grammar tree | 2024-09-09 | 2024-09-09
Country Status (1)
Country | Link |
---|---
CN (1) | CN119047485A (en)
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---
CN119886121A (en) * | 2025-03-27 | 2025-04-25 | 上海甄零科技有限公司 | Legal dictionary intelligent generation method and system |
- 2024-09-09: CN application CN202411263093.7A filed, published as CN119047485A (status: Pending)
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination