
CN117056524B - Aspect-level sentiment analysis method and system based on domain knowledge graph - Google Patents

Aspect-level sentiment analysis method and system based on domain knowledge graph

Info

Publication number
CN117056524B
CN117056524B (application CN202310278253.4A)
Authority
CN
China
Prior art keywords
target
domain knowledge
descriptor
knowledge graph
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310278253.4A
Other languages
Chinese (zh)
Other versions
CN117056524A (en)
Inventor
熊熙
王江河
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liu Ting
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202310278253.4A priority Critical patent/CN117056524B/en
Publication of CN117056524A publication Critical patent/CN117056524A/en
Application granted granted Critical
Publication of CN117056524B publication Critical patent/CN117056524B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374Thesaurus
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an aspect emotion analysis method and system based on a domain knowledge graph. The method obtains a target text, parses it to extract target aspect words, and constructs a mask language template based on prompt learning; it builds a domain knowledge graph for the domain involved in the target text; the target text and the mask language template are spliced and input into a pre-trained language model for masked language modeling, the target aspect category is queried in the domain knowledge graph according to the target aspect words, and all descriptors associated with that category are extracted as target aspect descriptors to fill the mask; the aspect emotion polarity of each target aspect descriptor is then predicted to obtain the aspect emotion polarity of the target text. Under zero-sample or few-sample conditions, the method maintains high prediction accuracy on ATSC and ACSC tasks while avoiding the bias introduced by manually written prompts and the limited coverage of label words.

Description

Aspect-level emotion analysis method and system based on domain knowledge graph
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an aspect-level emotion analysis method and system based on a domain knowledge graph.
Background
Emotion analysis is an important research direction in Natural Language Processing (NLP) that focuses on extracting and analyzing the emotions expressed in text data. Aspect-level emotion analysis (ABSA) is a subtask of emotion analysis that provides a more detailed analysis of emotion toward a particular item or category. Businesses can exploit such insight to identify the strengths and weaknesses of products and to reveal unexpected aspects that produce positive or negative emotions. ABSA can be divided into two variants, aspect category emotion classification (ACSC) and aspect target emotion classification (ATSC). ACSC aims to determine the overall emotion toward a broader category or aspect, while ATSC aims to determine the emotion toward a more specific target or entity. By using ABSA, an enterprise can more fully understand customers' opinions and attitudes toward its products or services, which can provide information support for product development, marketing strategies, and customer service planning.
In recent years, the field of natural language processing has undergone an important transition toward pre-trained language models (PLMs), whose excellent performance has made them the basis of many downstream tasks. Extensive research into why PLMs are effective has shown that they acquire rich knowledge during pre-training, so how to elicit and exploit this knowledge has attracted increasing attention. Fine-tuning is the conventional way to achieve this goal: an additional classifier is added on top of the PLM and the model is further trained on the classification objective. Fine-tuning achieves satisfactory results through a supervised learning strategy. However, applying fine-tuning in few-sample and zero-sample learning scenarios remains challenging, because the additional classifier requires a sufficient amount of labeled training data to be adjusted.
Recent studies have shown that prompt learning can be used to connect pre-training objectives with downstream tasks, improving the performance of pre-trained language models on both few-sample and zero-sample tasks. For example, studies on GPT-3 and LAMA have shown that discrete or continuous prompts can improve PLM performance on these tasks. The association between the vocabulary and the label space has a significant impact on classification performance.
In conventional prompt learning, each template uses a verbalizer built from a one-to-one mapping of manually chosen label words. Such a manual vocabulary carries only limited information, and it is difficult to make accurate predictions from it. The one-to-one mapping restricts the coverage of label words, so the verbalizer lacks sufficient information for accurate prediction and may introduce bias. Furthermore, in the ABSA task people tend to use more specific, targeted adjectives when assessing different aspects, for example "savoury" rather than "good" when assessing food, and existing prompt learning methods do not account for the semantic importance of such label words in prediction.
Disclosure of Invention
In order to solve the above technical problems, the invention aims to provide a new idea for enhancing prompt performance in the ABSA task by integrating an external knowledge graph: an effective domain knowledge graph is constructed by combining word-network extraction with small-scale data annotation, and the verbalizers generated by the method provide richer and more specific label word mappings, thereby improving the accuracy of the ABSA task.
In order to achieve the technical purpose, the technical scheme provided by the invention comprises the following steps:
The aspect-level emotion analysis method based on the domain knowledge graph comprises the following steps of:
Obtaining a target text, analyzing and extracting target aspect words of the target text, and constructing a mask language template based on prompt learning;
Constructing a domain knowledge graph of the domain involved in the target text, wherein the domain knowledge graph comprises a plurality of aspects contained in the domain together with the corresponding upper descriptor-lower descriptor relations and descriptor-aspect relations;
After splicing the target text and the mask language template, inputting the spliced target text and mask language template into a pre-trained language model for mask language modeling, inquiring a target aspect category in the domain knowledge graph according to the target aspect word, extracting all description words associated with the target aspect category as target aspect description words, and filling the target aspect description words into a mask;
And respectively predicting the aspect emotion polarity of the target aspect descriptor to obtain the aspect emotion polarity of the target text.
In some preferred embodiments, the method for constructing the domain knowledge graph of the domain involved in the target text includes:
acquiring a plurality of evaluation texts from the same domain as the target text, extracting the aspect words and descriptors of all aspects contained in the evaluation texts, and establishing an aspect-descriptor list;
based on the aspect-descriptor list, extracting the aspect category and the corresponding descriptor-descriptor and descriptor-aspect relation in the word network database, and constructing the domain knowledge graph.
In some preferred embodiments, the method for obtaining the aspect-level emotion polarity of the target text includes:
and taking the aspect emotion polarity mapped by the target aspect descriptor with the highest prediction score as the aspect emotion polarity of the target text.
In some preferred embodiments, the method for obtaining the aspect emotion polarity of the target text comprises the following steps:
And carrying out weighted average on the prediction scores of all target aspect descriptors according to the aspect emotion polarity categories, and taking the aspect emotion polarity with the highest weighted average as the aspect emotion polarity of the target text.
In some preferred embodiments, the language model comprises a BERT model or a GPT model.
In some preferred embodiments, the word network database includes a WordNet word library and/or a Chinese WordNet word library.
The invention also provides an aspect-level emotion analysis system based on the domain knowledge graph, which comprises the following steps:
The first acquisition module is used for acquiring target texts and analyzing and extracting target aspect words of the target texts;
The domain knowledge graph module is set to be internally provided with a domain knowledge graph containing a plurality of aspects of the domain related to the target text and the corresponding upper descriptor-lower descriptor and descriptor-aspect relation;
The mask language template module is provided with a mask language template based on prompt learning, and the target text and the mask language template are spliced to be used as a first output text;
the pre-training language model is set to be built in with a pre-trained language model, a first output text is input for mask language modeling, target aspect categories are inquired in the domain knowledge graph according to target aspect words, all description words associated with the target aspect categories are extracted to serve as target aspect description words, and mask positions are filled in;
and the prediction analysis module is used for respectively predicting the aspect emotion polarities of the target aspect descriptors to obtain the aspect emotion polarities of the target texts.
In some preferred embodiments, the method for constructing the domain knowledge graph includes:
acquiring a plurality of evaluation texts from the same domain as the target text, extracting the aspect words and descriptors of all aspects contained in the evaluation texts, and establishing an aspect-descriptor list;
based on the aspect-descriptor list, extracting the aspect category and the corresponding descriptor-descriptor and descriptor-aspect relation in the word network database, and constructing the domain knowledge graph.
In some preferred embodiments, the prediction analysis module includes a first prediction analysis unit configured to set, as the aspect emotion polarity of the target text, the aspect emotion polarity mapped by the target aspect descriptor having the highest prediction score.
In some preferred embodiments, the prediction analysis module includes a second prediction analysis unit configured to perform weighted average on the prediction scores of all the target aspect descriptors according to the aspect emotion polarity categories, and take the aspect emotion polarity with the highest weighted average as the aspect emotion polarity of the target text.
Advantageous effects
By utilizing a highly targeted domain knowledge graph of the relevant domain, comprehensive descriptors and entity relations are obtained; under zero-sample or few-sample conditions, high prediction accuracy can be maintained for the ATSC and ACSC tasks, avoiding the possible bias of manually written prompt words and the limited coverage of label words.
Drawings
FIG. 1 is a schematic diagram of steps of an aspect emotion analysis method based on domain knowledge graph in a preferred embodiment of the present invention;
FIG. 2 is a schematic diagram of the steps of a method for constructing a domain knowledge graph of the domain involved in the target text according to another preferred embodiment of the present invention;
FIG. 3 is a schematic diagram of an aspect emotion analysis system based on domain knowledge graph in a preferred embodiment of the present invention;
FIG. 4 is a flow and structure diagram of a mask language template constructed in accordance with another preferred embodiment of the present invention;
FIG. 5 is a statistical chart of experimental results of another preferred embodiment of the present invention compared with current mainstream baseline models in the art;
FIG. 6 is a statistical chart of experimental results of another preferred embodiment of the present invention compared with current mainstream baseline models in the art.
Detailed Description
The present invention will be further described with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. In the description of the present invention, it should be understood that the terms "upper," "lower," "front," "rear," "left," "right," "top," "bottom," "inner," "outer," and the like indicate or are based on the orientation or positional relationship shown in the drawings, merely to facilitate description of the present invention and to simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Example 1
As shown in fig. 1, the embodiment provides an aspect-level emotion analysis method based on a domain knowledge graph, which includes the following steps:
S1, acquiring a target text, analyzing and extracting target aspect words of the target text, and constructing a mask language template based on prompt learning.
An aspect mainly comprises a category and a target, and prediction for the category and for the target are respectively called aspect category emotion classification (ACSC) and aspect target emotion classification (ATSC); the two are collectively called aspect-level emotion analysis (ABSA). ACSC aims to determine the overall emotion toward a broader category or aspect, while ATSC aims to determine the emotion toward a more specific target or entity. By using ABSA, the user's opinions and attitudes toward a target product or service can be understood more fully, which can provide information support for product development, marketing strategies, and customer service planning.
The target text refers to text data whose aspect-level emotion polarities need to be predicted; it contains the different words people use to express how they feel about different aspects of a specific target. For example, food is often described as "savoury" or "unpalatable", while the environment may be described as "clean" or "cluttered".
Extracting the target aspect words of the target text corresponds to the aspect term extraction task (Aspect Term Extraction, ATE), a basic subtask of aspect-level emotion analysis: given a target text, ATE extracts the aspect phrases toward which the user expresses emotion. For example, for the comment "The Bombay style bhelpuri is very palatable," ATE is expected to extract the aspect word "bhelpuri". There is extensive prior research on this task, including various supervised, semi-supervised, and unsupervised models and algorithms, and those skilled in the art can select an appropriate existing model according to actual needs; the present invention places no further limitation on this.
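For illustration only, the following minimal Python sketch shows one way such an ATE pre-step could be stubbed out; the invention does not prescribe any particular extractor, and the noun-chunk heuristic, model name, and function name below are assumptions rather than part of the method.

```python
# Illustrative only: a naive noun-chunk heuristic standing in for a real ATE model.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this small English model is installed

def extract_aspect_candidates(text: str) -> list[str]:
    """Return the head words of noun phrases as rough aspect-term candidates."""
    doc = nlp(text)
    return [chunk.root.text.lower() for chunk in doc.noun_chunks]

print(extract_aspect_candidates("The Bombay style bhelpuri is very palatable."))
# e.g. ['bhelpuri'] among other noun heads; a trained ATE model would filter further
```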
Prompt learning is widely regarded as the fourth paradigm in natural language processing (NLP), i.e., pre-train, prompt, predict. It focuses on adapting the downstream task to the language model rather than, as before, adapting the language model to the downstream task, so the parameters of the pre-trained model are generally not changed. Specifically, a contextual cue about the input is fed into the model together with the input, telling the model what task to perform next. For the task of the present invention, the aspect-level emotion classification task is formalized as a masked language modeling problem: the input sequence is wrapped with a template, which is a piece of natural language text. Example templates:
I felt the {aspect} was [MASK].
I [MASK] {aspect}.
The {aspect} made me feel [MASK].
The {aspect} is [MASK].
Here {aspect} is the target aspect, already extracted in the preceding step, and [MASK] is the mask to be filled with an aspect descriptor, as shown in fig. 4. The specific choice or construction of the mask language template can be made by those skilled in the art based on the prior art, and the invention places no further limitation on it.
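As a minimal sketch of how the target text could be spliced with one of the templates above, the snippet below fills {aspect} and leaves [MASK] for the language model; the helper name, tokenizer choice, and example sentence are assumptions made only for illustration.

```python
# Illustrative only: splice the target text with one of the templates above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed PLM choice

TEMPLATES = [
    "I felt the {aspect} was {mask}.",
    "The {aspect} is {mask}.",
]

def build_prompt(target_text: str, aspect: str, template: str) -> str:
    """Concatenate the target text X with the filled template to form the model input."""
    return f"{target_text} {template.format(aspect=aspect, mask=tokenizer.mask_token)}"

x_p = build_prompt("The Bombay style bhelpuri is very palatable.", "bhelpuri", TEMPLATES[1])
print(x_p)  # "... The bhelpuri is [MASK]."
```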
S2, constructing a domain knowledge graph of the domain related to the target text, wherein the domain knowledge graph comprises a plurality of aspects contained in the domain together with the corresponding upper descriptor-lower descriptor relations and descriptor-aspect relations.
It should be appreciated that in ATSC or ACSC tasks, the core challenge is to extend coarse three-way emotion labels to finer-grained emotion. Different words are used to describe the perception of different aspects; for example, food is often described as "savoury" or "unpalatable", while the environment may be described as "clean" or "messy". Predicting the emotion of a sentence toward a particular aspect with prompt learning amounts to predicting the masked word from context. This is not a simple choice, as many words may fit such a context. Therefore, the descriptors used for the ABSA task should have three characteristics: broad coverage, little subjective bias, and strong relevance to the aspect. Such comprehensive prediction is critical for the pre-trained language model (PLM) and is also the essence of prompt learning. The invention structures this external knowledge using a domain knowledge graph strongly related to the relevant domain, so that the descriptors satisfy all three requirements simultaneously.
In some preferred embodiments, a specific method for constructing a domain knowledge graph of a domain involved in a target text is provided, as shown in fig. 2, including:
S201, acquiring a plurality of evaluation texts from the same domain as the target text, extracting the aspect words and descriptors of all aspects contained in the evaluation texts, and establishing an aspect-descriptor list. The number of evaluation texts does not need to be large, and they may be obtained from sources such as a publicly available review database in the target domain.
In the following, the creation of an aspect-descriptor list for the restaurant domain is described in detail as an example. It should be understood that this description is not a unique limitation of this step but an illustration of the operating logic; it will be apparent to those skilled in the art that, following this logic, similar processing steps can be applied to aspect-descriptor lists for other domains.
One thousand unlabeled data items were first randomly selected from the Yelp open dataset, which contains a large number of user-generated restaurant reviews. These reviews cover different aspects of the dining experience, from food quality to employee friendliness to restaurant atmosphere, and so on. Next, the selected reviews are annotated manually or automatically to determine the words and phrases that express particular emotional tendencies toward different aspects of the restaurant. Specifically, eight key aspects are focused on: food quality, quality of service, atmosphere, menu, staff, price, location, and overall experience. For each aspect, an aspect-descriptor list is generated to capture all emotions expressed by reviewers in the dataset. These words include positive descriptors such as "delicious" and "perfect", and negative descriptors such as "disappointing" and "not worth it". By annotating a subset of the Yelp dataset in this manner, a small and targeted evaluation vocabulary can be created that contains words highly relevant to the restaurant review domain. These words can be used as descriptors for the ABSA task, which helps to improve the accuracy and granularity of emotion analysis in the domain. Obviously, this method can also be applied to other domains with limited evaluation vocabularies.
S202, based on the aspect-descriptor list, extracting the aspect categories and the corresponding descriptor-descriptor and descriptor-aspect relations from a word network database, and constructing the domain knowledge graph. A word network database is a dictionary built on cognitive linguistics: a word network organized by word meaning that covers a wide-ranging lexical-semantic network, in which nouns, verbs, adjectives, and adverbs are organized into synonym sets, each synonym set representing a basic semantic concept and the sets being connected by various relations. Such databases are popular tool datasets in the NLP field; representative examples include WordNet for English, BabelNet for multiple languages, and Chinese WordNet for Chinese. Because such a word network database has broad synonym sets that group nouns, verbs, adjectives, and adverbs according to cognitive similarity, and has a clear hierarchical structure, it organizes the synonym sets into a network of concept-semantic and lexical relations.
A knowledge graph is composed of entities (such as "Zhang San"), concepts (such as "actor"), generic relations between entities and concepts (also known as isA relations, such as "Zhang San isA actor"), and generic relations between concepts (such as "movie actor" being a subclass of "actor"). If A isA B, A is commonly called a hyponym of B, and B a hypernym of A.
Taking WordNet as an example, isA relations are the most commonly encoded relations between synonym sets, linking more general categories with more specific ones. Hyponymy is the relation between a sub-category and its parent category, and hyponymy is transitive, so meaningful relations between concepts in a given domain can be derived from it.
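The following sketch illustrates, under stated assumptions, how descriptor-aspect, descriptor-descriptor, and isA triples could be assembled from a small seed aspect-descriptor list plus WordNet lookups via NLTK; the seed list, relation names, and triple schema are hypothetical and only the WordNet calls come from the library.

```python
# Illustrative only: derive descriptor-aspect, descriptor-descriptor and isA triples
# from a hypothetical seed aspect-descriptor list plus WordNet (via NLTK).
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

aspect_descriptors = {            # hypothetical seed list from annotated reviews
    "food": ["delicious", "bland"],
    "service": ["friendly", "slow"],
}

triples = []
for aspect, words in aspect_descriptors.items():
    for word in words:
        triples.append((word, "describes", aspect))          # descriptor-aspect relation
        # descriptor-descriptor relations via adjective "similar to" links in WordNet
        for synset in wn.synsets(word, pos=wn.ADJ):
            for sim in synset.similar_tos():
                for lemma in sim.lemma_names():
                    triples.append((lemma, "similar_to", word))
    # upper/lower (isA) relations for the aspect noun itself
    for synset in wn.synsets(aspect, pos=wn.NOUN):
        for hyper in synset.hypernyms():
            triples.append((aspect, "isA", hyper.lemma_names()[0]))

print(len(triples), triples[:5])
```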
S3, inputting the spliced target text and mask language templates into a pre-trained language model for mask language modeling, inquiring target aspect categories in the domain knowledge graph according to target aspect words, extracting all description words associated with the target aspect categories as target aspect description words, and filling the target aspect description words into a mask;
It should be noted that for the ATSC task, the target aspect word extracted in the preceding step is an aspect target, which can be understood as a lower-level concept of an aspect category (for example, an attendant relative to front-desk service). In this case the aspect category to which the target aspect word belongs is first queried in the domain knowledge graph using the target aspect word, and the polarity words associated with that category are then extracted as the descriptors of the aspect; this ensures that the verbalizer is adjusted for the specific context and captures subtle differences in category emotion. For the ACSC task, the target aspect word extracted in the preceding step is already an aspect category, so the polarity vocabulary associated with that category can be extracted directly as the descriptors of the aspect. This way of using the domain knowledge graph exploits its well-defined categories, so that an appropriate verbalizer for each entity can be identified and used efficiently. In some particular embodiments, descriptor data for certain categories or targets may be limited or non-existent; in such data-scarce situations, effective verbalizers can still be generated by exploiting the rich semantic relations contained in the domain knowledge graph.
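A minimal sketch of this lookup step, reusing the illustrative triple schema from the previous snippet, might look as follows; the relation names and the example call are assumptions, not a fixed query interface.

```python
# Illustrative only: resolve a target aspect word to its category and collect the
# descriptors linked to that category, reusing the toy triple schema sketched above.
def lookup_descriptors(target_word, triples):
    # ATSC: the target may be a hyponym of a category (e.g. a waiter under "service")
    categories = {o for s, r, o in triples if s == target_word and r == "isA"}
    categories.add(target_word)  # ACSC: the target word is already a category
    return [s for s, r, o in triples if r == "describes" and o in categories]

print(lookup_descriptors("food", triples))  # e.g. ['delicious', 'bland']
```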
S4, predicting the aspect emotion polarity of each target aspect descriptor to obtain the aspect emotion polarity of the target text. People commonly use different words in different sentences to express the same emotion polarity toward a particular aspect; assuming the mapping between label words and emotion polarities is accurate, the purpose of this step is to determine the label word that contributes most to the predicted label. Because the number of available samples differs across target domains, different criteria can be used to derive the emotion polarity from the prediction scores.
In some zero-sample embodiments, the maximum of the prediction scores, rather than their average, is used to determine the reflected emotion polarity, so no training parameters of the model need to be adjusted. The prediction score is the probability that a target aspect descriptor fills the mask for the given aspect, denoted P_\mathcal{M}([\mathrm{MASK}] = v \mid x_p).
The specific prediction formula is as follows:

\hat{y} = \arg\max_{y \in \mathcal{Y}} \; \max_{v \in f^{-1}(y)} P_\mathcal{M}([\mathrm{MASK}] = v \mid x_p)

wherein \mathcal{Y} denotes the set of emotion polarity labels, comprising "Positive" (labeled 0), "Negative" (labeled 1), and "Neutral" (labeled 2); \mathcal{M} denotes the language model pre-trained on a large corpus; y \in \{0, 1, \dots, \theta - 1\} denotes the emotion polarity of the text, where \theta denotes the number of emotion polarities; \mathcal{V} denotes the set of aspect descriptors and f: \mathcal{V} \rightarrow \mathcal{Y} the mapping from descriptors to polarities, so that f^{-1}(y) is the set of descriptors mapped to polarity y; [MASK] is the position filled by a target aspect descriptor v; and x_p denotes the target text X spliced with the mask language template, e.g. "X The {aspect} is [MASK]".
In some few-sample embodiments, the prediction scores are weighted and averaged by aspect emotion polarity category, and the aspect emotion polarity with the highest weighted average is taken as the aspect emotion polarity of the target text. The specific prediction formula is as follows:

\hat{y} = \arg\max_{y \in \mathcal{Y}} s(y \mid x_p)

wherein s(y \mid x_p) is defined as:

s(y \mid x_p) = \frac{1}{|f^{-1}(y)|} \sum_{v \in f^{-1}(y)} w_v \, P_\mathcal{M}([\mathrm{MASK}] = v \mid x_p)

where w_v is the weight assigned to descriptor v. The prediction probability can be continuously optimized using a cross-entropy loss function.
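As a rough sketch of step S4 under stated assumptions, the snippet below scores candidate descriptors at the [MASK] position with a masked language model and aggregates them per polarity, using the maximum for the zero-sample case and a plain average as a stand-in for the weighted average in the few-sample case; the descriptor-to-polarity map, the model choice, and the example sentence are all illustrative.

```python
# Illustrative only: score descriptors at [MASK] and aggregate per polarity.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

f = {"delicious": 0, "perfect": 0, "bland": 1, "disappointing": 1}  # 0=Positive, 1=Negative

def descriptor_scores(x_p, descriptors):
    enc = tok(x_p, return_tensors="pt")
    mask_pos = (enc.input_ids == tok.mask_token_id).nonzero(as_tuple=True)[1][0]
    with torch.no_grad():
        probs = mlm(**enc).logits[0, mask_pos].softmax(-1)
    # multi-word or out-of-vocabulary descriptors would need sub-token handling
    return {d: probs[tok.convert_tokens_to_ids(d)].item() for d in descriptors}

def predict_polarity(scores, zero_shot=True):
    per_label = {}
    for d, p in scores.items():
        per_label.setdefault(f[d], []).append(p)
    agg = {y: (max(ps) if zero_shot else sum(ps) / len(ps)) for y, ps in per_label.items()}
    return max(agg, key=agg.get)

scores = descriptor_scores("The bhelpuri was great. The bhelpuri is [MASK].", list(f))
print(predict_polarity(scores))  # expected: 0 (Positive)
```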
Example 2
As shown in fig. 3, this embodiment is developed on the basis of the above embodiment 1 and provides an aspect emotion analysis system based on a domain knowledge graph, comprising:
The first acquisition module is used for acquiring target texts and analyzing and extracting target aspect words of the target texts;
The domain knowledge graph module is set to be internally provided with a domain knowledge graph containing a plurality of aspects of the domain related to the target text and the corresponding upper descriptor-lower descriptor and descriptor-aspect relation;
The mask language template module is provided with a mask language template based on prompt learning, and the target text and the mask language template are spliced to be used as a first output text;
the pre-training language model is set to be built in with a pre-trained language model, a first output text is input for mask language modeling, target aspect categories are inquired in the domain knowledge graph according to target aspect words, all description words associated with the target aspect categories are extracted to serve as target aspect description words, and mask positions are filled in;
and the prediction analysis module is used for respectively predicting the aspect emotion polarities of the target aspect descriptors to obtain the aspect emotion polarities of the target texts.
In some preferred embodiments, the method for constructing the domain knowledge graph includes:
acquiring a plurality of evaluation texts from the same domain as the target text, extracting the aspect words and descriptors of all aspects contained in the evaluation texts, and establishing an aspect-descriptor list;
based on the aspect-descriptor list, extracting the aspect category and the corresponding descriptor-descriptor and descriptor-aspect relation in the word network database, and constructing the domain knowledge graph.
In some preferred embodiments, the prediction analysis module includes a first prediction analysis unit configured to set, as the aspect emotion polarity of the target text, the aspect emotion polarity mapped by the target aspect descriptor having the highest prediction score.
In some preferred embodiments, the prediction analysis module includes a second prediction analysis unit configured to perform weighted average on the prediction scores of all the target aspect descriptors according to the aspect emotion polarity categories, and take the aspect emotion polarity with the highest weighted average as the aspect emotion polarity of the target text.
Various implementations of the systems and techniques described here above can be implemented in digital electronic circuitry, integrated circuit systems, field Programmable Gate Arrays (FPGAs), application Specific Integrated Circuits (ASICs), application Specific Standard Products (ASSPs), systems On Chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special or general purpose programmable processor, operable to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server. In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 3
This embodiment is developed on the basis of embodiment 1 described above. This example shows the course of an experiment performed on different data sets and a comparison of the results.
Data set
In this embodiment, experiments are carried out on two datasets, the MAMS dataset and the SemEval 2014 Task 4 dataset, to verify the effectiveness of the domain knowledge graph based aspect emotion analysis method provided by the invention. All dataset labels are limited to three types: positive, negative, and neutral, where neutral means that no positive or negative emotion is expressed.
Table 1 below is the statistical result for the two data sets described above.
Table 1 data set parameter statistics
Among them, the SemEval 2014 Task 4 dataset, widely used as a benchmark for evaluating performance on ATSC and ACSC tasks, contains English review sentences from two target domains (laptop and restaurant). To evaluate the performance of the proposed method and the baselines on the in-domain test set and ensure comparability, two preprocessing steps are taken. First, instances with conflicting emotion labels are deleted; such instances were rare in previous studies and are usually ignored. Second, sentences with multiple emotion aspects are split into separate instances, each focusing on a single emotion aspect.
The MAMS dataset consists of sentences containing at least two aspects whose emotions differ. It has two versions, for the ACSC and ATSC tasks respectively. For the ATSC version, the aspect terms in each sentence were extracted and the emotion polarity of each aspect was labeled; sentences containing only one aspect, or multiple aspects with the same emotion polarity, were removed. For the ACSC version, eight categories were predefined, including food, service, staff, price, ambience, menu, place, and a miscellaneous category, and the sentences were labeled with the corresponding category and emotion polarity.
Experimental setup
The performance of the proposed method and the baseline methods is evaluated over a range of training data sizes, covering cases from zero-sample to full-sample (i.e., fully supervised) learning. Specifically, the model is trained on randomly resampled training sets whose sizes include {zero, 16, 64, 256, 1024, full}. For all experiments, BERT and GPT-2 were used as the pre-trained language models. Accuracy and Micro-F1 were used as evaluation metrics. To ensure robustness, five random seeds were used for each prompt and baseline, their scores were averaged, and the average performance across all four prompts was computed.
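For reference, the two reported metrics could be computed as in the brief sketch below; the label arrays are hypothetical and stand in for gold and predicted polarities (note that for single-label multi-class data Micro-F1 coincides with accuracy).

```python
# Illustrative only: Accuracy and Micro-F1 with scikit-learn on hypothetical labels.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 0, 1]   # hypothetical gold labels (0=Pos, 1=Neg, 2=Neu)
y_pred = [0, 1, 2, 1, 1]   # hypothetical predictions

print("Acc =", accuracy_score(y_true, y_pred))
print("Micro-F1 =", f1_score(y_true, y_pred, average="micro"))
```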
Reference model
The following reference models are selected for comparison with the method proposed by the present invention:
(1) the last hidden state of the [CLS] token in BERT (referred to as BERT [CLS]);
(2) the NSP head of BERT (referred to as BERT NSP);
(3) BERT LM;
(4) GPT-2 LM (Seoh et al., 2021);
(5) CapsNet-BERT (Jiang et al., 2019).
Results and analysis
As shown in FIGS. 5-6, the experimental results of the domain knowledge graph based aspect emotion analysis method and of the current mainstream baseline models in the field are presented. KG BERT LM and KG GPT-2 LM denote the aspect-level emotion analysis method provided by the invention with the BERT model and the GPT-2 model, respectively, adopted as the pre-trained language model of the prediction model.
As the comparison shows, across the different tasks on the two datasets the invention outperforms the reference models in all cases, demonstrating the effectiveness of using the domain knowledge graph to improve performance. Specifically, for few-sample learning the invention achieves a larger performance improvement than the existing reference models, with accuracy (Acc) and Micro-F1 (MF1) clearly higher than those of other prediction models given the same number of available labels. In addition, the invention performs well in the zero-sample case and is clearly superior to reference models trained with only 16 samples, further demonstrating its usefulness. The invention also learns faster: its performance with 16, 64, and 256 samples is far better than that of existing prompt-free models, and it also surpasses existing prompt-based models. This suggests that incorporating domain knowledge graphs can improve prediction performance when data is limited. One possible explanation is that domain knowledge graphs provide a more structured and informative way of prompting: by incorporating domain knowledge into the prompts, the model can leverage relevant information about the particular domain and improve its generalization to new examples.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (8)

1. The aspect-level emotion analysis method based on the domain knowledge graph is characterized by comprising the following steps of:
Obtaining a target text, analyzing and extracting target aspect words of the target text, and constructing a mask language template based on prompt learning;
Constructing a domain knowledge graph of the domain involved in the target text, wherein the domain knowledge graph comprises a plurality of aspects contained in the domain together with the corresponding upper descriptor-lower descriptor relations and descriptor-aspect relations;
After splicing the target text and the mask language template, inputting the spliced target text and mask language template into a pre-trained language model for mask language modeling, inquiring a target aspect category in the domain knowledge graph according to the target aspect word, extracting all description words associated with the target aspect category as target aspect description words, and filling the target aspect description words into a mask;
Predicting the aspect emotion polarity of the target aspect descriptor to obtain the aspect emotion polarity of the target text;
The method for constructing the domain knowledge graph of the domain related to the target text comprises the following steps:
acquiring a plurality of evaluation texts from the same domain as the target text, extracting the aspect words and descriptors of all aspects contained in the evaluation texts, and establishing an aspect-descriptor list;
based on the aspect-descriptor list, extracting the aspect category and the corresponding descriptor-descriptor and descriptor-aspect relation in the word network database, and constructing the domain knowledge graph.
2. The aspect-level emotion analysis method based on the domain knowledge graph as recited in claim 1, wherein the method of obtaining the aspect emotion polarity of the target text comprises:
and taking the aspect emotion polarity mapped by the target aspect descriptor with the highest prediction score as the aspect emotion polarity of the target text.
3. The aspect-level emotion analysis method based on the domain knowledge graph as recited in claim 1, wherein the method of obtaining the aspect emotion polarity of the target text comprises:
And carrying out weighted average on the prediction scores of all target aspect descriptors according to the aspect emotion polarity categories, and taking the aspect emotion polarity with the highest weighted average as the aspect emotion polarity of the target text.
4. The aspect-level emotion analysis method based on the domain knowledge graph of claim 1, wherein the language model comprises a BERT model or a GPT model.
5. The aspect-level emotion analysis method based on the domain knowledge graph of claim 1, wherein the word network database comprises a WordNet word library and/or a Chinese WordNet word library.
6. An aspect emotion analysis system based on a domain knowledge graph is characterized by comprising:
The first acquisition module is used for acquiring target texts and analyzing and extracting target aspect words of the target texts;
The domain knowledge graph module is set to be internally provided with a domain knowledge graph containing a plurality of aspects of the domain related to the target text and the corresponding upper descriptor-lower descriptor and descriptor-aspect relation;
The mask language template module is provided with a mask language template based on prompt learning, and the target text and the mask language template are spliced to be used as a first output text;
the pre-training language model is set to be built in with a pre-trained language model, a first output text is input for mask language modeling, target aspect categories are inquired in the domain knowledge graph according to target aspect words, all description words associated with the target aspect categories are extracted to serve as target aspect description words, and mask positions are filled in;
The prediction analysis module is used for respectively predicting the aspect emotion polarities of the target aspect descriptors to obtain the aspect emotion polarities of the target texts;
the construction method of the domain knowledge graph comprises the following steps:
acquiring a plurality of evaluation texts from the same domain as the target text, extracting the aspect words and descriptors of all aspects contained in the evaluation texts, and establishing an aspect-descriptor list;
based on the aspect-descriptor list, extracting the aspect category and the corresponding descriptor-descriptor and descriptor-aspect relation in the word network database, and constructing the domain knowledge graph.
7. The aspect emotion analysis system based on the domain knowledge graph of claim 6, wherein the prediction analysis module comprises a first prediction analysis unit configured to take the aspect emotion polarity mapped by the target aspect descriptor having the highest prediction score as the aspect emotion polarity of the target text.
8. The aspect emotion analysis system based on the domain knowledge graph of claim 6, wherein the prediction analysis module comprises a second prediction analysis unit configured to perform a weighted average of the prediction scores of all target aspect descriptors according to the aspect emotion polarity categories, and take the aspect emotion polarity with the highest weighted average as the aspect emotion polarity of the target text.
CN202310278253.4A 2023-03-21 2023-03-21 Aspect-level sentiment analysis method and system based on domain knowledge graph Active CN117056524B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310278253.4A CN117056524B (en) 2023-03-21 2023-03-21 Aspect-level sentiment analysis method and system based on domain knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310278253.4A CN117056524B (en) 2023-03-21 2023-03-21 Aspect-level sentiment analysis method and system based on domain knowledge graph

Publications (2)

Publication Number Publication Date
CN117056524A CN117056524A (en) 2023-11-14
CN117056524B true CN117056524B (en) 2025-02-11

Family

ID=88666886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310278253.4A Active CN117056524B (en) 2023-03-21 2023-03-21 Aspect-level sentiment analysis method and system based on domain knowledge graph

Country Status (1)

Country Link
CN (1) CN117056524B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118780808B (en) * 2024-09-10 2024-12-27 江苏户传科技有限公司 Predictive complaint processing decision support method and system based on machine learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114780691A (en) * 2022-06-21 2022-07-22 安徽讯飞医疗股份有限公司 Model pre-training and natural language processing method, device, equipment and storage medium
CN115391570A (en) * 2022-10-28 2022-11-25 聊城大学 Method and device for constructing emotion knowledge graph based on aspects

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2607975C2 (en) * 2014-03-31 2017-01-11 Общество с ограниченной ответственностью "Аби ИнфоПоиск" Constructing corpus of comparable documents based on universal measure of similarity
US12197486B2 (en) * 2021-06-29 2025-01-14 Microsoft Technology Licensing, Llc Automatic labeling of text data
CN113590782B (en) * 2021-07-28 2024-02-09 北京百度网讯科技有限公司 Training methods, inference methods and devices for inference models
CN114912423B (en) * 2022-03-24 2024-10-29 燕山大学 Aspect level emotion analysis method and device based on transfer learning
CN115713072A (en) * 2022-11-14 2023-02-24 东南大学 Relation category inference system and method based on prompt learning and context awareness

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114780691A (en) * 2022-06-21 2022-07-22 安徽讯飞医疗股份有限公司 Model pre-training and natural language processing method, device, equipment and storage medium
CN115391570A (en) * 2022-10-28 2022-11-25 聊城大学 Method and device for constructing emotion knowledge graph based on aspects

Also Published As

Publication number Publication date
CN117056524A (en) 2023-11-14

Similar Documents

Publication Publication Date Title
Millstein Natural language processing with python: natural language processing using NLTK
US10282468B2 (en) Document-based requirement identification and extraction
US9471559B2 (en) Deep analysis of natural language questions for question answering system
Rashid et al. A survey paper: areas, techniques and challenges of opinion mining
US9734238B2 (en) Context based passage retreival and scoring in a question answering system
Alfrjani et al. A hybrid semantic knowledgebase-machine learning approach for opinion mining
Arumugam et al. Hands-On Natural Language Processing with Python: A practical guide to applying deep learning architectures to your NLP applications
Khan et al. Exploring the landscape of automatic text summarization: a comprehensive survey
Chiarello et al. Product description in terms of advantages and drawbacks: Exploiting patent information in novel ways
WO2024248731A1 (en) Method and apparatus for multi-label text classification
Bakari et al. A novel semantic and logical-based approach integrating RTE technique in the Arabic question–answering
CN117056524B (en) Aspect-level sentiment analysis method and system based on domain knowledge graph
CN118170919B (en) A method and system for classifying literary works
Saeed et al. An abstractive summarization technique with variable length keywords as per document diversity
RU2662699C2 (en) Comprehensive automatic processing of text information
Rasheed et al. Conversational chatbot system for student support in administrative exam information
Rahul et al. Social media sentiment analysis for Malayalam
WO2020026229A2 (en) Proposition identification in natural language and usage thereof
Corredera Arbide et al. Affective computing for smart operations: a survey and comparative analysis of the available tools, libraries and web services
Yergesh et al. Semantic Knowledge Base for the Emotional Coloring Analysis of Kazakh Texts
Rybak et al. Machine learning-enhanced text mining as a support tool for research on climate change: theoretical and technical considerations
Arbizu Extracting knowledge from documents to construct concept maps
Vanetik et al. Multilingual text analysis: History, tasks, and challenges
Kaci et al. From NL preference expressions to comparative preference statements: A preliminary study in eliciting preferences for customised decision support
CN120373306B (en) Feasibility study report intelligent analysis and information extraction method, system and equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240812

Address after: No. 666 Jinggong East 1st Road, Xinxing Street, Tianfu New Area, Chengdu City, Sichuan Province 610213, China. T2-16 # -4-102, Tianfu International Emerging Technology Park, Liandong U Valley

Applicant after: Liu Ting

Country or region after: China

Address before: Room 320, 3rd Floor, No. 2 Chuangye Road, High tech Zone, Chengdu, Sichuan, 610000

Applicant before: Chengdu Tuyi Technology Co.,Ltd.

Country or region before: China

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant