Background
With the development of artificial intelligence and the rapid growth of the information extraction field, entity relation extraction is attracting the attention of more and more researchers as an important research topic in information extraction. The task aims at extracting semantic relations between marked entity pairs in sentences, namely determining the relation category between entity pairs in unstructured text on the basis of entity recognition, and producing structured data for storage and access. The results of entity relation extraction can be used to construct a knowledge graph or an ontology knowledge base, and can also provide data support for building automatic question-answering systems. In addition, entity relation extraction has important research significance for semantic web annotation, discourse understanding and machine translation.
Early relation extraction was mainly based on grammar rules: the grammatical structures in sentences were analyzed and used as evidence for the occurrence of a relation. Although such methods achieve good precision, recall is difficult to improve because the rules are strict, professional grammatical knowledge and a literature background are needed, and applicability is low. With the continuous development of technology, relation extraction methods have been divided into three types: supervised, semi-supervised and unsupervised. Given the content of the invention, the supervised relation extraction method is studied with emphasis. Supervised relation extraction can be regarded as a classification problem, and its methods can be summarized into two categories: shallow-structure models and deep-learning models.
Shallow structures typically have at most one layer of hidden nodes, such as support vector machines and maximum entropy models. Shallow structures in relation extraction often rely on feature engineering or kernel functions. Traditional feature-engineering methods mainly depend on ingeniously designed feature sets output by language processing pipelines. These approaches mostly rely on a large number of manually designed features or on well-designed kernel functions. Despite the assistance of many excellent NLP tools, there is still a risk of performance degradation due to errors such as inaccurate word segmentation and syntax parsing errors. More importantly, the low portability of these well-designed features or kernel functions greatly limits their scalability.
In recent years, relation extraction based on deep learning has advanced greatly. Various relation extraction methods based on models such as CNN and RNN have achieved good results. Many neural-network-based approaches show the advantages of neural networks over traditional shallow structures, but these results are mostly achieved on balanced English datasets and use many external features as an aid. Chinese grammar, by contrast, has a complex structure and more serious ambiguity.
Disclosure of Invention
The invention provides a Chinese relation extraction method based on a neural network. The method is characterized in that hidden layers of different sizes are arranged in long short-term memory models, so that abstract features carrying dependency information of different dimensions can be automatically extracted from the original input, and global information is captured using an attention mechanism. Experiments show that, compared with a multi-kernel convolutional neural network and a single long short-term memory-attention model, the method can obviously improve the effect of Chinese relation extraction, and it obtains a relatively good result on the ACE RDC2005 Chinese dataset, which proves its effectiveness. The model framework is shown in figure 1.
The technical scheme of the invention is as follows: a neural-network-based Chinese relation extraction method, the method comprising the steps of: step one, constructing BiLSTMA units and extracting deep semantic information and global dependency information of sentences; step two, constructing a Multi-BiLSTMA model and acquiring semantic information with dependency relations of different granularities; and step three, verifying the validity of the method using real data.
Step 1 fully utilizes the advantages of the bidirectional long short-term memory model (BiLSTM) in handling long-term dependency problems and the ability of an Attention mechanism to capture global dependency information, and builds BiLSTMA units (BiLSTM-Attention) to extract the deep semantic information and dependency information of sentences.
In step 2, hidden layers of different sizes are set in the BiLSTMA units, and BiLSTMA units of different sizes are combined to construct a Multi-BiLSTMA model that can acquire semantic information with dependency relations of different granularities.
In step 3, to verify the validity of the method, its recognition effect is evaluated on the ACE RDC2005 Chinese dataset.
Advantageous effects
The beneficial effects of the invention are as follows:
In the invention, emphasis is placed on the ability of a multi-kernel CNN to learn features of different granularities; using BiLSTM and Attention mechanisms, a Multi-BiLSTMA model is constructed by setting BiLSTMs of different sizes, and experiments prove that the method has excellent effects on the ACE RDC2005 Chinese dataset.
The invention provides a Chinese relation extraction method based on a Multi-BiLSTM-Attention neural network model. Experiments prove that the method performs well on the ACE dataset, which demonstrates its effectiveness. The method effectively uses the fact that a multi-kernel CNN can learn features of different granularities, combines this idea with BiLSTM, and fully exploits the ability of neural network models to extract features automatically. Multiple hidden layers of different sizes are set in the bidirectional BiLSTM channels, which can prevent feature sparsity to a certain extent, effectively acquire and use the semantic information of characters, and automatically obtain abstract features of different dimensions. On this basis, an Attention mechanism is added: using the local and global features of sentences, the feature weights are adjusted, noise is reduced, and accuracy is improved.
The method provided by the invention combines the observation that a single long short-term memory model can only learn features of one specific dimension with the way the multiple convolution kernels of a convolutional neural network learn different dimensions, proposes the Multi-BiLSTM-Attention model, and obtains excellent results in Chinese relation extraction.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings of the present specification.
For a sentence with two entities, the relation extraction task is to extract the candidate relation between them. The bidirectional long short-term memory (BiLSTM) model is a variant of the recurrent neural network (RNN) that can effectively process long-distance information and avoid gradient explosion; given their good complementarity, BiLSTM and Attention are used in combination. However, a single, fixed BiLSTM can only learn information of one specific dimension, so a Multi-BiLSTMA model is constructed by setting up BiLSTMs of different sizes. The model can learn dependency-aware features in multiple dimensions.
First, the input layer of the model consists of word vectors obtained by mapping through a randomly initialized look-up table. If the sentence length is L, the vectorized sentence can be expressed as X = [x_1, x_2, ···, x_L], where x_i ∈ R^D is the vector of the i-th word w_i and D is the dimension of the vector. If the dictionary size is V, the Embedding layer can be expressed as a matrix E ∈ R^{V×D}. This process can be expressed as X = Embedding(s).
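The embedding step above can be sketched as follows. This is a minimal NumPy sketch; the dictionary size, random seed, and function names are illustrative, not taken from the invention:

```python
import numpy as np

V, D = 5000, 100  # illustrative dictionary size V and vector dimension D
rng = np.random.default_rng(0)
# randomly initialized look-up table, adjusted during training
lookup_table = rng.normal(scale=0.1, size=(V, D))

def embed(token_ids):
    """Map a sentence of word ids to its vector sequence X = [x_1, ..., x_L]."""
    return lookup_table[np.asarray(token_ids)]

X = embed([3, 41, 7, 99])  # a sentence of length L = 4, giving a (4, D) matrix
```

In a real model the table would be a trainable parameter of the network rather than a fixed array.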
Second, the Multi-BiLSTMA layer of the invention is composed of three BiLSTMA units, each consisting of a BiLSTM layer and an Attention layer. As shown in fig. 1 (b), the BiLSTMA unit receives the data of the embedding layer, and a forward LSTM and a backward LSTM form a BiLSTM layer that extracts deeper features from the embedding layer. At each time step, the forward and backward hidden states are merged as h_i = →h_i ⊕ ←h_i, where ⊕ represents element-wise addition. The Attention layer then merges the information of every time step of the BiLSTM layer and obtains, through computation, the information with the larger influence on the extraction result: M = tanh(H), α = softmax(w^T M), r = Hα, where H is the matrix of time-step outputs, w is a learned parameter vector, and r is the unit's sentence representation.
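The attention pooling inside a BiLSTMA unit can be sketched as follows — a minimal NumPy sketch of the standard tanh-scored attention over BiLSTM time-step outputs; the hidden size, sentence length, and all variable names are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention(H, w):
    """H: (hidden, L) matrix of BiLSTM outputs per time step;
    w: (hidden,) learned query vector.
    Returns the attention-weighted sentence representation r and the weights."""
    M = np.tanh(H)          # (hidden, L) scored representation
    alpha = softmax(w @ M)  # (L,) normalized weight per time step
    r = H @ alpha           # (hidden,) weighted combination of time steps
    return r, alpha

rng = np.random.default_rng(1)
H = rng.normal(size=(100, 6))  # hidden size 100, sentence length 6
w = rng.normal(size=100)
r, alpha = attention(H, w)
```

The weights `alpha` sum to 1, so `r` is a convex combination of the time-step outputs, which is what lets the unit emphasize the words most relevant to the relation.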
Next is the fully connected layer of the model. After the outputs of the three BiLSTMA units are spliced together, the resulting representation is classified by a fully connected (Dense) layer whose size equals the number of relation types, namely 7. This procedure can be summarized as D = W_d·[r_1; r_2; r_3] + b_d, where [;] denotes concatenation.
Finally, in order to obtain a better experimental effect, a softmax layer is used to normalize the output of the fully connected layer and obtain the final classification result. This process can be summarized as y = softmax(D).
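The splice-dense-softmax stage can be sketched as follows. The hidden sizes 100, 200 and 300 match the invention's three units; the weights, seed and variable names are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(2)
# outputs of the three BiLSTMA units (hidden sizes 100, 200, 300)
r1, r2, r3 = rng.normal(size=100), rng.normal(size=200), rng.normal(size=300)
concat = np.concatenate([r1, r2, r3])  # spliced representation, (600,)

n_classes = 7  # 6 positive relation types + "Other"
W_d = rng.normal(scale=0.01, size=(n_classes, 600))
b_d = np.zeros(n_classes)

D = W_d @ concat + b_d  # Dense layer logits, one per relation type
y = softmax(D)          # normalized class distribution
pred = int(np.argmax(y))
```

`pred` is then the index of the predicted relation type for the sentence.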
The validity of the method is verified on real data: the ACE RDC2005 standard Chinese dataset is selected, and the data is first preprocessed.
The present invention employs the publicly published ACE RDC2005 Chinese dataset for relation extraction. After screening out nonstandard documents, the experiment uses 628 documents. This dataset contains 6 entity relation types (collectively, positive examples): "PART-WHOLE", "PHYS", "ORG-AFF", "GEN-AFF", "PER-SOC" and "ART". Because the relations in the dataset are directional (for example, the entity pair (A, B) may have an ART relation in the dataset while no relation type marked by the dataset exists between the pair (B, A)), all such cases are collectively called negative examples and their relation type is marked "Other". Because relation extraction is mainly performed at sentence level, the five Chinese punctuation marks "，", "。", "！", "？" and "；" are used to cut the text in the dataset into sentences. Sentences without entity pairs are discarded, and sentences duplicated between the positive and negative examples are removed (because the same sentence cannot be both a positive and a negative example), giving 101056 sentences in total, including 9244 positive-example sentences and 91812 negative-example sentences. The ACE RDC2005 Chinese dataset is unbalanced: the relation types are not uniformly distributed, and negative examples in particular account for up to 90.85%. In order to stay closer to the real situation and reduce the influence of the large amount of negative-example data, only the results on the positive examples are evaluated.
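The sentence-level preprocessing can be sketched as follows. The exact punctuation set and the filtering helpers are assumptions for illustration; the entity strings in the example are invented:

```python
import re

SPLIT_MARKS = "，。！？；"  # assumed set of five Chinese sentence-splitting marks

def split_sentences(text):
    """Cut text into sentences at the splitting marks, dropping empty pieces."""
    return [s for s in re.split(f"[{SPLIT_MARKS}]", text) if s]

def has_entity_pair(sentence, e1, e2):
    """Keep only sentences containing both entity mentions."""
    return e1 in sentence and e2 in sentence

sents = split_sentences("张三在北京工作。他喜欢上海！天气不错")
kept = [s for s in sents if has_entity_pair(s, "张三", "北京")]
```

Deduplication between positive and negative examples would then be a set operation over the kept sentences.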
Second, for word vector processing, a randomly initialized look-up table is adopted and continuously adjusted during training, and the word vector dimension is set to 100. Since the neural network requires fixed-size input, the average sentence length of each relation type is analyzed. To balance extraction effect and training cost, 50 is selected as the maximum input length: sentences shorter than 50 are padded to 50 with 0, and sentences longer than 50 are cut to 50. AdaDelta is selected as the optimization function, with its default learning rate of 1.0. Further, the batch size is set to 50 and the number of iterations to 100. Three BiLSTMA units are used, with hidden layer sizes of 100, 200 and 300 respectively.
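The fixed-length padding and truncation described above can be sketched as follows; the function name and pad id are illustrative, though 0-padding to length 50 follows the text:

```python
MAX_LEN, PAD_ID = 50, 0

def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Fix every sentence to max_len: cut long ones, pad short ones with pad_id."""
    ids = token_ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))

short = pad_or_truncate(list(range(1, 11)))  # length 10 -> padded to 50
long = pad_or_truncate(list(range(1, 81)))   # length 80 -> cut to 50
```

Every input row then has exactly 50 entries, as the fixed-size input layer requires.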
Finally, to demonstrate the effectiveness of the method of the invention, three tasks were designed on the same data. The first task uses a multi-kernel CNN for relation extraction and serves as the baseline model. The second task uses a single-layer BiLSTMA for relation extraction; experiments prove that, through the combination of BiLSTM and Attention, its effect is superior to the plain multi-kernel CNN method. The third task uses the Multi-BiLSTMA model, which borrows the multi-granularity idea of the multi-kernel CNN while fully using the advantages of BiLSTM and Attention, and improves the experimental results remarkably compared with the former two.
After 5-fold cross-validation experiments, the performance is shown in Table 1 (the F values of the three models are shown in bold).
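Since only the positive-example types are evaluated, the per-type F values reported in Table 1 follow the usual precision/recall/F1 definitions, which can be sketched as follows; the counts in the example are illustrative, not the paper's numbers:

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive,
    and false-negative counts for one relation type."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# illustrative counts for one positive relation type
p, r, f = prf(tp=80, fp=20, fn=20)
```

Averaging these per-type F values over the six positive types, across the 5 cross-validation folds, gives the overall scores compared in Table 1.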
Table 1 relation extraction task Performance
The number of instances of each relation type is not balanced, and this is directly reflected in Table 1: relation types with more data generally obtain higher results, which fits the characteristics of neural networks. In general, under the same data quality and model, the larger the amount of data, the more fully the model is trained, the less likely overfitting becomes, and the better the results. From the results it can also be seen that the F values of the three classes "PART-WHOLE", "ORG-AFF" and "GEN-AFF" are significantly higher than those of the other three positive-example types, again due to the large data size of these three classes.
Meanwhile, as can be seen from Table 1, the single-layer BiLSTMA performs better than the plain multi-kernel CNN, because compared with CNN, BiLSTMA captures the dependency information and key features in sentences more effectively and thus obtains a better extraction effect. The Multi-BiLSTMA has the characteristics of both, so its performance is obviously superior to both of them. In summary, the neural-network-based Chinese relation extraction method provided by the invention performs excellently.
Details not elaborated in the present application are well known to those skilled in the art. Finally, it is noted that the above embodiments are only intended to illustrate, not limit, the technical solution of the present invention; although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered by the scope of the claims of the present invention.