
CN111506706A - Relationship similarity based upper and lower meaning relationship forest construction method - Google Patents


Info

Publication number: CN111506706A (granted as CN111506706B)
Application number: CN202010296825.8A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: relationship, similarity, num, sub, matrix
Inventors: 张英杰, 方义秋, 李春江, 葛娜, 林颂策, 余德华
Assignee: Chongqing University of Posts and Telecommunications
Legal status: Active (granted)

Classifications

    • G PHYSICS > G06 COMPUTING OR CALCULATING; COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06F16/367 Ontology (creation of semantic tools, e.g. ontology or thesauri)
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking (G06F40/20 Natural language analysis)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for constructing a hypernym-hyponym relation forest based on relational similarity, and belongs to the fields of natural language processing and knowledge graphs. The method comprises: inputting all triples and obtaining, through a conventional multilayer perceptron, the set C of entity-pair probabilities for the relations in all triples; taking the Cartesian product C×C, computing a similarity value for each pair by formula (yielding the matrix S), and filtering with a threshold ThresholdA to obtain the set of similar relations; then, combined with an open-domain entity dataset, using the trained similar relations on an improved multilayer perceptron model to predict the number of entities corresponding to each relation among the open-domain entities, and filtering with a threshold ThresholdB to obtain the Result matrix; and finally constructing the relation forest from the Result matrix and the similar-relation set. The invention can retrieve similar-relation entities in a knowledge graph and improves the precision of similar-relation extraction in relation extraction tasks.

Description

A Method for Constructing a Hypernym-Hyponym Relation Forest Based on Relational Similarity

Technical Field

The invention belongs to the fields of natural language processing and knowledge graphs, and relates to a method for constructing a hypernym-hyponym relation forest based on relational similarity.

Background

Natural language processing and knowledge graphs are currently active research areas in both academia and industry, and research on entity-pair relations in natural language processing has advanced considerably.

Existing research on relational similarity mainly follows the approaches below:

In 2006, Peter D. Turney introduced the concept of relational similarity in "Similarity of Semantic Relations". As a pioneer of this line of research, Turney held that measuring relational similarity means measuring the similarity of the entity pairs corresponding to the relations. For example, given the relation between mason and stone (here called relation A) and the relation between carpenter and wood (here called relation B), the degree to which A and B are similar is measured from their corresponding entity pairs (mason:stone) and (carpenter:wood).

In 2013, Alisa Zhila presented a method for measuring relational similarity at the NAACL-HLT conference; the vector-based approach proposed in "Combining Heterogeneous Models for Measuring Relational Similarity" was a milestone for this research. The paper proposes learning relation word vectors with an RNN (recurrent neural network) language model. The source corpus consists of the entities participating in each relation; feeding the entity pairs into the RNN yields an entity-pair embedding matrix of dimension (number of entities × embedding dimension), with the dimension parameter set to 1600 in the paper's experiments. Suppose the entity pairs Wi = (Wi1, Wi2) and Wj = (Wj1, Wj2) have word embeddings (vi1, vi2) and (vj1, vj2); then the direction vector of Wi is vi = vi2 - vi1, that of Wj is vj = vj2 - vj1, and the similarity is measured as:

sim(vi, vj) = cos(vi, vj) = (vi · vj) / (||vi|| ||vj||)   (1)
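The direction-vector comparison above can be sketched as follows. The 4-dimensional embeddings and the helper name are illustrative stand-ins only (Zhila's experiments used 1600-dimensional RNN embeddings); the similarity is the cosine of the two offset vectors, as in formula (1).

```python
import numpy as np

# Toy 4-d embeddings standing in for the RNN vectors of the pairs
# (mason, stone) and (carpenter, wood); the values are illustrative only.
v_i1 = np.array([1.0, 0.0, 2.0, 0.5])   # mason
v_i2 = np.array([2.0, 1.0, 2.5, 0.0])   # stone
v_j1 = np.array([0.5, 0.2, 1.0, 1.0])   # carpenter
v_j2 = np.array([1.5, 1.1, 1.6, 0.4])   # wood

def relation_similarity(a1, a2, b1, b2):
    # Offset vectors vi = vi2 - vi1 and vj = vj2 - vj1, compared by cosine.
    vi, vj = a2 - a1, b2 - b1
    return float(vi @ vj / (np.linalg.norm(vi) * np.linalg.norm(vj)))

sim = relation_similarity(v_i1, v_i2, v_j1, v_j2)
```

With these toy vectors the two offsets point in nearly the same direction, so the cosine is close to 1; unrelated relations would score near 0.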

In 2019, Weize Chen of Tsinghua University published "Quantifying Similarity between Relations with Fact Distribution" at the ACL conference. The paper proposes using a multilayer perceptron to obtain the probability of the entity pair corresponding to each relation: the head-tail entity probability for relation 1 is Pθ1(h,t|r1), and that for relation 2 is Pθ2(h,t|r2). The similarity between relation 1 and relation 2 is then measured with the KL divergence:

DKL(Pθ1(h,t|r1) || Pθ2(h,t|r2)) = Σ(h,t) Pθ1(h,t|r1) log[ Pθ1(h,t|r1) / Pθ2(h,t|r2) ]   (2)

After the KL divergence is obtained, the function g(x) combines the similarity in both directions to give the final similarity value S(r1, r2) of relation 1 and relation 2:

S(r1, r2) = g( DKL(Pθ1(h,t|r1) || Pθ2(h,t|r2)), DKL(Pθ1(h,t|r2) || Pθ2(h,t|r1)) )   (3)
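A minimal sketch of this bidirectional measure, simplified to two fixed toy discrete distributions in place of trained MLP outputs. The passage above does not define g, so a symmetric squashing g(x, y) = exp(-(x + y) / 2) is assumed here purely for illustration; it maps a pair of divergences to a similarity in (0, 1].

```python
import numpy as np

def kl(p, q, eps=1e-12):
    # Discrete KL divergence D(p || q) over the (h, t) candidates.
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def similarity(p1, p2):
    # Bidirectional combination as in formula (3); g is not given in the
    # text, so g(x, y) = exp(-(x + y) / 2) is an assumption.
    return float(np.exp(-(kl(p1, p2) + kl(p2, p1)) / 2.0))

# Toy head-tail distributions standing in for Ptheta1(h,t|r1) and
# Ptheta2(h,t|r2) over four (h, t) candidates.
p1 = np.array([0.4, 0.3, 0.2, 0.1])
p2 = np.array([0.35, 0.3, 0.25, 0.1])
s_r1_r2 = similarity(p1, p2)
```

Identical distributions give similarity 1, and the value decays toward 0 as the two fact distributions diverge.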

As the above approaches show, relational similarity is currently applied mostly in relation extraction tasks, where the current practice is to apply it directly, for example to deduplicate relations or to build knowledge graphs. Such applications, however, lack an application structure: applied directly to a task, relational similarity either yields little benefit or cannot be adapted effectively to the task at hand.

Summary of the Invention

In view of this, the purpose of the present invention is to provide a method for constructing a hypernym-hyponym relation forest based on relational similarity, in which the relation forest is built from relational similarity and a set of rules. Once the forest is built, any task that applies relational-similarity technology retrieves the forest to obtain the corresponding relation objects. This decouples relational-similarity technology from its downstream tasks while keeping the technology highly cohesive. (1) It alleviates the low precision of relation extraction caused by similar relations. (2) When constructing knowledge graphs, the built forest can be searched for entities with similar relations. (3) In relational reasoning tasks, similar relations retrieved from the forest are used to infer the next object.

To achieve the above purpose, the present invention provides the following technical solution:

A method for constructing a hypernym-hyponym relation forest based on relational similarity, comprising the following steps:

S1: input all triples and obtain, through a conventional multilayer perceptron, the set of entity-pair probabilities of the relations in all triples, C = (Pθ1, Pθ2, Pθ3, ..., Pθn), where n is the number of relations;

S2: take the Cartesian product of C with itself to obtain an n×n set, delete the pairs whose two elements are identical, leaving n²-n pairs of elements, and compute the relational similarity value S(ri, rj) of each remaining pair by formula, where i = 1, 2, ..., n and j = 1, 2, ..., n;

S3: set a threshold ThresholdA and select the similarity values S(ri, rj) greater than or equal to ThresholdA as the similar-relation set F(ri, rj);

S4: improve the multilayer perceptron model, and feed relation word vectors or entity word vectors respectively into the improved model, outputting the Tmp_rel matrix or the Tmp_ent matrix;

S5: matrix-multiply the two perceptron outputs, Tmp_rel*Tmp_ent, to obtain the Result_tmp matrix, whose dimension is the number of relations × the number of entities;

S6: apply Softmax to each row of the Result_tmp matrix, which yields a probability for each entity of each relation; set a threshold ThresholdB and keep the entities in Result_tmp whose probability exceeds ThresholdB, obtaining the Result matrix, i.e. Result = (softmax(Result_tmp) >= ThresholdB);

S7: construct a multi-way tree from the similar-relation set F(ri, rj) and the Result matrix;

S8: form the relation forest from multiple multi-way trees.

Further, in step S4, the improved multilayer perceptron has 5 layers with hidden = 1024, i.e. 1024 neural units per layer; the relation word vector dimension is the number of relation words × the number of neural units, the Tmp_rel matrix dimension is the number of neural units × the number of relations, and the Tmp_ent matrix dimension is the number of neural units × the number of entity words.

Further, in step S7, constructing the multi-way tree specifically comprises: counting the number rn_num of each row of the Result matrix and combining it with the similar-relation set F(ri, rj); among similar relations, the relation corresponding to more entities is assumed to be the hypernym relation, and the relation corresponding to fewer entities the hyponym relation;

Case 1: if the similar relations S(r1, r2) and S(r1, r3) exist in F(ri, rj), find the corresponding r1_num, r2_num, r3_num in the Result matrix according to r1, r2, r3;

Case 2: if the similar relations S(r1, r2), S(r1, r3), S(r1, r4), S(r3, r5) exist in F(ri, rj), find the corresponding r1_num, r2_num, r3_num, r4_num, r5_num in the Result matrix according to r1, r2, r3, r4, r5;

and so on: the multi-way tree is constructed from the similar-relation set F(ri, rj) and the Result matrix.

Further, in Case 1, R1 = softmax(r1_num), R2 = softmax(r2_num), R3 = softmax(r3_num); the root node is taken as Root = max(R1, R2, R3), and the remaining nodes are child nodes.

Further, in Case 2, R1 = softmax(r1_num), R2 = softmax(r2_num), R3 = softmax(r3_num), R4 = softmax(r4_num), R5 = softmax(r5_num); the root node is taken as Root = max(R1, R2, R3, R4, R5), and the remaining nodes are child nodes.

The beneficial effects of the present invention are:

1) Existing relation extraction suffers from low precision because of similar relations; using the relation forest of the present invention to remove redundant relations effectively alleviates this problem.

2) In recommendation system tasks, the relation forest of the present invention helps discover recommended content with similar relations, improving recommendation precision.

3) In relational reasoning tasks, similar relations are retrieved from the relation forest, improving the depth and breadth of relational reasoning.

Other advantages, objectives, and features of the present invention will be set forth to some extent in the following description and, to some extent, will be apparent to those skilled in the art from studying it, or may be learned from practicing the present invention. The objectives and other advantages of the invention may be realized and attained through the following description.

Brief Description of the Drawings

To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings, in which:

Fig. 1 is a schematic diagram of the structure of the conventional multilayer perceptron model;

Fig. 2 is a flowchart of the hypernym-hyponym relation forest construction method of the present invention;

Fig. 3 is a schematic diagram of the improved multilayer perceptron model of the present invention, in which (a) shows the internal structure with relation word vectors as input and (b) shows the internal structure with entity word vectors as input;

Fig. 4 is a schematic diagram of the node relations of Case 1 when constructing a multi-way tree according to the present invention;

Fig. 5 is a schematic diagram of the node relations of Case 2 when constructing a multi-way tree according to the present invention;

Fig. 6 is a schematic diagram of a relation forest composed of multiple multi-way trees;

Fig. 7 is a flowchart of applying the relation forest constructed by the present invention to a search engine in a recommendation system.

Detailed Description

The embodiments of the present invention are described below through specific examples, and those skilled in the art can easily understand other advantages and effects of the invention from the contents disclosed in this specification. The invention can also be implemented or applied through other specific embodiments, and the details in this specification can be modified or changed in various ways from different viewpoints and for different applications without departing from the spirit of the invention. It should be noted that the drawings provided with the following embodiments only illustrate the basic idea of the invention schematically, and the following embodiments and their features can be combined with one another provided they do not conflict.

Referring to Figs. 1 to 6, Fig. 2 is a flowchart of the method for constructing a hypernym-hyponym relation forest based on relational similarity, which specifically comprises the following steps:

Step 1: the corpus of this embodiment consists of triples (entity 1, relation, entity 2) extracted from a knowledge base. Using the multilayer perceptron model shown in Fig. 1, obtain the entity-pair probability Pθ1(h,t|r1) of relation 1 and the entity-pair probability Pθ2(h,t|r2) of relation 2, where h (head) is entity 1, t (tail) is entity 2, and r is the relation; then obtain the relational similarity Sim(r1, r2) of relation 1 and relation 2 through formulas (2) and (3).

Step 2: obtain, through the multilayer perceptron, the set of entity-pair probabilities of the relations in all triples, C = (Pθ1, Pθ2, Pθ3, ..., Pθn), where n is the number of relations, and take the Cartesian product of C with itself to obtain an n×n set.

Delete the pairs in the Cartesian product of Table 1 whose two elements are identical, leaving n²-n pairs of elements; each remaining pair yields a relational similarity value S(ri, rj) (i = 1, 2, ..., n, j = 1, 2, ..., n) through formulas (2) and (3).

Table 1. C×C (Cartesian product)

(Pθ1, Pθ1)  (Pθ1, Pθ2)  (Pθ1, Pθ3)  ...  (Pθ1, Pθn)
(Pθ2, Pθ1)  (Pθ2, Pθ2)  (Pθ2, Pθ3)  ...  (Pθ2, Pθn)
(Pθ3, Pθ1)  (Pθ3, Pθ2)  (Pθ3, Pθ3)  ...  (Pθ3, Pθn)
...         ...         ...         ...  ...
(Pθn, Pθ1)  (Pθn, Pθ2)  (Pθn, Pθ3)  ...  (Pθn, Pθn)

Step 3: take the threshold ThresholdA = 0.7 and take all values S(ri, rj) >= ThresholdA from Table 2; the values S(ri, rj) that pass this filter are called similar relations. The relation set extracted from Table 2 after this filtering is F(ri, rj).

Table 2. Relational similarity measure matrix

S(r1, r1)  S(r1, r2)  S(r1, r3)  ...  S(r1, rn)
S(r2, r1)  S(r2, r2)  S(r2, r3)  ...  S(r2, rn)
S(r3, r1)  S(r3, r2)  S(r3, r3)  ...  S(r3, rn)
...        ...        ...        ...  ...
S(rn, r1)  S(rn, r2)  S(rn, r3)  ...  S(rn, rn)
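Steps 2 and 3 (Table 1, Table 2, then filtering with ThresholdA) can be sketched as follows. The three toy probability vectors and the stand-in similarity function are assumptions for illustration only; in the real method the similarity would come from formulas (2) and (3).

```python
import itertools

import numpy as np

def s(p_i, p_j):
    # Toy stand-in for S(ri, rj): decays from 1 toward 0 as the two
    # distributions diverge. Formulas (2)-(3) would be used in practice.
    return float(np.exp(-np.sum(np.abs(p_i - p_j))))

C = [np.array([0.5, 0.3, 0.2]),    # P_theta1
     np.array([0.45, 0.35, 0.2]),  # P_theta2
     np.array([0.1, 0.2, 0.7])]    # P_theta3
n = len(C)

THRESHOLD_A = 0.7
# Cartesian product C x C minus the n identical pairs leaves n*n - n pairs.
pairs = [(i, j) for i, j in itertools.product(range(n), repeat=2) if i != j]

# Similar-relation set F: the pairs whose similarity passes ThresholdA.
F = {(i, j): s(C[i], C[j]) for i, j in pairs if s(C[i], C[j]) >= THRESHOLD_A}
```

Here only the first two distributions are close enough, so F keeps the pair (Pθ1, Pθ2) in both directions, mirroring the symmetry of Table 2.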

Step 4: design a new multilayer perceptron model with 5 layers and hidden = 1024, i.e. 1024 neural units per layer. As shown in Fig. 3(a), the relation word vectors are the model input, with dimension (number of relation words × number of neural units); the output is the Tmp_rel matrix with dimension (number of neural units × number of relations). As shown in Fig. 3(b), the entity word vectors are the model input, with dimension (number of entity words × number of neural units); the output is the Tmp_ent matrix with dimension (number of neural units × number of entity words).
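A rough sketch of such a perceptron's forward pass. The random untrained weights, ReLU activations, 16 hidden units (instead of 1024), and toy counts of relation and entity words are all assumptions; the sketch only shows how the stated input and output dimensions of Tmp_rel and Tmp_ent fit together.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_weights(d_in, hidden, d_out, n_layers=5):
    # Random, untrained weights purely to exercise the shapes.
    dims = [d_in] + [hidden] * (n_layers - 1) + [d_out]
    return [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def mlp_forward(x, weights):
    # Plain 5-layer perceptron: ReLU between layers, linear last layer.
    h = x
    for w, b in weights[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = weights[-1]
    return h @ w + b

hidden = 16                      # stand-in for the 1024 units in the text
n_rel_words, n_ent_words = 3, 5  # toy vocabulary sizes
d_word = 16                      # toy word-vector dimension

rel_vecs = rng.normal(size=(n_rel_words, d_word))  # relation word vectors
ent_vecs = rng.normal(size=(n_ent_words, d_word))  # entity word vectors

W = make_weights(d_word, hidden, hidden)
# Transposed so the outputs match the stated (neural units x count) shapes.
Tmp_rel = mlp_forward(rel_vecs, W).T
Tmp_ent = mlp_forward(ent_vecs, W).T
```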

Step 5: matrix-multiply the two perceptron outputs, Tmp_rel*Tmp_ent, to obtain the Result_tmp matrix with dimension (number of relations × number of entities).

Step 6: apply Softmax to each row of Result_tmp, i.e. obtain a probability for each entity of each relation, and set the threshold ThresholdB = 0.8. Filter the matrix Result_tmp, keeping in the Result matrix the entities whose probability value is greater than 0.8.

Result_tmp = Tmp_rel * Tmp_ent   (4)

Result = (softmax(Result_tmp) >= ThresholdB)   (5)
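Steps 5 and 6 can be sketched as follows. Note that with Tmp_rel stored as (neural units × relations) and Tmp_ent as (neural units × entities), the product realizing formula (4) is Tmp_rel transposed times Tmp_ent. The matrix sizes and the looser threshold are toy assumptions; the text uses hidden = 1024 and ThresholdB = 0.8.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable row-wise softmax, as used in formula (5).
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_rel, n_ent, hidden = 3, 5, 8   # toy sizes

# Stand-ins for the improved-MLP outputs, shaped (neural units x relations)
# and (neural units x entities) as stated in step 4.
Tmp_rel = rng.normal(size=(hidden, n_rel))
Tmp_ent = rng.normal(size=(hidden, n_ent))

# Formula (4): the stated dimensions force the transpose on Tmp_rel.
Result_tmp = Tmp_rel.T @ Tmp_ent             # (relations x entities)

# Formula (5) with a looser toy threshold than the 0.8 used in the text.
THRESHOLD_B = 0.2
Result = softmax(Result_tmp, axis=1) >= THRESHOLD_B
```

Each row of the boolean Result matrix marks the entities retained for one relation; the row counts feed the tree construction in step 7.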

Here an assumption is made: among similar relations, a relation corresponding to more entities is treated as the hypernym relation, and a relation corresponding to fewer entities as the hyponym relation.

Step 7: construct a multi-way tree from the similar-relation set F(ri, rj) and the Result matrix.

Count the number of entries in each row of the Result matrix and combine the counts with the similar-relation set F(ri, rj). Among similar relations, the one corresponding to more entities is the hypernym relation and the one corresponding to fewer entities is the hyponym relation.

Case 1: as shown in Fig. 4, if the similar relations S(r1, r2) and S(r1, r3) exist in F(ri, rj), find the corresponding r1_num, r2_num, r3_num in the Result matrix according to r1, r2, r3.

R1 = softmax(r1_num)   (6)

R2 = softmax(r2_num)   (7)

R3 = softmax(r3_num)   (8)

The root node is taken as:

Root = max(R1, R2, R3)   (9)

The left and right child nodes are R2 and R3.

Case 2: as shown in Fig. 5, if the similar relations S(r1, r2), S(r1, r3), S(r1, r4), S(r3, r5) exist in F(ri, rj), find the corresponding r1_num, r2_num, r3_num, r4_num, r5_num in the Result matrix according to r1, r2, r3, r4, r5.

R1 = softmax(r1_num)   (10)

R2 = softmax(r2_num)   (11)

R3 = softmax(r3_num)   (12)

R4 = softmax(r4_num)   (13)

R5 = softmax(r5_num)   (14)

The root node is taken as:

Root = max(R1, R2, R3, R4, R5)   (15)

The child nodes are R2, R3, and R4, and R5 is a child node of R3.

By analogy, the multi-way tree is constructed from the similar-relation set F(ri, rj) and the Result matrix.
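One possible reading of the tree-construction rule as code. The relation names and entity counts are hypothetical, chosen to mirror the Case 2 example, and the grouping strategy (attach each relation under its most entity-rich similar neighbour) is an assumption about how "the rest are child nodes" generalizes to nested cases such as R5 under R3.

```python
from collections import defaultdict

# Hypothetical similar-relation pairs (from F) and per-relation entity
# counts (the row counts rn_num of the Result matrix), mirroring Case 2.
similar_pairs = [("r1", "r2"), ("r1", "r3"), ("r1", "r4"), ("r3", "r5")]
r_num = {"r1": 40, "r2": 12, "r3": 25, "r4": 9, "r5": 7}

def build_tree(pairs, counts):
    # The relation with the most entities becomes the root (hypernym);
    # every other relation is attached under its most entity-rich similar
    # neighbour, which reproduces "R5 is a child node of R3".
    neigh = defaultdict(set)
    for a, b in pairs:
        neigh[a].add(b)
        neigh[b].add(a)
    nodes = set(neigh)
    root = max(nodes, key=counts.get)
    children = defaultdict(list)
    for node in sorted(nodes - {root}):
        parent = max(neigh[node], key=counts.get)
        children[parent].append(node)
    return root, dict(children)

root, children = build_tree(similar_pairs, r_num)
```

Running this on several disjoint similarity clusters, one per cluster, yields the set of trees that makes up the relation forest of step 8.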

Step 8: the relation forest is composed of multiple multi-way trees. In Fig. 6, the gray nodes are root nodes, obtained by Root = max(R1, R2, ..., Rn); the white nodes are child nodes, obtained by formulas (6) to (15). Fig. 6 illustrates 8 groups of similar relations.

As shown in Fig. 7, the relation forest constructed by the present invention can be applied in practice, for example in a recommendation system serving a search engine, to obtain the required recommended content quickly and accurately.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solution can be modified or equivalently replaced without departing from its purpose and scope, and all such modifications shall fall within the scope of the claims of the present invention.

Claims (5)

1. A method for constructing a hypernym-hyponym relationship forest based on relationship similarity, characterized in that the method comprises the following steps:
S1: input all triples, and obtain through a conventional multilayer perceptron the entity-pair probability set C = (P_θ1, P_θ2, P_θ3, ..., P_θn) over the relationships in all triples, where the probability P_θi is (entity i1, entity i2 | relationship) and n is the number of relationships;
S2: take the Cartesian product of the set C with itself to obtain a set of size n × n, delete every pair whose two elements are identical, leaving n² − n pairs of elements, and then compute by formula the relationship similarity measure S(ri, rj) of each pair of elements, where i = 1, 2, ..., n and j = 1, 2, ..., n;
S3: set a threshold ThresholdA, and select the similarity measures S(ri, rj) greater than or equal to ThresholdA to form the similar-relationship set F(ri, rj);
S4: improve the multilayer perceptron model, and feed the relationship word vectors or the entity word vectors, respectively, as input to the improved multilayer perceptron model, whose output is the Tmp_rel matrix or the Tmp_ent matrix;
S5: matrix-multiply the two outputs obtained by feeding the relationship word vectors and the entity word vectors through the multilayer perceptron, i.e. Tmp_rel * Tmp_ent, to obtain the Result_tmp matrix, whose dimensions are number of relationships × number of entities;
S6: compute Softmax over each row of the Result_tmp matrix, i.e. obtain for each relationship the probability of each corresponding entity; set a threshold ThresholdB and select the entities in the Result_tmp matrix whose probability reaches ThresholdB, obtaining the Result matrix, that is, Result = (softmax(Result_tmp) >= ThresholdB);
S7: construct a multi-way tree from the similar-relationship set F(ri, rj) and the Result matrix;
S8: compose the relationship forest from the multiple multi-way trees.
2. The method for constructing a hypernym-hyponym relationship forest based on relationship similarity according to claim 1, characterized in that in step S4, the improved multilayer perceptron model has 5 perceptron layers with hidden = 1024, i.e. each layer has 1024 neural units; the relationship word vector has dimensions number of relationship words × number of neural units, the Tmp_rel matrix has dimensions number of neural units × number of relationships, and the Tmp_ent matrix has dimensions number of neural units × number of entity words.
3. The method for constructing a hypernym-hyponym relationship forest based on relationship similarity according to claim 1, characterized in that in step S7, constructing the multi-way tree specifically comprises: counting the number rn_num of entries in each row of the Result matrix and combining it with the similar-relationship set F(ri, rj); within a pair of similar relationships, the relationship with the larger number of corresponding entities is taken as the hypernym relationship and the one with the smaller number as the hyponym relationship;
Case 1: if the similar relationships S(r1, r2) and S(r1, r3) exist in F(ri, rj), then from r1, r2, r3 find the corresponding r1_num, r2_num, r3_num in the Result matrix;
Case 2: if the similar relationships S(r1, r2), S(r1, r3), S(r1, r4), S(r3, r5) exist in F(ri, rj), then from r1 through r5 find the corresponding r1_num, r2_num, r3_num, r4_num, r5_num in the Result matrix;
and so on: the multi-way tree is constructed from the similar-relationship set F(ri, rj) and the Result matrix.
4. The method for constructing a hypernym-hyponym relationship forest based on relationship similarity according to claim 3, characterized in that in Case 1, R1 = softmax(r1_num), R2 = softmax(r2_num), R3 = softmax(r3_num); the root node is taken as Root = max(R1, R2, R3), and the remaining relationships are child nodes.
5. The method for constructing a hypernym-hyponym relationship forest based on relationship similarity according to claim 3, characterized in that in Case 2, R1 = softmax(r1_num), R2 = softmax(r2_num), R3 = softmax(r3_num), R4 = softmax(r4_num), R5 = softmax(r5_num); the root node is taken as Root = max(R1, R2, R3, R4, R5), and the remaining relationships are child nodes.
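Steps S1-S3 above can be sketched in Python. The patent does not reproduce its similarity formula in this excerpt, so cosine similarity between the entity-pair probability vectors stands in for S(ri, rj); the relation names, probability vectors, and the ThresholdA value below are illustrative assumptions, not data from the patent.

```python
from itertools import permutations
import math

def cosine(u, v):
    # Stand-in similarity metric; the patent's own formula for S(ri, rj)
    # is not reproduced in this excerpt.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similar_relations(C, threshold_a):
    """C maps each relation to its entity-pair probability vector P_theta.

    S2: pair every relation with every other one -- the Cartesian product
    of C with itself minus the n identical pairs leaves n^2 - n ordered
    pairs, which is exactly what permutations(C, 2) enumerates.
    S3: keep only pairs scoring at or above ThresholdA.
    """
    F = {}
    for ri, rj in permutations(C, 2):
        s = cosine(C[ri], C[rj])
        if s >= threshold_a:
            F[(ri, rj)] = s
    return F

# Toy entity-pair probability set C (hypothetical relations and values).
C = {
    "capital_of": [0.9, 0.1, 0.0],
    "located_in": [0.8, 0.2, 0.0],
    "born_in":    [0.1, 0.1, 0.8],
}
F = similar_relations(C, threshold_a=0.9)
```

With these toy vectors, only the nearly parallel "capital_of"/"located_in" pair survives the threshold, in both orderings, since cosine similarity is symmetric.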
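Steps S4-S6 admit a minimal sketch with hand-picked toy matrices standing in for trained perceptron outputs. Note that claim 2 gives Tmp_rel the shape (neural units × relationships) and Tmp_ent the shape (neural units × entities), so producing a (relationships × entities) Result_tmp as in S5 implies transposing Tmp_rel first; the matrices and the ThresholdB value below are illustrative assumptions.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of Result_tmp.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def result_matrix(tmp_rel, tmp_ent, threshold_b):
    """tmp_rel: neural units x relationships; tmp_ent: neural units x entities.

    S5: Result_tmp = Tmp_rel^T @ Tmp_ent, shape relationships x entities.
    S6: row-wise softmax gives P(entity | relationship); entries whose
        probability reaches ThresholdB are kept (True) in Result.
    """
    n_rel, n_ent, n_units = len(tmp_rel[0]), len(tmp_ent[0]), len(tmp_rel)
    result_tmp = [[sum(tmp_rel[k][i] * tmp_ent[k][j] for k in range(n_units))
                   for j in range(n_ent)]
                  for i in range(n_rel)]
    return [[p >= threshold_b for p in softmax(row)] for row in result_tmp]

# Toy "perceptron outputs": 2 neural units, 2 relationships, 3 entities.
tmp_rel = [[1.0, 0.0],
           [0.0, 1.0]]
tmp_ent = [[3.0, 0.0, 0.0],
           [0.0, 3.0, 3.0]]
result = result_matrix(tmp_rel, tmp_ent, threshold_b=0.4)
```

Here relationship 0 retains only entity 0, while relationship 1 retains entities 1 and 2, whose softmax probabilities each sit just below 0.49.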
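Steps S7-S8, together with the root-selection rule of claims 3-5, can be sketched as follows. Relationships linked (transitively) by the similar-relationship set F are grouped into one tree, and since softmax is monotone, taking the maximum of softmax(ri_num) as in claims 4-5 selects the relationship with the largest retained-entity count as the hypernym root. The union-find grouping and the example pairs and counts are illustrative assumptions, not the patent's stated procedure.

```python
def build_forest(f_pairs, entity_counts):
    """f_pairs: iterable of (ri, rj) similar-relationship pairs.
    entity_counts: relation -> rn_num, the number of True entries in that
    relationship's row of the Result matrix.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Group relationships connected through F into one component per tree.
    for ri, rj in f_pairs:
        parent[find(ri)] = find(rj)

    groups = {}
    for r in list(parent):
        groups.setdefault(find(r), []).append(r)

    # In each group the relationship with the most entities is the hypernym
    # root (equivalent to argmax of softmax over the counts); the rest are
    # its hyponym child nodes. The trees together form the forest (S8).
    forest = []
    for members in groups.values():
        root = max(members, key=lambda r: entity_counts[r])
        forest.append({"root": root,
                       "children": sorted(m for m in members if m != root)})
    return forest

# Toy input mirroring Case 2 of claim 3, plus one unrelated pair.
pairs = [("r1", "r2"), ("r1", "r3"), ("r1", "r4"), ("r3", "r5"), ("r6", "r7")]
counts = {"r1": 9, "r2": 3, "r3": 5, "r4": 2, "r5": 1, "r6": 4, "r7": 2}
forest = build_forest(pairs, counts)
```

The five relationships of the Case-2 pattern collapse into one tree rooted at r1 (the largest count), and the unrelated pair forms a second tree, giving a forest of two multi-way trees.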
CN202010296825.8A 2020-04-15 2020-04-15 Relationship similarity based upper and lower meaning relationship forest construction method Active CN111506706B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010296825.8A CN111506706B (en) 2020-04-15 2020-04-15 Relationship similarity based upper and lower meaning relationship forest construction method


Publications (2)

Publication Number Publication Date
CN111506706A true CN111506706A (en) 2020-08-07
CN111506706B CN111506706B (en) 2022-06-17

Family

ID=71867436

Country Status (1)

Country Link
CN (1) CN111506706B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944559A (en) * 2017-11-24 2018-04-20 国家计算机网络与信息安全管理中心 A kind of entity relationship automatic identifying method and system
US20180336183A1 (en) * 2017-05-22 2018-11-22 International Business Machines Corporation Deep Embedding for Natural Language Content Based on Semantic Dependencies
CN109635124A (en) * 2018-11-30 2019-04-16 北京大学 A kind of remote supervisory Relation extraction method of combination background knowledge
CN110222199A (en) * 2019-06-20 2019-09-10 青岛大学 A kind of character relation map construction method based on ontology and a variety of Artificial neural network ensembles
CN110309321A (en) * 2019-07-10 2019-10-08 电子科技大学 A knowledge representation learning method based on graph representation learning
CN110472233A (en) * 2019-07-16 2019-11-19 清华大学 Method and system for measuring relationship similarity based on head-tail entity distribution in knowledge base
CN110555083A (en) * 2019-08-26 2019-12-10 北京工业大学 A zero-shot based unsupervised entity relationship extraction method
CN110570920A (en) * 2019-08-20 2019-12-13 华东理工大学 An entity-relation joint learning method based on a concentrated attention model


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ALICIA PÉREZ ET AL.: "Semi-supervised medical entity recognition: A study on Spanish and Swedish clinical corpora", Journal of Biomedical Informatics *
BIFAN WEI ET AL.: "Motif-Based Hyponym Relation Extraction from Wikipedia Hyperlinks", IEEE Transactions on Knowledge and Data Engineering *
SUN Yang et al.: "A strategy for mining short texts based on sub-semantic spaces", Telecommunications Science (电信科学) *
ZHANG Yingjie: "Research and implementation of a similarity-based relation extraction method", China Master's Theses Full-text Database, Information Science and Technology *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant