CN106354712A

CN106354712A - Method and system of expressing semantics of verbs based on concept of parameters

Info

Publication number: CN106354712A
Application number: CN201610729108.3A
Authority: CN
Inventors: 朱其立; 龚禹; 赵凯祺
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Jiao Tong University
Priority date: 2016-08-25
Filing date: 2016-08-25
Publication date: 2017-01-25

Abstract

The present invention provides a method and system for expressing verb semantics based on the concept of parameters, comprising the following steps: step 1: extracting the dependency relationship between verb-object and verb-subject; step 2: according to the dependency relationship between verb-object and verb-subject Calculate the entropy value of the verb parameter to the pattern, and calculate the mutual information value of the verb parameter to the verb, and calculate the weight of the verb parameter; Step 3: Conceptualize the verb parameter, that is, find the k group with the largest weight of the verb parameter. The present invention creatively uses an external knowledge base to represent the semantics of verbs, and provides users with parameters to select the semantic granularity of verbs, so that the obtained semantic concepts of verbs are of moderate size, which can be read by humans or directly calculated by machines .

Description

Method and system for expressing verb semantics based on parameter concept

技术领域technical field

本发明涉及计算机技术领域的自然语言处理，具体地，涉及一种基于参数概念表达动词语义的方法及系统。The present invention relates to natural language processing in the field of computer technology, in particular to a method and system for expressing verb semantics based on parameter concepts.

背景技术Background technique

随着人工智能技术应用的越来越深入，人工智能技术对人类生活起着更多的作用。其中自然语言理解技术是计算机理解人类语言的重要技术，其中最为困难的地方就是理解人类语言的语义。With the deepening of the application of artificial intelligence technology, artificial intelligence technology plays more roles in human life. Among them, natural language understanding technology is an important technology for computers to understand human language, and the most difficult part is to understand the semantics of human language.

动词对于理解句子的句法和语法上都起着中心性的作用。分布假设()表明可以用一个词语的上下文信息来表示这个词语的语义，例如这个词语周围的词。一个动词在句子中有其独特的角色，因为它包含了与其主语和宾语的依存关系，所以可以用这个动词的主语和宾语来表达这个动词的语义。类似的系统包括ReVerb()，采用了“词袋模型”的方法，但这种模型有如下缺点：Verbs play a central role in understanding the syntax and grammar of sentences. The distributional assumption ( ) states that the semantics of a word can be represented by its contextual information, such as the words around it. A verb has its unique role in a sentence, because it contains the dependent relationship with its subject and object, so the semantics of this verb can be expressed with the subject and object of this verb. Similar systems include ReVerb(), which uses a "bag of words" approach, but this model has the following disadvantages:

1)无法考虑同义词的关系；1) The relationship of synonyms cannot be considered;

2)“词袋模型”维度很高，因此计算效率低下；2) The "bag of words model" has a high dimensionality, so the calculation efficiency is low;

3)生成的模型是人类不可读的。3) The generated model is not human readable.

为了弥补这些缺陷，一个很自然的方法就是使用这些主语和宾语的抽象概念或类型来表示，而不是直接使用这些词。类似的系统包括FrameNet()，它通过人类标注动词的主语和宾语类型来表达这个动词的语义。但是，这个系统也有一些明显的缺陷：In order to make up for these deficiencies, a very natural way is to use abstract concepts or types of these subjects and objects to represent, rather than using these words directly. Similar systems include FrameNet(), which expresses the semantics of verbs by humans annotating their subject and object types. However, this system also has some obvious flaws:

1)人工标注工作量巨大，无法进行拓展；1) The workload of manual labeling is huge and cannot be expanded;

2)动词参数的抽象程度太高，例如动词“eat”的宾语概念只有“Ingestibles”所以无法表达动词的多种语义。2) The abstraction of verb parameters is too high. For example, the object concept of the verb "eat" is only "Ingestibles", so it cannot express the multiple semantics of the verb.

经检索，申请号为:201010290860.5，名称为：基于事件本体的动词语义信息提取方法，在该发明中使用一种涉及基于事件本体的动词语义信息提取方法，通过匹配动词和动词角色的方法，提高了识别动词的准确率。然而，该发明中并没有对动词生成一个人类可读与机器可计算的概念集，且无法改变人类对于动词的语义粒度的设置。After retrieval, the application number is: 201010290860.5, and the name is: a method for extracting verb semantic information based on event ontology. In this invention, a method for extracting verb semantic information based on event ontology is used. By matching verbs and verb roles, the method improves The accuracy of recognizing verbs. However, this invention does not generate a human-readable and machine-computable concept set for verbs, and cannot change the semantic granularity of verbs set by humans.

申请号为:200510088741.0，名称为：一种用于句子分析中动词歧义结构消解的语义分析方法，在该发明中利用句子分析中的动词歧义结构消解的语义；包括：语义模型的构建和语义分析方法，所述的语义模型用于表达动词的歧义结构，所述的语义分析方法根据语义模型实现歧义结构的判断、消解和计算。该发明建立了一种表达动词歧义结构的统一的语义模型，把歧义结构上升到句子层面进行处理；但是没有用到外部的知识库，所以不能对动词的语义进行表示。The application number is: 200510088741.0, and the title is: A semantic analysis method for dissolving verb ambiguity structure in sentence analysis, in which the semantics of verb ambiguity structure dissolving in sentence analysis is used; including: construction of semantic model and semantic analysis In the method, the semantic model is used to express the ambiguity structure of the verb, and the semantic analysis method realizes the judgment, resolution and calculation of the ambiguity structure according to the semantic model. This invention establishes a unified semantic model for expressing the ambiguous structure of verbs, and raises the ambiguous structure to the sentence level for processing; however, no external knowledge base is used, so the semantics of verbs cannot be expressed.

综上所述，ReVerb系统对动词的表达粒度太细，而FrameNet对动词表达的粒度又太粗，所以我们急需一个能准确表达动词语义的算法和系统。To sum up, the ReVerb system expresses verbs with too fine granularity, while FrameNet expresses verbs with too coarse granularity, so we urgently need an algorithm and system that can accurately express verb semantics.

发明内容Contents of the invention

针对现有技术中的缺陷，本发明的目的是提供一种基于参数概念表达动词语义的方法及系统。In view of the defects in the prior art, the object of the present invention is to provide a method and system for expressing verb semantics based on the parameter concept.

根据本发明提供的基于参数概念表达动词语义的方法，包括如下步骤：The method for expressing verb semantics based on the parameter concept provided by the present invention comprises the following steps:

步骤1：抽取动词与宾语，以及动词与主语的依存关系；Step 1: Extract the verb and object, and the dependency relationship between the verb and the subject;

步骤2：根据动词与宾语、动词与主语的依存关系计算动词参数对于模式的熵值，并计算动词参数对于该动词的互信息值，计算得到动词参数权重；Step 2: Calculate the entropy value of the verb parameter for the pattern according to the dependency relationship between the verb and the object, and the verb and the subject, and calculate the mutual information value of the verb parameter for the verb, and calculate the weight of the verb parameter;

步骤3：将动词参数进行概念化，即寻找具有最大动词参数权重的k团。Step 3: Conceptualize the verb parameters, that is, find the k-cluster with the largest verb parameter weight.

优选地，所述步骤2包括：Preferably, said step 2 includes:

步骤2.1：计算动词参数对于模式的熵值，熵值越大，则动词参数的质量越好，所述质量是指某个词作为这个动词的参数的可靠度；Step 2.1: Calculate the entropy value of the verb parameter for the pattern, the larger the entropy value, the better the quality of the verb parameter, and the quality refers to the reliability of a certain word as the parameter of the verb;

${Entropy Entropy}_{v v} ((e e)) = = - - \underset{m m &Element; &Element; {M m}_{e e,, v v}}{Σ Σ} P P ((m m)) log log P P ((m m));;$

式中：Entropy_v(e)表示词e对于动词v的模式熵，P(m)表示模式m出现的概率，m表示模式m，即词e和动词v的不同搭配组合，M_e,v表示词e和动词v的所有搭配组合；In the formula: Entropy _v (e) represents the mode entropy of the word e for the verb v, P(m) represents the probability of the occurrence of the mode m, m represents the mode m, that is, different combinations of the word e and the verb v, M _e,v represents All collocations of the word e and the verb v;

步骤2.2：计算动词参数对于动词的互信息值，互信息值越高，则动词参数的质量越好，具体地，采用的二值的互信息，计算公式如下：Step 2.2: Calculate the mutual information value of the verb parameter for the verb. The higher the mutual information value, the better the quality of the verb parameter. Specifically, the calculation formula of the binary mutual information used is as follows:

${MI MI}_{v v} ((e e)) = = f f ((x x)) = = \{\begin{matrix} 11,, p p ((v v,, e e)) log log \frac{p p ((v v,, e e))}{p p ((v v)) p p ((e e))} > > 00 \\ - - 11,, o o t t h h e e r r w w i i s the s e e \end{matrix};;$

式中：MI_v(e)表示词e相对于动词v的互信息，p(v,e)表示在语料中动词v和词e一起出现的概率，p(v)表示动词v出现的概率，p(e)表示词e出现的概率；In the formula: MI _v (e) represents the mutual information of the word e relative to the verb v, p(v, e) represents the probability that the verb v and the word e appear together in the corpus, and p(v) represents the probability that the verb v appears, p(e) represents the probability of word e appearing;

步骤2.3：计算动词参数的权重值Q_v(e)，计算公式如下：Step 2.3: Calculate the weight value Q _v (e) of the verb parameter, the calculation formula is as follows:

Q_v(e)＝Entropy_v(e)×MI_v(e)。Qv(e)= _Entropyv (e)× _MIv ₍ e).

优选地，所述步骤3包括：采用分支限界的算法找到最大权重的k团；其中，所述分支限界的算法是指：构造一个搜索树，树的每一层，除去根节点，均表示的是是否选择某个参数概念，其中左分支就是选择该参数概念，右分支表示不选择该参数概念；当选择到的参数概念数量为k时，判断此k个参数概念是否在图中是一个团，如果是，则返回正确；如果不是一个团，则返回错误。Preferably, the step 3 includes: using a branch-and-bound algorithm to find the k-cluster with the largest weight; wherein, the branch-and-bound algorithm refers to: constructing a search tree, each layer of the tree, except the root node, represents Is whether to select a parameter concept, where the left branch is to select the parameter concept, and the right branch indicates not to select the parameter concept; when the number of selected parameter concepts is k, judge whether the k parameter concepts are a group in the graph , if yes, return true; if not a clique, return false.

根据本发明提供的基于参数概念表达动词语义的系统，包括：The system for expressing verb semantics based on the parameter concept provided by the present invention includes:

依存关系抽取模块：用于抽取动词与宾语，以及动词与主语的依存关系；Dependency extraction module: used to extract verbs and objects, as well as the dependency relationship between verbs and subjects;

动词参数权重计算模块：用于根据动词与宾语、动词与主语的依存关系计算动词参数对于模式的熵值，并计算动词参数对于该动词的互信息值，计算得到动词参数权重；Verb parameter weight calculation module: used to calculate the entropy value of the verb parameter for the mode according to the dependency relationship between the verb and the object, the verb and the subject, and calculate the mutual information value of the verb parameter for the verb, and calculate the weight of the verb parameter;

动词参数概念化模块，用于将动词参数进行概念化，即寻找具有最大动词参数权重的k团。The verb parameter conceptualization module is used to conceptualize the verb parameter, that is, to find the k-cluster with the largest verb parameter weight.

优选地，所述动词参数权重计算模块包括：Preferably, the verb parameter weight calculation module includes:

熵值计算模块：计算动词参数对于模式的熵值，熵值越大，则动词参数的质量越好，所述质量是指某个词作为这个动词的参数的可靠度；计算公式中如下：Entropy value calculation module: calculate the entropy value of the verb parameter for the pattern, the larger the entropy value, the better the quality of the verb parameter, and the quality refers to the reliability of a certain word as the parameter of this verb; in the calculation formula, it is as follows:

互信息值计算模块：计算动词参数对于动词的互信息值，互信息值越高，则动词参数的质量越好，具体地，采用的二值的互信息，计算公式如下：Mutual information value calculation module: Calculate the mutual information value of the verb parameter for the verb. The higher the mutual information value, the better the quality of the verb parameter. Specifically, the calculation formula for the binary mutual information used is as follows:

动词参数的权重值计算模块：计算动词参数的权重值Q_v(e)，计算公式如下：The weight value calculation module of the verb parameter: calculate the weight value Q _v (e) of the verb parameter, and the calculation formula is as follows:

Q_v(e)＝Entropy_v(e)×MI_v(e)。Qv(e)= _Entropyv (e)× _MIv ₍ e).

优选地，所述动词参数概念化模块采用分支限界的算法找到最大权重的k团；其中，所述分支限界的算法是指：构造一个搜索树，树的每一层，除去根节点，均表示的是是否选择某个参数概念，其中左分支就是选择该参数概念，右分支表示不选择该参数概念；当选择到的参数概念数量为k时，判断此k个参数概念是否在图中是一个团，如果是，则返回正确；如果不是一个团，则返回错误。Preferably, the verb parameter conceptualization module uses a branch-and-bound algorithm to find the k-cluster with the largest weight; wherein, the branch-and-bound algorithm refers to: constructing a search tree, each layer of the tree, except the root node, represents Is whether to select a parameter concept, where the left branch is to select the parameter concept, and the right branch indicates not to select the parameter concept; when the number of selected parameter concepts is k, judge whether the k parameter concepts are a group in the graph , if yes, return true; if not a clique, return false.

与现有技术相比，本发明具有如下的有益效果：Compared with the prior art, the present invention has the following beneficial effects:

1、本发明提供的基于参数概念表达动词语义的方法，创造性地利用外部的知识库来表示动词的语义，并且提供给用户选择动词语义粒度的参数，使得得到的动词语义概念的大小适中。1. The method for expressing verb semantics based on parameter concepts creatively utilizes an external knowledge base to express verb semantics, and provides parameters for users to select the semantic granularity of verbs, so that the obtained verb semantic concepts are of moderate size.

2、本发明提供的基于参数概念表达动词语义的方法，得到即可被人类读懂的动词语义概念，也可以被机器直接进行计算。2. The method for expressing verb semantics based on parameter concepts provided by the present invention can obtain verb semantic concepts that can be read by humans, and can also be directly calculated by machines.

附图说明Description of drawings

通过阅读参照以下附图对非限制性实施例所作的详细描述，本发明的其它特征、目的和优点将会变得更明显：Other characteristics, objects and advantages of the present invention will become more apparent by reading the detailed description of non-limiting embodiments made with reference to the following drawings:

图1为本发明技术程序框架图；Fig. 1 is a technical program frame diagram of the present invention;

图2为一种概念图结构；Figure 2 is a concept map structure;

图3为对应图2的另一种概念图结构；Fig. 3 is another conceptual diagram structure corresponding to Fig. 2;

图4为分支限界的搜索树示意图。FIG. 4 is a schematic diagram of a branch-and-bound search tree.

具体实施方式detailed description

下面结合具体实施例对本发明进行详细说明。以下实施例将有助于本领域的技术人员进一步理解本发明，但不以任何形式限制本发明。应当指出的是，对本领域的普通技术人员来说，在不脱离本发明构思的前提下，还可以做出若干变形和改进。这些都属于本发明的保护范围。The present invention will be described in detail below in conjunction with specific embodiments. The following examples will help those skilled in the art to further understand the present invention, but do not limit the present invention in any form. It should be noted that those skilled in the art can make several modifications and improvements without departing from the concept of the present invention. These all belong to the protection scope of the present invention.

根据本发明提供的基于参数概念表达动词语义的方法及系统，首先提出了一种准确抽取动词参数概念的算法，抽取的动词参数均来至于知识库，所述知识库包括Probase和WordNet。这些知识库存储了大量的概念与实体的关系，称作“IsA”的关系，比如苹果(apple)是一种(IsA)水果(fruit)，则水果是概念，苹果是实体。According to the method and system for expressing verb semantics based on the parameter concept provided by the present invention, an algorithm for accurately extracting the concept of verb parameters is first proposed, and the extracted verb parameters are all from a knowledge base, and the knowledge base includes Probase and WordNet. These knowledge bases store a large number of relationships between concepts and entities, which are called "IsA" relationships. For example, if an apple (apple) is a kind of (IsA) fruit, then the fruit is a concept and the apple is an entity.

定义了两个概念的语义重叠分数如下：The semantic overlap scores for two concepts are defined as follows:

$O o v v e e r r l l a a p p (({c c}_{11},, {c c}_{22})) = = | | {E E.}_{{c c}_{11}} \cap \cap {E E.}_{{c c}_{22}} | | / / m m i i n no {{{E E.}_{{c c}_{11}},, {E E.}_{{c c}_{22}}}};;$

其中，和分别表示概念c₁和c₂的实体集合。然后，本发明将动词参数概念化的问题转换成一个在无向图中找到最大权重的k团问题。一个概念图G＝(C,L,W)，其中C表示知识库的概念集，L表示两个概念之间的语义重叠分数小于一定的阈值τ，W表示在图中某个概念的权重，用来表示这个概念相对于某个动词的质量，也就是这个概念是否能表达这个动词的用法，比如“食物”相对于“吃”这个动词。图2、图3展示了某个概念图的结构。in, and represent the entity sets _of concepts c1 and _c2 , respectively. The present invention then converts the problem of verb parameter conceptualization into a k-clique problem of finding the maximum weight in an undirected graph. A concept map G=(C,L,W), where C represents the concept set of the knowledge base, L represents the semantic overlap score between two concepts is less than a certain threshold τ, W represents the weight of a certain concept in the graph, Used to indicate the quality of the concept relative to a verb, that is, whether the concept can express the usage of the verb, such as "food" relative to the verb "eat". Figure 2 and Figure 3 show the structure of a concept map.

通过计算概念图下面包含的实体的数量来定义概念权重，即假设对于某个动词来说这些实体的重要性一致，但是这种情况一般是不满足的。所以本发明定义了对某个动词v的实体e的质量以动词参数权重Q_v(e)表示，与步骤2.3对应，因此对于动词v的概念c的动词参数权重可以定义如下：The concept weight is defined by calculating the number of entities contained under the concept map, that is, assuming that the importance of these entities is consistent for a certain verb, but this situation is generally not satisfied. Therefore, the present invention defines that the quality of the entity e of a certain verb v is represented by the verb parameter weight Q _v (e), corresponding to step 2.3, so the verb parameter weight for the concept c of the verb v can be defined as follows:

${w w}_{v v} ((c c)) = = \underset{e e &Element; &Element; {{e e | | e e I I s the s A A c c}}}{Σ Σ} {Q Q}_{v v} ((e e))$

式中：w_v(c)表示概念c相对于动词v的参数权重；e是某个实体，c是某个概念，eIsA c表示实体e是概念c的一个实例，比如“苹果”是“食物”的一个实例。In the formula: w _v (c) represents the parameter weight of concept c relative to verb v; e is a certain entity, c is a certain concept, and eIsA c represents that entity e is an instance of concept c, such as "apple" is "food An instance of ".

因此，动词参数概念化问题就是在概念图G＝(C，L，W)中找到一个k团(定义为C_k)，使得下面的方程值最大，方程式如下：Therefore, the problem of verb parameter conceptualization is to find a k group (defined as C _k ) in the conceptual graph G=(C, L, W), so that the value of the following equation is the largest, and the equation is as follows:

${f f}_{v v} (({C C}_{k k})) = = \underset{c c &Element; &Element; {C C}_{k k}}{Σ Σ} {w w}_{c c} ((c c));;$

具体地，包括如下步骤：Specifically, include the following steps:

所述步骤2包括：Said step 2 includes:

步骤2.1：计算动词参数对于模式的熵值，熵值越大，则动词参数的质量(也就是某个词作为这个动词的参数的可靠度)越好；Step 2.1: Calculate the entropy value of the verb parameter for the pattern, the larger the entropy value, the better the quality of the verb parameter (that is, the reliability of a certain word as the parameter of the verb);

式中：Entropy_v(e)表示词e对于动词v的模式熵，P(m)表示模式m出现的概率，m表示模式m(词e和动词v的不同搭配组合)，M_e,v表示词e和动词v的所有搭配组合；In the formula: Entropy _v (e) represents the pattern entropy of word e for verb v, P(m) represents the probability of pattern m appearing, m represents pattern m (different combinations of word e and verb v), M _e,v represents All collocations of the word e and the verb v;

步骤2.3：计算动词参数的权重值：Step 2.3: Calculate the weight value of the verb parameter:

Q_v(e)＝Entropy_v(e)×MI_v(e)。Qv(e)= _Entropyv (e)× _MIv ₍ e).

所述步骤3包括：采用分支限界的算法找到最大权重的k团。The step 3 includes: using a branch and bound algorithm to find the k-cluster with the largest weight.

图2中C0，C1，C2和C3分别代表四个参数概念，其中C0和C3，C1和C3的重合度较高(语义相对较近)，从而构建出一个图，图中C0和C3，C1和C3没有边相连接，如图3所示。目标是在图3所示的图中找到最大的权值的k团，这里令k＝3；分支限界的算法值得是构造一个搜索树，如图4所示，树的每一层(除去根节点)表示的是是否选择某个参数概念，其中左分支就是选择该参数概念，右分支表示不选择该参数概念；当选择到的参数概念数量为k时，判断此k个参数概念是否在图中是一个团，如果是则返回正确，不是一个团则返回错误，示例参见图4。In Figure 2, C0, C1, C2 and C3 respectively represent four parameter concepts, among which C0 and C3, C1 and C3 have a high degree of coincidence (relatively close in semantics), thus constructing a graph, in which C0 and C3, C1 There is no edge connection with C3, as shown in Figure 3. The goal is to find the k group with the largest weight in the graph shown in Figure 3, where k=3; the branch and bound algorithm is worth constructing a search tree, as shown in Figure 4, each layer of the tree (excluding the root Node) indicates whether to select a parameter concept, where the left branch is to select the parameter concept, and the right branch indicates not to select the parameter concept; when the number of selected parameter concepts is k, it is judged whether the k parameter concepts are in the graph is a clique, if it is, it will return correct, if it is not a clique, it will return an error, see Figure 4 for an example.

以上对本发明的具体实施例进行了描述。需要理解的是，本发明并不局限于上述特定实施方式，本领域技术人员可以在权利要求的范围内做出各种变形或修改，这并不影响本发明的实质内容。Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the specific embodiments described above, and those skilled in the art may make various changes or modifications within the scope of the claims, which do not affect the essence of the present invention.

Claims

1. A method for expressing verb semantics based on parameter concept, is characterized in that, comprises the steps:

Step 1: Extract the verb and object, and the dependency relationship between the verb and the subject;

Step 2: Calculate the entropy value of the verb parameter for the pattern according to the dependency relationship between the verb and the object, and the verb and the subject, and calculate the mutual information value of the verb parameter for the verb, and calculate the weight of the verb parameter;

Step 3: Conceptualize the verb parameters, that is, find the k-cluster with the largest verb parameter weight.

2. the method for expressing verb semantics based on parameter concept according to claim 1, is characterized in that, described step 2 comprises:

Step 2.1: Calculate the entropy value of the verb parameter for the pattern, the larger the entropy value, the better the quality of the verb parameter, and the quality refers to the reliability of a certain word as the parameter of the verb;

{Entropy Entropy}_{v v} ((e e)) = = - - \underset{m m &Element; &Element; {M m}_{e e,, v v}}{Σ Σ} P P ((m m)) log log P P ((m m));;

In the formula: Entropy _v (e) represents the mode entropy of the word e for the verb v, P(m) represents the probability of the occurrence of the mode m, m represents the mode m, that is, different combinations of the word e and the verb v, M _e,v represents All collocations of the word e and the verb v;

Step 2.2: Calculate the mutual information value of the verb parameter for the verb. The higher the mutual information value, the better the quality of the verb parameter. Specifically, the calculation formula of the binary mutual information used is as follows:

{MI MI}_{v v} ((e e)) = = f f ((x x)) = = \{\begin{matrix} 11,, p p ((v v,, e e)) l l o o g g \frac{p p ((v v,, e e))}{p p ((v v)) p p ((e e))} > > 00 \\ - - 11,, o o t t h h e e r r w w i i s the s e e \end{matrix};;

In the formula: MI _v (e) represents the mutual information of the word e relative to the verb v, p(v, e) represents the probability that the verb v and the word e appear together in the corpus, and p(v) represents the probability that the verb v appears, p(e) represents the probability of word e appearing;

Step 2.3: Calculate the weight value Q _v (e) of the verb parameter, the calculation formula is as follows:

Qv(e)= _Entropyv (e)× _MIv ₍ e).

3. the method for expressing verb semantics based on parameter concept according to claim 1, is characterized in that, described step 3 comprises: adopt the algorithm of branch and bound to find the k group of maximum weight; Wherein, the algorithm of described branch and bound refers to : Construct a search tree. Each layer of the tree, except the root node, indicates whether to choose a certain parameter concept, where the left branch means to choose the parameter concept, and the right branch means not to choose the parameter concept; when the selected parameter When the number of concepts is k, judge whether the k parameter concepts are a clique in the graph, if yes, return correct; if not a clique, return error.

4. A system for expressing verb semantics based on parameter concepts, characterized in that it includes:

Dependency extraction module: used to extract verbs and objects, as well as the dependency relationship between verbs and subjects;

Verb parameter weight calculation module: used to calculate the entropy value of the verb parameter for the mode according to the dependency relationship between the verb and the object, the verb and the subject, and calculate the mutual information value of the verb parameter for the verb, and calculate the weight of the verb parameter;

The verb parameter conceptualization module is used to conceptualize the verb parameter, that is, to find the k-cluster with the largest verb parameter weight.

5. the system based on parameter concept expression verb semantics according to claim 4, is characterized in that, described verb parameter weight calculation module comprises:

Entropy value calculation module: calculate the entropy value of the verb parameter for the pattern, the larger the entropy value, the better the quality of the verb parameter, and the quality refers to the reliability of a certain word as the parameter of this verb; in the calculation formula, it is as follows:

{Entropy Entropy}_{v v} ((e e)) = = - - \underset{m m &Element; &Element; {M m}_{e e,, v v}}{Σ Σ} P P ((m m)) log log P P ((m m));;

Mutual information value calculation module: Calculate the mutual information value of the verb parameter for the verb. The higher the mutual information value, the better the quality of the verb parameter. Specifically, the calculation formula for the binary mutual information used is as follows:

{MI MI}_{v v} ((e e)) = = f f ((x x)) = = \{\begin{matrix} 11,, p p ((v v,, e e)) l l o o g g \frac{p p ((v v,, e e))}{p p ((v v)) p p ((e e))} > > 00 \\ - - 11,, o o t t h h e e r r w w i i s the s e e \end{matrix};;

The weight value calculation module of the verb parameter: calculate the weight value Q _v (e) of the verb parameter, and the calculation formula is as follows:

Qv(e)= _Entropyv (e)× _MIv ₍ e).

6. The system for expressing verb semantics based on the parameter concept according to claim 4, wherein the verb parameter conceptualization module uses a branch and bound algorithm to find the k group of maximum weight; wherein, the branch and bound algorithm refers to : Construct a search tree. Each layer of the tree, except the root node, indicates whether to choose a certain parameter concept, where the left branch means to choose the parameter concept, and the right branch means not to choose the parameter concept; when the selected parameter When the number of concepts is k, judge whether the k parameter concepts are a clique in the graph, if yes, return correct; if not a clique, return error.