Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In recent years, with the rapid development of artificial intelligence, table question answering has received increasing attention. Table question answering (TableQA) is a technique for answering questions accurately on the basis of existing tabular knowledge. At present, table question answering mainly standardizes the question entered by the user and then searches a database with the standardized question to obtain the final answer.
In the related art, dictionary matching and model prediction are mainly used for named entity recognition to standardize the question entered by the user. However, a dictionary can hardly cover all the attribute synonyms and attribute-value aliases that may appear in user questions, and the prediction accuracy of a model is low when the question involves a large number of category labels.
To address these problems, the present disclosure provides a question answering processing method and apparatus, an electronic device, and a storage medium.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure. It should be noted that the question answering processing method of the embodiments of the present disclosure may be applied to the question answering processing apparatus of the embodiments of the present disclosure, and the apparatus may be configured in an electronic device. The electronic device may be a mobile terminal, for example, a mobile phone, a tablet computer, a personal digital assistant, or another hardware device running any of various operating systems.
As shown in Fig. 1, the question answering processing method may include the following steps:
Step 101, obtaining a target question.
In the embodiment of the present disclosure, the target question may be collected online; for example, a sentence with a question structure may be collected online with a web crawler and used as the target question. Alternatively, the target question may be a sentence with a question structure collected offline, a manually synthesized question, or the like.
Step 102, performing dependency syntax analysis on the target question to obtain the dependency relationships among the participles in the target question.
Here, the participles are the words obtained by segmenting the question. For example, dependency syntax analysis of the target question "how much is the price of pax" yields the dependency relationships among the participles "pax", "of", "price", "is" and "how much". Dependency relationships may include, but are not limited to, the subject-predicate relationship, the verb-object relationship, the attributive (modifier-head) relationship, the function-word component, and the like.
Step 103, matching the dependency relationships among the participles in the target question with the dependency relationships among the participle types in a preset high-frequency dependency subgraph.
In the embodiments of the present disclosure, dependency syntax analysis may be performed on each sample question in a sample question training set to determine the high-frequency dependency subgraph. It should be noted that the high-frequency dependency subgraph may include the dependency relationships between its nodes and, for each node, the probability of each type to which the node may belong.
Further, a dependency relationship graph corresponding to the target question can be determined from the dependency relationships among its participles, and the dependency relationships in this graph are matched with those in the high-frequency dependency subgraph to determine which participle types in the subgraph match the dependency relationships of the question.
Step 104, determining the type of each corresponding target participle in the target question according to the participle types matched with the dependency relationships in the high-frequency dependency subgraph.
Specifically, each participle in the target question can be mapped to its corresponding node in the high-frequency dependency subgraph; the probability that the participle belongs to each type is determined from the probabilities recorded by that node; and the type of the participle is then determined from these probabilities.
Step 105, determining a standard question corresponding to the target question according to the types of the target participles, so as to determine a reply answer.
Then, according to the type of each target participle, the standard data corresponding to the target participle are determined, target standard data are selected from the standard data, and the target question is standardized according to the target standard data to generate the corresponding standard question, from which the reply answer to the target question is determined.
In conclusion, a target question is obtained; dependency syntax analysis is performed on it to obtain the dependency relationships among its participles; these dependency relationships are matched with the dependency relationships among the participle types in the preset high-frequency dependency subgraph; the type of each corresponding target participle in the target question is determined according to the matched participle types; and a standard question corresponding to the target question is determined according to the types of the target participles, so as to determine a reply answer. In this way, the type of each participle in the target question is determined from the types of the nodes of the matched high-frequency dependency subgraphs, so that the target participles can be standardized effectively, the method is applicable to question matching in different scenarios, and the coverage and accuracy of question standardization are improved.
To determine which participle types in the high-frequency dependency subgraph correspond to which participles in the target question, Fig. 2, a schematic diagram according to a second embodiment of the present disclosure, shows an embodiment in which the dependency relationships in the dependency relationship graph of the target question are matched with those in the high-frequency dependency subgraph, so that each participle type in the subgraph is associated with the participle of the target question that has the same dependency relationship. The embodiment shown in Fig. 2 may include the following steps:
Step 201, obtaining a target question.
Step 202, performing dependency syntax analysis on the target question to obtain the dependency relationship among the participles in the target question.
Step 203, determining a dependency relationship graph corresponding to the target question according to the dependency relationship among all the participles of the target question.
For example, as shown in Fig. 3, dependency syntax analysis is performed on the target question "how much is the price of pax" to obtain the dependency relationships among "pax", "price", "is" and "how much", and the dependency relationship graph corresponding to the target question can be determined from these dependency relationships.
Step 204, querying the dependency relationship graph for the dependency relationships in the high-frequency dependency subgraph.
That is, it is queried whether the dependency relationship graph contains the dependency relationships of the high-frequency dependency subgraph. For example, if the dependency relationship in the high-frequency dependency subgraph is an attributive relationship, it may be queried whether the dependency relationship graph shown in Fig. 3 includes an attributive relationship.
Step 205, when the dependency relationship graph contains the dependency relationships of the high-frequency dependency subgraph, determining that each participle type in the high-frequency dependency subgraph has a matching relationship with the participle in the target question that has the same dependency relationship.
For example, the dependency relationship graph is shown in the right part of Fig. 3 and the high-frequency dependency subgraph is shown in Fig. 4. Since the dependency relationship graph in Fig. 3 contains the high-frequency dependency subgraph in Fig. 4, "price" in the target question corresponds to node 0 in Fig. 4, and "pax" in the target question corresponds to node 1 in Fig. 4.
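Steps 203 to 205 can be illustrated with a small sketch. It assumes, purely for illustration, that the dependency parser returns each relationship as a (head index, relation label, dependent index) triple, and it uses descriptive relation labels rather than those of any particular parser; `match_subgraph` is a hypothetical helper, not part of the disclosure. A subgraph is taken to match when every one of its edges can be mapped, through an injective assignment of subgraph nodes to participles, onto an edge of the question's dependency relationship graph with the same label.

```python
from itertools import permutations

# A dependency edge: (head index, relation label, dependent index).
Edge = tuple[int, str, int]


def match_subgraph(question_edges: list[Edge], num_tokens: int,
                   sub_edges: list[Edge], num_sub_nodes: int) -> list[dict[int, int]]:
    """Return every injective mapping {subgraph node -> participle index} under which
    each subgraph edge (head, relation, dependent) is also an edge of the question graph."""
    question = set(question_edges)
    matches = []
    for assignment in permutations(range(num_tokens), num_sub_nodes):
        # assignment[n] is the participle index assigned to subgraph node n.
        if all((assignment[h], rel, assignment[d]) in question for h, rel, d in sub_edges):
            matches.append({node: assignment[node] for node in range(num_sub_nodes)})
    return matches


# Hypothetical parse of the target question "how much is the price of pax".
tokens = ["pax", "price", "is", "how much"]
question_edges = [
    (1, "attributive", 0),        # price <- pax
    (2, "subject-predicate", 1),  # is -> price
    (2, "verb-object", 3),        # is -> how much
]
# Subgraph in the style of Fig. 4 (assumed orientation): node 0 is modified by node 1.
sub_edges = [(0, "attributive", 1)]

for m in match_subgraph(question_edges, len(tokens), sub_edges, 2):
    print({node: tokens[i] for node, i in m.items()})
```

Brute-force enumeration over permutations is adequate for questions of a few words; a larger system would more likely use an indexed subgraph-isomorphism search.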
Step 206, determining the type of each corresponding target participle in the target question according to the participle types matched with the dependency relationships in the high-frequency dependency subgraph.
It should be noted that the high-frequency dependency subgraph includes the dependency relationships between its nodes and the probability of each type to which each node belongs. Therefore, once each participle type in the subgraph has been matched to the participle with the same dependency relationship in the target question, the type of each corresponding target participle can be determined from the matched participle types.
To determine the type of a participle accurately, optionally, for each participle in the target question, the probabilities of the types recorded by the corresponding node in the matched high-frequency dependency subgraph are queried; the probability that the participle belongs to each type is determined from the queried probabilities; and the type of the participle is determined from these probabilities. It should be noted that one or more high-frequency dependency subgraphs may be matched.
As an example, when one high-frequency dependency subgraph is matched, the probabilities of the types recorded by the corresponding node in that subgraph are queried for each participle in the target question, and the type with the highest probability is taken as the type of the participle. For example, if the participle "price" corresponds to node 0 in the high-frequency dependency subgraph, and the probabilities that node 0 is of the type "attribute", "attribute value" and "other" are 96.4%, 1.3% and 2.2% respectively, the type of "price" may be determined as "attribute".
As another example, when at least two high-frequency dependency subgraphs are matched, the probabilities of the types recorded by the corresponding nodes in each matched subgraph are queried for each participle in the target question, the queried probabilities are summed per type and normalized, and the probability that the participle belongs to each type is determined from the normalization result.
For example, the participle "price" corresponds to node 0 in high-frequency dependency subgraph A, whose probabilities of being of the type "attribute", "attribute value" and "other" are 96.4%, 1.3% and 2.2% respectively, and to node 1 in high-frequency dependency subgraph B, whose probabilities of the same types are 58.4%, 38.8% and 2.6% respectively. Adding the probabilities of each type over the two nodes gives (96.4% + 58.4%) for the "attribute" type, (1.3% + 38.8%) for the "attribute value" type, and (2.2% + 2.6%) for the "other" type; these three sums are then normalized.
For example, the normalized probabilities are:

$$P(\text{attribute}) = \frac{96.4\% + 58.4\%}{(96.4\% + 58.4\%) + (1.3\% + 38.8\%) + (2.2\% + 2.6\%)} = \frac{154.8\%}{199.7\%} \approx 77.5\%$$

$$P(\text{attribute value}) = \frac{1.3\% + 38.8\%}{(96.4\% + 58.4\%) + (1.3\% + 38.8\%) + (2.2\% + 2.6\%)} = \frac{40.1\%}{199.7\%} \approx 20.08\%$$

$$P(\text{other}) = \frac{2.2\% + 2.6\%}{(96.4\% + 58.4\%) + (1.3\% + 38.8\%) + (2.2\% + 2.6\%)} = \frac{4.8\%}{199.7\%} \approx 2.4\%$$

The "attribute" type, which has the highest probability, can therefore be taken as the type of the target participle "price", with a probability of 77.5%.
It should be noted that, to further improve the accuracy of determining the types of the participles in the target question, type probabilities may be initialized for each participle of the target question before the probabilities recorded by the corresponding nodes in the matched high-frequency dependency subgraph are queried. For example, for "price" in the target question, the probability of the "attribute" type may be initialized to 0%, that of the "attribute value" type to 0%, and that of the "other" type to 100%.
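The type determination of step 206 can be sketched as follows, assuming that each matched node's record is a dictionary from type name to probability. The sketch sums the probabilities per type over all matched subgraph nodes, normalizes the sums, and picks the most probable type; with a single matched subgraph this reduces to picking the highest recorded probability. How the initial distribution interacts with the queried probabilities is not spelled out above, so the sketch assumes it is kept only when no matched node covers the participle. `participle_type`, `TYPES` and `INITIAL` are illustrative names.

```python
TYPES = ("attribute", "attribute value", "other")

# Initial distribution described above: every participle starts as "other".
INITIAL = {"attribute": 0.0, "attribute value": 0.0, "other": 1.0}


def participle_type(node_distributions: list[dict[str, float]]) -> tuple[str, dict[str, float]]:
    """Combine the type probabilities recorded by the matched subgraph nodes
    (one dict per matched subgraph) and return (best type, normalized probabilities)."""
    if not node_distributions:
        # Assumption: keep the initial distribution when no subgraph node matched.
        probs = dict(INITIAL)
    else:
        # Add the probabilities of each type across the matched nodes...
        summed = {t: sum(d.get(t, 0.0) for d in node_distributions) for t in TYPES}
        total = sum(summed.values())
        # ...then normalize so that the combined probabilities sum to 1.
        probs = {t: p / total for t, p in summed.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Node 0 of subgraph A and node 1 of subgraph B, both matched to the participle "price".
node_a = {"attribute": 0.964, "attribute value": 0.013, "other": 0.022}
node_b = {"attribute": 0.584, "attribute value": 0.388, "other": 0.026}
best, probs = participle_type([node_a, node_b])
print(best, {t: round(p, 4) for t, p in probs.items()})
```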
For example, as shown in Fig. 5, the target question is "what brand has a price of more than 200,000". The type of each corresponding target participle in the target question is determined according to the participle types matched with the dependency relationships in the high-frequency dependency subgraph: "more" and "price" may correspond to a predicate relationship in the high-frequency dependency subgraph, and "brand" and "more" may also correspond to a predicate relationship in the subgraph. According to the probabilities of the types recorded by the nodes of the high-frequency dependency subgraph, the type of "price" is determined as "attribute", the type of "more" as "other", and the type of "brand" as "attribute".
Step 207, determining a standard question corresponding to the target question sentence according to the type of the target participle to determine a reply answer.
It should be noted that steps 201 to 202 and step 207 may be performed as in any embodiment of the present disclosure; this is not limited here and is not described again.
In conclusion, a dependency relationship graph corresponding to the target question is determined from the dependency relationships among its participles; the dependency relationship graph is queried for the dependency relationships in the high-frequency dependency subgraph; and when those dependency relationships exist in the dependency relationship graph, each participle type in the high-frequency dependency subgraph is determined to match the participle in the target question that has the same dependency relationship. In this way, it can be accurately determined whether the dependency relationship graph of the target question contains a high-frequency dependency subgraph and, if it does, which participle of the target question each participle type in the subgraph matches.
To determine the standard question corresponding to the target question accurately, Fig. 6, a schematic diagram according to a third embodiment of the present disclosure, shows an embodiment in which the standard data matching the type of the target participle are queried, target standard data are determined from the standard data, and the target question is standardized according to the target standard data to generate the corresponding standard question. The embodiment shown in Fig. 6 may include the following steps:
Step 601, obtaining a target question.
Step 602, performing dependency syntax analysis on the target question to obtain the dependency relationships among the participles in the target question.
Step 603, matching the dependency relationships among the participles in the target question with the dependency relationships among the participle types in a preset high-frequency dependency subgraph.
Step 604, determining the type of each corresponding target participle in the target question according to the participle types matched with the dependency relationships in the high-frequency dependency subgraph.
Step 605, querying the standard data matching the type of the target participle.
In the embodiment of the present disclosure, after the type of the target participle is determined, the target participle may be matched against the table data of the same type to determine the standard data matching that type.
For example, as shown in Fig. 7, the target question is "what brand has a price of more than 200,000". By matching the dependency relationship graph of the target question with the preset high-frequency dependency subgraph, "price" and "brand" in the target question are determined to be of the "attribute" type, so the attribute slots "model", "brand", "price", "country", "seat number" and "panoramic sunroof" in the table data can be used as the standard data matching "price" and "brand".
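Step 605 can be sketched as a lookup over the table. The text states that attribute-type participles are matched against the attribute slots (the column names); the sketch additionally assumes, for illustration only, that attribute-value participles are matched against the distinct cell values and that "other" participles are not standardized. The column names come from the Fig. 7 example; the single row and the name `standard_data_for_type` are hypothetical.

```python
def standard_data_for_type(columns, rows, participle_type):
    """Return the standard data that a participle of the given type is compared with."""
    if participle_type == "attribute":
        # Attribute slots, i.e. the column names of the table.
        return list(columns)
    if participle_type == "attribute value":
        # Assumption: distinct cell values, in first-seen order.
        values = []
        for row in rows:
            for cell in row:
                if cell not in values:
                    values.append(cell)
        return values
    # Assumption: participles of type "other" are left as they are.
    return []


columns = ["model", "brand", "price", "country", "seat number", "panoramic sunroof"]
rows = [("Model A", "Brand X", 250000, "Country P", 5, "yes")]  # hypothetical row
print(standard_data_for_type(columns, rows, "attribute"))
print(standard_data_for_type(columns, rows, "attribute value"))
```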
Step 606, determining the semantic similarity between each piece of standard data and the target participle.
For example, the standard data are "model", "brand", "price", "country", "seat number" and "panoramic sunroof", and the target participles are "price" and "brand". The semantic similarity between each piece of standard data and the target participle "price" is calculated, and likewise the semantic similarity between each piece of standard data and the target participle "brand".
Step 607, determining the target standard data matching the type from the standard data according to the semantic similarity.
The standard data are then sorted by their semantic similarity to the target participle to determine the target standard data matching the type. For example, the standard data are sorted in descending order of semantic similarity, and the standard data with the highest similarity to the target participle is taken as the target standard data matching the type of the target participle. In this example, the target standard data matching the target participle "price" is "price", and the target standard data matching the target participle "brand" is "brand".
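Steps 606 and 607 amount to scoring every candidate and keeping the best one. The disclosure does not name a similarity measure, so the sketch substitutes Python's character-level `difflib.SequenceMatcher.ratio()` purely as a stand-in; a semantic model (for example, one comparing embeddings) would be the more natural choice in practice. The similarity function is a parameter so that it can be swapped, and `rank_standard_data` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import Callable


def char_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; character overlap, not a semantic measure."""
    return SequenceMatcher(None, a, b).ratio()


def rank_standard_data(target: str, standard_data: list[str],
                       sim: Callable[[str, str], float] = char_similarity) -> list[str]:
    """Sort the standard data by similarity to the target participle, highest first."""
    return sorted(standard_data, key=lambda s: sim(target, s), reverse=True)


standard_data = ["model", "brand", "price", "country", "seat number", "panoramic sunroof"]
for participle in ["price", "brand"]:
    ranked = rank_standard_data(participle, standard_data)
    # The first element of the ranking is taken as the target standard data.
    print(participle, "->", ranked[0])
```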
Step 608, standardizing the target question sentence according to the target standard data to generate a standard question corresponding to the target question sentence, so as to determine a reply answer.
In the embodiment of the present disclosure, the target standard data may replace the corresponding target participles in the target question to generate the standard question corresponding to the target question. For example, for the target question "what brand has a price of more than 200,000", the target participle "price" is replaced by the target standard data "price" and the target participle "brand" by the target standard data "brand", giving the standard question "what brand has a price of more than 200,000" (in this example the participles already coincide with the column names, so the wording is unchanged). The answer to the standard question can then be returned according to the table data.
It should be noted that steps 601 to 604 may be performed as in any embodiment of the present disclosure; this is not limited here and is not described again.
In conclusion, the standard data matching the type of the target participle are queried; the semantic similarity between each piece of standard data and the target participle is determined; target standard data matching the type are selected from the standard data according to the semantic similarity; and the target question is standardized according to the target standard data to generate the corresponding standard question. In this way, the target standard data corresponding to each target participle can be accurately determined from the standard data, and the standard question corresponding to the target question can be accurately generated.
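The replacement in step 608 can be sketched as a token-level substitution. Because in the example above the participles already coincide with the column names, the sketch uses hypothetical colloquial participles ("cost", "make") to show a visible replacement. The final table lookup is also only illustrative: it assumes the standard question has already been converted into a condition column, a threshold and an answer column, a conversion the disclosure does not detail, and the rows are invented. `standardize` and `answer` are hypothetical names.

```python
def standardize(participles: list[str], target_standard_data: dict[str, str]) -> list[str]:
    """Replace each target participle with its target standard data; keep other words."""
    return [target_standard_data.get(p, p) for p in participles]


def answer(columns, rows, condition_column, threshold, answer_column):
    """Illustrative lookup: values of answer_column in rows where condition_column > threshold."""
    i, j = columns.index(condition_column), columns.index(answer_column)
    return [row[j] for row in rows if row[i] > threshold]


# Hypothetical colloquial question "what make has a cost of more than 200,000".
participles = ["what", "make", "has", "a", "cost", "of", "more than", "200,000"]
standard = standardize(participles, {"cost": "price", "make": "brand"})
print(" ".join(standard))  # what brand has a price of more than 200,000

columns = ["model", "brand", "price"]
rows = [("Model A", "Brand X", 250000), ("Model B", "Brand Y", 150000)]  # hypothetical rows
print(answer(columns, rows, "price", 200000, "brand"))
```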
To match the dependency relationships among the participles in the target question with those among the participle types in the preset high-frequency dependency subgraph, the high-frequency dependency subgraph may first be obtained. Fig. 8 is a schematic diagram according to a fourth embodiment of the present disclosure. In this embodiment, dependency syntax analysis may be performed on the sample questions in the sample question training set to generate the corresponding sample dependency relationship graphs, and the high-frequency dependency subgraph is then determined from the sample dependency relationship graphs according to a preset rule. The embodiment shown in Fig. 8 may include the following steps:
Step 801, obtaining a target question.
Step 802, performing dependency syntax analysis on the target question to obtain a dependency relationship between each participle in the target question.
Step 803, a sample question training set is obtained.
Optionally, a plurality of sample questions are obtained; a sample question template corresponding to each sample question is determined from the plurality of sample questions; rewritten sample questions corresponding to the sample question are generated according to the sample question template; and the sample questions and the rewritten sample questions together form the sample question training set.
In the embodiment of the present disclosure, the sample questions may be collected online; for example, sentences with a question structure may be collected online with a web crawler and used as sample questions. Alternatively, a sample question may be a sentence with a question structure collected offline, a manually synthesized question, or the like. It should be noted that the sample question training set may include multiple types of sample questions; for example, a sample question may be "the X vehicles with the highest price", "what brand has a price of more than 200,000", or "how much more X wins than PaX".
Furthermore, to expand the number of sample questions in the sample question training set, a sample question template corresponding to the sample questions can be determined, rewritten sample questions can be obtained from the template for different scenarios and requirements, and the sample questions and the rewritten sample questions can be used as the sample question training set.
For example, as shown in Table 1,
table 1: example of sentence pattern template
In Table 1, [text_1_value] indicates that the first participle is of the type "attribute value" (value) and that its value type is the character type (text); here it represents "X french H6". [text_2_name] and [text_3_name] indicate that the second and third participles are of the type "attribute" (name) and that their value types are both the character type (text); here they represent "brand" and "price". [text_finish_key] is an additional identifier that can be expanded into common question-ending words such as "how many" and "what".
In the embodiment of the present disclosure, each type of sample question may correspond to its own sample question template; rewritten sample questions can be obtained from the template and, together with the sample questions, used as the sample question training set. For example, as shown in Table 2, rewriting a question with synonyms based on the sample question template yields rewritten sample questions.
Table 2: Rewritten sample questions
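The slot notation of Table 1 can be produced mechanically from a sample question whose participles are already typed. The sketch below is one possible reading of that notation: attribute values become `[text_k_value]` and attributes become `[text_k_name]`, numbered in order of appearance, and a question-ending word becomes `[text_finish_key]`. It assumes every slot has the character (text) value type; the list of ending words, the English word order of the example, and the name `to_template` are hypothetical.

```python
SLOT_SUFFIX = {"attribute value": "value", "attribute": "name"}
FINISH_WORDS = {"how many", "how much", "what"}  # hypothetical question-ending words


def to_template(typed_participles: list[tuple[str, str]]) -> str:
    """Turn (participle, type) pairs into a Table 1 style sentence pattern template.
    All slots are assumed to have the character ("text") value type."""
    parts, slot_no = [], 0
    for word, word_type in typed_participles:
        if word_type in SLOT_SUFFIX:
            slot_no += 1
            parts.append(f"[text_{slot_no}_{SLOT_SUFFIX[word_type]}]")
        elif word in FINISH_WORDS:
            parts.append("[text_finish_key]")
        else:
            parts.append(word)
    return " ".join(parts)


question = [("X french H6", "attribute value"), ("'s", "other"), ("brand", "attribute"),
            ("and", "other"), ("price", "attribute"), ("are", "other"), ("what", "other")]
print(to_template(question))
```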
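The synonym rewriting described for Table 2 can be sketched as filling each template slot with every listed variant, that is, taking the Cartesian product of the slot fillers. The synonym lists below are hypothetical (the contents of Table 2 are not reproduced here), and `rewrite_questions` is an illustrative name.

```python
from itertools import product


def rewrite_questions(template: str, fillers: dict[str, list[str]]) -> list[str]:
    """Fill every slot of the template with each combination of its listed variants."""
    slots = list(fillers)
    questions = []
    for combo in product(*(fillers[slot] for slot in slots)):
        question = template
        for slot, value in zip(slots, combo):
            question = question.replace(slot, value)
        questions.append(question)
    return questions


template = "[text_1_value] 's [text_2_name] is [text_finish_key]"
fillers = {
    "[text_1_value]": ["pax"],
    "[text_2_name]": ["price", "cost"],  # hypothetical synonyms
    "[text_finish_key]": ["how much", "what"],
}
for q in rewrite_questions(template, fillers):
    print(q)
```

Each rewritten question keeps the structure of the template, so its dependency parse tends to resemble that of the original sample question, which is what the later subgraph statistics rely on.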
Step 804, for each sample question in the sample question training set, performing dependency syntax analysis on the sample question to generate the sample dependency relationship graph corresponding to the sample question.
That is, dependency syntax analysis is performed on every sample question in the sample question training set to generate its sample dependency relationship graph.
Step 805, determining a high-frequency dependency subgraph from the sample dependency graphs according to a preset rule.
Optionally, candidate dependency subgraphs are determined in each sample dependency graph according to the dependency relationships of its nodes; when the number of nodes of a candidate dependency subgraph is greater than or equal to a preset node number threshold and the frequency with which the candidate dependency subgraph appears in the sample question training set is greater than or equal to a preset frequency threshold, the candidate dependency subgraph is taken as a high-frequency dependency subgraph.
That is, to determine the high-frequency dependency subgraphs accurately, in the embodiment of the present disclosure, candidate dependency subgraphs may be determined in the sample dependency graph of each sample question according to the dependency relationships of its nodes; for example, a candidate dependency subgraph may be an attributive-relationship subgraph in the sample dependency graph. The number of nodes of each candidate dependency subgraph and the frequency with which each candidate appears across all sample dependency graphs are then counted, and a candidate whose number of nodes is greater than or equal to the node number threshold and whose frequency in the sample question training set is greater than or equal to the frequency threshold is taken as a high-frequency dependency subgraph. For example, if the node number threshold is 3 and the frequency threshold is 10%, a candidate dependency subgraph that appears in at least 10% of the sample questions and has at least 3 nodes is a high-frequency dependency subgraph.
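The selection rule of step 805 can be sketched as counting, for each candidate pattern, the fraction of sample questions that contain it. The disclosure does not specify how candidate subgraphs are enumerated, so the sketch uses a deliberately simple scheme: single edges (2 nodes), and pairs of edges that either form a chain or share a head (3 nodes), each identified only by its relation labels. A full frequent-subgraph miner would also enumerate larger connected subgraphs. The function names, relation labels and toy parses are hypothetical; the thresholds default to the 3 nodes and 10% given above.

```python
from collections import Counter

Edge = tuple[int, str, int]  # (head index, relation label, dependent index)


def candidate_patterns(edges: list[Edge]) -> set[tuple[tuple[str, ...], int]]:
    """Return (pattern, node count) pairs found in one sample dependency graph.
    Only one-edge and two-edge connected patterns are considered."""
    found = set()
    for _, rel, _ in edges:
        found.add((("edge", rel), 2))
    for i, (h1, r1, d1) in enumerate(edges):
        for j, (h2, r2, _) in enumerate(edges):
            if i == j:
                continue
            if d1 == h2:
                # Chain: a -r1-> b -r2-> c
                found.add((("chain", r1, r2), 3))
            if h1 == h2 and i < j:
                # Two dependents of one head; labels sorted so edge order does not matter.
                found.add((("fork", *sorted((r1, r2))), 3))
    return found


def high_frequency_subgraphs(sample_graphs: list[list[Edge]], min_nodes: int = 3,
                             min_freq: float = 0.10) -> dict[tuple[str, ...], float]:
    """Keep patterns with at least min_nodes nodes that occur in at least
    min_freq of the sample questions (each question counted once per pattern)."""
    support: Counter = Counter()
    for edges in sample_graphs:
        support.update(candidate_patterns(edges))  # a set, so one count per question
    n = len(sample_graphs)
    return {pattern: count / n
            for (pattern, nodes), count in support.items()
            if nodes >= min_nodes and count / n >= min_freq}


samples = [
    [(1, "attributive", 0), (2, "subject-predicate", 1), (2, "verb-object", 3)],
    [(1, "attributive", 0), (2, "subject-predicate", 1), (2, "verb-object", 3)],
    [(0, "verb-object", 1)],
]
for pattern, freq in high_frequency_subgraphs(samples).items():
    print(pattern, round(freq, 3))
```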
Step 806, matching the dependency relationships among the participles in the target question with the dependency relationships among the participle types in the preset high-frequency dependency subgraph.
Step 807, determining the type of each corresponding target participle in the target question according to the participle types matched with the dependency relationships in the high-frequency dependency subgraph.
Step 808, determining a standard question corresponding to the target question according to the types of the target participles, so as to determine a reply answer.
It should be noted that steps 801 to 802 and steps 806 to 807 may be performed as in any embodiment of the present disclosure; this is not limited here and is not described again.
In conclusion, a sample question training set is obtained; for each sample question in the set, dependency syntax analysis is performed to generate the corresponding sample dependency relationship graph; and the high-frequency dependency subgraph is determined from the sample dependency graphs according to the preset rule. In this way, the dependency relationships in questions are captured automatically from the sample questions, and the high-frequency dependency subgraph can be accurately determined from the sample dependency graphs.
To determine accurately the probability of each type to which each node in the high-frequency dependency subgraph belongs, Fig. 9, a schematic diagram according to a fifth embodiment of the present disclosure, shows an embodiment in which, after a candidate dependency subgraph is taken as the high-frequency dependency subgraph, the high-frequency dependency subgraph is used to traverse the sample questions and count the types that each of its nodes takes in the sample questions, and the probability of each type recorded by each node is determined from these counts. The embodiment shown in Fig. 9 may include the following steps:
Step 901, obtaining a target question.
Step 902, performing dependency syntax analysis on the target question to obtain a dependency relationship between each participle in the target question.
Step 903, obtaining a sample question training set.
Step 904, for each sample question in the sample question training set, performing dependency syntax analysis on the sample question to generate a sample dependency relationship graph corresponding to the sample question.
Step 905, determining a high-frequency dependency sub-graph from the sample dependency graph according to a set rule.
Step 906, traversing the sample question by using the high-frequency dependency subgraph to count the types of each node of the high-frequency dependency subgraph in the sample question.
Step 907, determining the probability of each type recorded by each node in the high-frequency dependency subgraph according to the type of each node in the sample question of the high-frequency dependency subgraph.
For example, as shown in Fig. 10, a high-frequency dependency subgraph includes node 0, node 1 and node 2, where node 0 and node 1 are in an attributive (modifier-head) relationship, and node 1 and node 2 are in a dominating relationship. For each of node 0, node 1 and node 2, the number of sample questions in which the node is of type "attribute", type "attribute value" and type "other" is counted, and the probability of each type recorded by the node is determined from these counts. For example, suppose 3 sample questions contain the high-frequency dependency subgraph with the attributive relationship, and node 0 of that relationship is of type "attribute" in the first sample question, of type "attribute value" in the second sample question, and of type "attribute" in the third sample question. The probability that node 0 is of type "attribute" is then 2/3. Similarly, the probability that node 0 is of type "attribute value" is 1/3, and the probability that it is of type "other" is 0.
Step 908, matching the dependency relationships between the participles in the target question with the dependency relationships between the participle types in the set high-frequency dependency subgraph.
Step 909, determining the type of each corresponding target participle in the target question according to the participle types in the high-frequency dependency subgraph whose dependency relationships are matched.
Step 910, determining a standard question corresponding to the target question according to the types of the target participles, so as to determine a reply answer.
It should be noted that steps 901 to 905 and steps 908 to 910 may be executed in the manner described in any embodiment of the present disclosure; the present disclosure does not limit this, and the details are not repeated here.
In conclusion, the high-frequency dependency subgraph is used to traverse the sample questions so as to count the types of its nodes in the sample questions, and the probability of each type recorded by each node in the high-frequency dependency subgraph is determined from these counted types. In this way, the probability of each type to which each node in the high-frequency dependency subgraph belongs can be accurately determined.
In the question answering processing method of the embodiments of the present disclosure, a target question is obtained; dependency syntax analysis is performed on the target question to obtain the dependency relationships between the participles in the target question; these dependency relationships are matched with the dependency relationships between the participle types in a set high-frequency dependency subgraph; the type of each corresponding target participle in the target question is determined according to the matched participle types in the high-frequency dependency subgraph; and a standard question corresponding to the target question is determined according to the types of the target participles, so as to determine a reply answer. Therefore, the type of each participle in the target question can be determined from the types of the nodes of the matched high-frequency dependency subgraph, so that the target participles in the target question can be effectively standardized and the method can be applied to question matching in different scenarios, which improves the coverage and accuracy of question standardization. Meanwhile, using high-frequency dependency subgraphs eases the burden of question standardization analysis during cold start.
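The counting and probability estimation of steps 906 and 907 can be sketched as follows. This is a minimal illustration, assuming that each occurrence of the high-frequency dependency subgraph in a sample question is available as a mapping from node id to the type annotated for the aligned participle; how sample questions are annotated is not specified in the disclosure, and the function name is hypothetical. The usage reproduces the Fig. 10 example for node 0.

```python
from collections import Counter, defaultdict
from fractions import Fraction

TYPES = ("attribute", "attribute value", "other")


def node_type_probabilities(occurrences):
    """Estimate P(type | node) for every node of one high-frequency subgraph.

    occurrences: one dict per sample question containing the subgraph,
    mapping node id -> observed type of the aligned participle.
    """
    counts = defaultdict(Counter)
    for occ in occurrences:
        for node, typ in occ.items():
            counts[node][typ] += 1
    probs = {}
    for node, counter in counts.items():
        total = sum(counter.values())
        # Counter returns 0 for unseen types, so every type gets a probability
        probs[node] = {t: Fraction(counter[t], total) for t in TYPES}
    return probs


# Node 0 is "attribute", "attribute value", "attribute" in three sample questions.
occurrences = [{0: "attribute"}, {0: "attribute value"}, {0: "attribute"}]
print(node_type_probabilities(occurrences)[0])
```

Exact fractions are used here only to make the 2/3, 1/3 and 0 of the worked example visible; floating-point values would serve equally well in practice.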
In order to implement the above embodiments, the present disclosure proposes a question answering processing apparatus. Fig. 11 is a schematic diagram according to a sixth embodiment of the present disclosure.
As shown in fig. 11, the question-answering processing apparatus 1100 includes: a first acquisition module 1110, an analysis module 1120, a matching module 1130, a first determination module 1140, and a second determination module 1150.
The first acquisition module 1110 is configured to obtain a target question; the analysis module 1120 is configured to perform dependency syntax analysis on the target question to obtain the dependency relationships between the participles in the target question; the matching module 1130 is configured to match the dependency relationships between the participles in the target question with the dependency relationships between the participle types in a set high-frequency dependency subgraph; the first determination module 1140 is configured to determine the type of each corresponding target participle in the target question according to the participle types in the high-frequency dependency subgraph whose dependency relationships are matched; and the second determination module 1150 is configured to determine a standard question corresponding to the target question according to the types of the target participles, so as to determine a reply answer.
As one possible implementation of the embodiments of the present disclosure, the matching module 1130 is configured to: determine a dependency graph corresponding to the target question according to the dependency relationships between the participles of the target question; query the dependency graph for the dependency relationships in the high-frequency dependency subgraph; and, when the dependency relationships in the high-frequency dependency subgraph exist in the dependency graph, determine that each participle type in the high-frequency dependency subgraph has a matching relationship with the participle having the same dependency relationship in the target question.
As a possible implementation of the embodiments of the present disclosure, the matching module 1130 is further configured to: for each participle in the target question, query the probability of each type recorded by the corresponding node in the matched high-frequency dependency subgraph; determine the probability that the participle belongs to each type according to the queried probabilities; and determine the type of the participle according to the probability that it belongs to each type.
As a possible implementation of the embodiments of the present disclosure, the second determination module 1150 is configured to: query, according to the type of the target participle, the standard data matching that type; determine, for each piece of standard data, its semantic similarity to the target participle; determine, according to the semantic similarity, the target standard data matching the type from the standard data; and standardize the target question according to the target standard data to generate the standard question corresponding to the target question.
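The data flow between the five modules of the apparatus 1100 can be sketched as a simple composition. This is only an illustrative skeleton, assuming each module is modeled as a plain callable; the class name, the field names and the intermediate data types are hypothetical and are not prescribed by the disclosure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QAProcessingApparatus:
    acquire: Callable[[], str]               # first acquisition module 1110
    analyze: Callable[[str], list]           # analysis module 1120: question -> dependency arcs
    match: Callable[[list], dict]            # matching module 1130: arcs -> subgraph alignment
    determine_types: Callable[[dict], dict]  # first determination module 1140: alignment -> participle types
    standardize: Callable[[str, dict], str]  # second determination module 1150: -> standard question

    def run(self) -> str:
        """Chain the modules in the order described for the apparatus."""
        question = self.acquire()
        arcs = self.analyze(question)
        alignment = self.match(arcs)
        types = self.determine_types(alignment)
        return self.standardize(question, types)


# Stub modules, for illustration only.
app = QAProcessingApparatus(
    acquire=lambda: "q",
    analyze=lambda q: [(0, 1, "ATT")],
    match=lambda arcs: {1: [0]},
    determine_types=lambda alignment: {"w": "attribute"},
    standardize=lambda q, types: q + "!",
)
print(app.run())
```

Keeping each module behind a callable makes it possible to swap, for example, the dependency parser or the similarity measure without touching the rest of the pipeline.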
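A minimal sketch of querying the dependency graph of the target question for a high-frequency dependency subgraph is given below. It assumes, for illustration only, that the subgraph is path-shaped and stored as a tuple of relation labels, so that subgraph node k is linked to node k + 1 by pattern[k]; the arc format (head_index, dependent_index, relation) and the function name are hypothetical. Each returned tuple gives, position by position, the participle index that matches the corresponding subgraph node.

```python
from collections import defaultdict


def match_pattern(arcs, pattern):
    """Find all placements of a path-shaped subgraph in a dependency graph.

    arcs: (head_index, dependent_index, relation) triples of the target question.
    pattern: relation labels; subgraph node k -> node k + 1 via pattern[k].
    Returns tuples of participle indices aligned with subgraph nodes 0..len(pattern).
    """
    children = defaultdict(list)
    nodes = set()
    for head, dep, rel in arcs:
        children[head].append((dep, rel))
        nodes.update((head, dep))
    matches = []

    def extend(path, depth):
        if depth == len(pattern):
            matches.append(tuple(path))
            return
        for dep, rel in children[path[-1]]:
            if rel == pattern[depth]:  # the dependency relationship must agree
                extend(path + [dep], depth + 1)

    for start in sorted(nodes):
        extend([start], 0)
    return matches


arcs = [(3, 2, "SBV"), (3, 4, "VOB"), (2, 0, "ATT")]
print(match_pattern(arcs, ("SBV", "ATT")))
```

When the returned list is non-empty, the dependency relationships of the subgraph exist in the dependency graph, and each subgraph node can be associated with the participle at the same position.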
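Deriving the type of a participle from the queried node probabilities can be sketched as follows. The disclosure does not state how probabilities from several matched high-frequency dependency subgraphs are combined, so this sketch assumes a simple average followed by taking the most probable type; the function name and the example probabilities are hypothetical.

```python
def participle_types(node_probs_per_participle):
    """Pick a type for each participle from the probabilities of its matched nodes.

    node_probs_per_participle: participle -> list of dicts (type -> probability),
    one dict per matched subgraph node aligned with that participle.
    Combination rule (assumed): average the probabilities, then take the argmax.
    """
    result = {}
    for word, prob_dicts in node_probs_per_participle.items():
        if not prob_dicts:
            continue  # participle not covered by any matched subgraph
        combined = {}
        for probs in prob_dicts:
            for typ, p in probs.items():
                combined[typ] = combined.get(typ, 0.0) + p / len(prob_dicts)
        result[word] = max(combined, key=combined.get)
    return result


example = {
    "population": [
        {"attribute": 0.7, "attribute value": 0.2, "other": 0.1},
        {"attribute": 0.4, "attribute value": 0.5, "other": 0.1},
    ],
    "Beijing": [{"attribute": 0.1, "attribute value": 0.8, "other": 0.1}],
}
print(participle_types(example))
```

Averaging treats every matched subgraph as equally reliable; weighting each subgraph by its frequency in the training set would be an equally plausible alternative.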
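The standardization performed by the second determination module can be sketched as below. The disclosure relies on semantic similarity; purely as a stand-in, this sketch uses the character-level ratio of Python's difflib.SequenceMatcher, which measures string overlap rather than meaning. The standard data, the example question and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in for semantic similarity: character-level match ratio in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def standardize(question, participle_types, standard_data):
    """Replace each typed participle with its most similar standard data item.

    participle_types: participle -> type; standard_data: type -> list of standard strings.
    """
    for word, typ in participle_types.items():
        candidates = standard_data.get(typ, [])
        if not candidates:
            continue  # no standard data recorded for this type
        best = max(candidates, key=lambda s: similarity(word, s))
        question = question.replace(word, best)
    return question


standard_data = {
    "attribute": ["population", "area", "GDP"],
    "attribute value": ["Beijing", "Shanghai"],
}
types = {"populaton": "attribute", "Beijing city": "attribute value"}
print(standardize("what is the populaton of Beijing city", types, standard_data))
```

Restricting the candidates to standard data of the participle's own type is what keeps, for example, an attribute from being mapped onto an attribute value.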
As a possible implementation of the embodiments of the present disclosure, the question answering processing apparatus 1100 further includes a second acquisition module, a generation module and a third determination module.
The second acquisition module is configured to obtain a sample question training set; the generation module is configured to perform, for each sample question in the sample question training set, dependency syntax analysis on the sample question to generate a sample dependency graph corresponding to the sample question; and the third determination module is configured to determine the high-frequency dependency subgraph from the sample dependency graphs according to a set rule.
As a possible implementation of the embodiments of the present disclosure, the third determination module is configured to: determine, according to the dependency relationships of the nodes in each sample dependency graph, the corresponding candidate dependency subgraphs in that sample dependency graph; and, when the number of nodes of a candidate dependency subgraph is greater than or equal to a set node-number threshold and the frequency with which the candidate dependency subgraph appears in the sample question training set is greater than or equal to a set frequency threshold, take the candidate dependency subgraph as a high-frequency dependency subgraph.
As a possible implementation of the embodiments of the present disclosure, the second acquisition module is configured to: obtain a plurality of sample questions; determine, according to the plurality of sample questions, a sample question template corresponding to each sample question; generate, according to the sample question template, rewritten sample questions corresponding to the sample question; and take the sample questions and the rewritten sample questions as the sample question training set.
As a possible implementation of the embodiments of the present disclosure, the question answering processing apparatus 1100 further includes a statistics module and a fourth determination module.
The statistics module is configured to traverse the sample questions by using the high-frequency dependency subgraph to count the types of the nodes of the high-frequency dependency subgraph in the sample questions; and the fourth determination module is configured to determine the probability of each type recorded by each node in the high-frequency dependency subgraph according to the counted types of the nodes in the sample questions.
The question answering processing apparatus of the embodiments of the present disclosure obtains a target question; performs dependency syntax analysis on the target question to obtain the dependency relationships between the participles in the target question; matches these dependency relationships with the dependency relationships between the participle types in a set high-frequency dependency subgraph; determines the type of each corresponding target participle in the target question according to the matched participle types in the high-frequency dependency subgraph; and determines a standard question corresponding to the target question according to the types of the target participles, so as to determine a reply answer. Therefore, the type of each participle in the target question can be determined from the types of the nodes of the matched high-frequency dependency subgraph, so that the target participles in the target question can be effectively standardized and the apparatus can be applied to question matching in different scenarios, which improves the coverage and accuracy of question standardization. Meanwhile, using high-frequency dependency subgraphs eases the burden of question standardization analysis during cold start.
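The set rule applied by the third determination module can be sketched as a filter over candidate dependency subgraphs. This sketch assumes that each candidate is identified by a tuple of relation labels along a path (so a candidate with n labels has n + 1 nodes), and that "frequency" means the number of sample questions containing the candidate, each question counted at most once; both are illustrative assumptions, and a relative frequency would work the same way. The function name and the example data are hypothetical.

```python
from collections import Counter


def select_high_frequency(candidates_per_sample, node_threshold, freq_threshold):
    """Keep candidates with enough nodes that occur in enough sample questions.

    candidates_per_sample: one iterable of candidate patterns per sample question;
    a pattern is a tuple of relation labels, so it has len(pattern) + 1 nodes.
    """
    doc_freq = Counter()
    for patterns in candidates_per_sample:
        doc_freq.update(set(patterns))  # count each sample question at most once
    return {
        p
        for p, f in doc_freq.items()
        if len(p) + 1 >= node_threshold and f >= freq_threshold
    }


samples = [
    [("SBV", "ATT"), ("VOB",)],
    [("SBV", "ATT"), ("SBV", "ATT")],
    [("SBV", "ATT"), ("VOB",)],
    [("VOB",)],
]
print(select_high_frequency(samples, node_threshold=3, freq_threshold=3))
```

In this example ("VOB",) also occurs in three sample questions but has only two nodes, so the node-number threshold excludes it.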
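Building the sample question training set from templates can be sketched as follows. The disclosure derives the template from a plurality of sample questions without specifying how; this sketch instead assumes that each sample question comes with annotated slot values that are replaced by named placeholders, and that lists of alternative slot values are available for filling the template. All names and data are hypothetical.

```python
from itertools import product


def to_template(question, slots):
    """Replace annotated slot values with {name} placeholders (assumed annotation)."""
    template = question
    for name, value in slots.items():
        template = template.replace(value, "{" + name + "}")
    return template


def rewrite(template, fillers):
    """Fill the template with every combination of alternative slot values."""
    names = list(fillers)
    return [
        template.format(**dict(zip(names, combo)))
        for combo in product(*(fillers[n] for n in names))
    ]


def build_training_set(samples, fillers):
    """samples: list of (question, slots); returns originals plus unique rewrites."""
    training = []
    for question, slots in samples:
        training.append(question)
        for q in rewrite(to_template(question, slots), fillers):
            if q not in training:
                training.append(q)
    return training


samples = [("what is the population of Beijing",
            {"attribute": "population", "value": "Beijing"})]
fillers = {"attribute": ["population", "area"], "value": ["Beijing", "Shanghai"]}
print(build_training_set(samples, fillers))
```

Rewriting multiplies the sentence variety available for mining dependency subgraphs, which is useful when only a few real sample questions exist.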
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision and disclosure of users' personal information are all carried out with the users' consent, comply with the relevant laws and regulations, and do not violate public order and good morals.
In order to implement the above embodiments, the present disclosure also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the question-answer processing method as described above.
In order to implement the above embodiments, the present disclosure also proposes a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the question answering processing method described above.
In order to implement the above embodiments, the present disclosure also proposes a computer program product comprising a computer program which, when executed by a processor, implements the question-answering processing method as described above.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 12 shows a schematic block diagram of an example electronic device 1200 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 12, the device 1200 includes a computing unit 1201 that can perform various appropriate actions and processes in accordance with a computer program stored in a read-only memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a random access memory (RAM) 1203. The RAM 1203 may also store various programs and data required for the operation of the device 1200. The computing unit 1201, the ROM 1202 and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Various components in the device 1200 are connected to the I/O interface 1205 including: an input unit 1206 such as a keyboard, a mouse, or the like; an output unit 1207 such as various types of displays, speakers, and the like; a storage unit 1208, such as a magnetic disk, optical disk, or the like; and a communication unit 1209 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the device 1200 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1201 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1201 executes the respective methods and processes described above, such as the question and answer processing method. For example, in some embodiments, the question-answering processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into the RAM 1203 and executed by the computing unit 1201, one or more steps of the question-answering processing method described above may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured to perform the question-answering processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, partly on the machine as a stand-alone software package and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be noted that artificial intelligence is the discipline that studies how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking and planning), and it involves both hardware and software technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology and the like.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.