CN114969242B

CN114969242B - Method and device for automatically completing query content

Info

Publication number: CN114969242B
Application number: CN202210675071.6A
Authority: CN
Inventors: 田有朋; 李俊; 黄亚东; 王小卫
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2022-01-19
Filing date: 2022-01-19
Publication date: 2025-04-08
Anticipated expiration: 2042-01-19
Also published as: CN114969242A; CN114090722B; CN114090722A

Abstract

Embodiments of this specification provide a method and a device for automatically completing query content. In the method, natural language query content for target data, currently input by a user, is acquired. The natural language query content is segmented to obtain a plurality of query words. With each of the query words taken as a current query word, a plurality of dictionary trees corresponding to different entity categories are queried to obtain, for each query word, candidate words corresponding to a plurality of entity categories, wherein the dictionary trees are pre-constructed from data queries for the target data. Target candidate words are selected from the candidate words based at least on the entity category corresponding to each candidate word of each query word. The target candidate words are determined as the completion content of the natural language query content.

Description

Method and device for automatically completing query content

This application is a divisional application of the invention patent application No. 202210058334.9, entitled "Method and device for automatically completing query content", filed on January 19, 2022.

Technical Field

One or more embodiments of the present disclosure relate to the field of data analysis, and in particular, to a method and apparatus for automatic completion of query content.

Background

Natural language query (NLQ) refers to querying and analyzing data using natural language. The data may be stored in a database, an Excel table, or a search engine.

When a user queries data in natural language, the user is usually prompted with content that the user may want to input next, so as to improve input efficiency. That is, the user's natural language query content is completed once the user has input part of it.

Traditional completion methods usually complete at sentence granularity, i.e., the prompted content is usually a whole sentence. However, once a user has entered part of the content, it is usually preferable to prompt words related to the user's natural language query content rather than an irrelevant sentence. Accordingly, a completion scheme is needed that completes the user's natural language query content more accurately.

Disclosure of Invention

One or more embodiments of this specification describe a method and a device for automatically completing query content, which perform completion at word granularity, so that the accuracy of the completion content can be improved and the user experience can be further improved.

In a first aspect, a method for automatically completing query content is provided, including:

acquiring natural language query content for target data, currently input by a user;

segmenting the natural language query content to obtain a plurality of query words;

querying, with each of the plurality of query words as a current query word, a plurality of dictionary trees corresponding to different entity categories to obtain, for each query word, candidate words corresponding to a plurality of entity categories;

selecting target candidate words from the candidate words based at least on the entity category corresponding to each candidate word of each query word; and

determining the completion content of the natural language query content according to the target candidate words.

In a second aspect, an apparatus for query content automatic completion is provided, including:

an acquisition unit, configured to acquire natural language query content for target data, currently input by a user;

a segmentation unit, configured to segment the natural language query content to obtain a plurality of query words;

a query unit, configured to query, with each of the plurality of query words as a current query word, a plurality of dictionary trees corresponding to different entity categories to obtain, for each query word, candidate words corresponding to a plurality of entity categories;

a selecting unit, configured to select target candidate words from the candidate words based at least on the entity category corresponding to each candidate word of each query word; and

a determining unit, configured to determine the completion content of the natural language query content according to the target candidate words.

In a third aspect, there is provided a computer storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.

In a fourth aspect, there is provided a computing device comprising a memory having executable code stored therein and a processor which, when executing the executable code, implements the method of the first aspect.

According to the method and the device provided in the embodiments of this specification, for each query word obtained from the natural language query content, candidate words with corresponding entity categories are obtained by querying the dictionary trees. The candidate words are then screened based on their entity categories to obtain the candidate words that serve as the completion content. That is, this scheme obtains the completion content based on entity categories. Because entity categories combine in conventional patterns, selecting candidate words based on entity categories addresses the problem of completion content that is unrelated to the natural language query content. In addition, because candidate words serve as the completion content, the natural language query content is completed at word granularity, which can improve the accuracy of the completion content and further improve the user experience.

Drawings

To illustrate the technical solutions of the embodiments of this specification more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of this specification, and a person skilled in the art may obtain other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of an implementation scenario disclosed in one embodiment of the present disclosure;

FIG. 2 illustrates a method flow diagram for query content auto-completion, according to one embodiment;

FIG. 3a illustrates a prefix tree schematic according to one embodiment;

FIG. 3b illustrates a suffix tree schematic diagram according to an embodiment;

FIG. 4a shows a state machine schematic according to one embodiment;

FIG. 4b shows a state machine schematic according to another embodiment;

FIG. 5 illustrates an apparatus for automatic completion of query content, according to one embodiment.

Detailed Description

The following describes the scheme provided in the present specification with reference to the drawings.

The field of data analysis typically involves data queries, i.e., reading data from databases, Excel tables, or search engines.

Conventionally, data queries are performed in a specific query language, for example by reading data from a database with SQL statements. This, however, raises the barrier to using the data. Two improvements have been proposed for this purpose:

First, methods based on natural language processing (NLP). However, such methods cannot guarantee that the data read is completely accurate, i.e., they are accurate only with some probability.

Second, seq2SQL-based methods, which read data by translating natural language directly into SQL statements. However, such methods reach only about 80% accuracy for single-table, single-level aggregation and cannot support the various complex data analysis requirements of real scenarios inside enterprises. That is, they have low accuracy and narrow coverage.

Because both schemes have certain defects, a further improved scheme queries data directly in natural language. When a user queries data in natural language, the user's natural language query content needs to be completed in order to improve input efficiency.

Currently, the completion methods used in search engines usually complete at sentence granularity, i.e., the prompted content is usually a whole sentence. However, once a user has entered part of the content, it is usually preferable to prompt words related to the user's natural language query content rather than an irrelevant sentence. The inventors of this application therefore propose completing at word granularity, i.e., completing the user's natural language query content at a finer granularity, thereby improving the accuracy of the completion content and further improving the user experience.

Fig. 1 is a schematic diagram of an implementation scenario disclosed in one embodiment of this specification. In Fig. 1, natural language query content for target data, currently input by a user, is first acquired. The natural language query content is then segmented to obtain a plurality of query words W₁, W₂, …, W_N, where N is the number of query words. With each query word taken as a current query word, a plurality of dictionary trees corresponding to different entity categories are queried to obtain the candidate words W₁₁, W₁₂, W₂₁, W₂₂, W₂₃, …, W_N1 and W_N2 of the query words, whose corresponding entity categories may be C₂, C₁, C₁, C₂, C₁, …, C₂ and C₂, respectively. Finally, the target candidate words W₁₁, W₂₂, W_N1 and W_N2 may be selected from the candidate words based on the entity category corresponding to each candidate word of each query word, and the completion content of the natural language query content is determined according to the target candidate words.

In one example, target candidate words may be selected from candidate words based on a state machine of a regular expression, described in detail below.

The scheme is described in detail through the following embodiments.

FIG. 2 illustrates a flow diagram of a method for automatically completing query content, according to one embodiment. The method may be performed by any apparatus, device, platform, or cluster of devices having computing and processing capabilities. As shown in Fig. 2, the method may include at least the following steps.

Step 202, acquiring natural language query content aiming at target data, which is currently input by a user.

The entity categories may be divided into two categories, one of which is a public category, and may include at least one of time, operators, units, functions, intents, and the like. The other is a private category that may include at least one of dimensions, dimension values, metrics, and the like. In one example, the private category described above may be determined based on a key value of a key-value pair.

In one example, the entity words corresponding to time may be, for example, "XX year", "XX month", "XX day", "last N days", "last days", "last N years", "last year", and "today". The entity words corresponding to operators may be, for example, "greater than", "less than", "equal to", "greater than", and "above". The entity words corresponding to units may be, for example, "years", "several", and "several people". The entity words corresponding to functions may be, for example, "maximum", "minimum", and "average". The entity words corresponding to dimensions may be, for example, "city", "sales amount", and "class". A dimension value is a value of a dimension; for example, the dimension values corresponding to the dimension "city" may be "Beijing" or "Shanghai".

Specifically, the natural language query content currently input by the user can be acquired based on the position of the cursor. For example, the entire content up to the cursor position in the input box is used as the natural language query content.

Step 204, segmenting the natural language query content to obtain a plurality of query words.

In one example, before the natural language query content is segmented, entity recognition may be performed on it to obtain the basic entity categories of the natural language query content.

For example, assuming that the natural language query content currently input by the user is "yesterday city payment", the basic entity categories time and dimension can be obtained through entity recognition. The word corresponding to time is "yesterday", and the word corresponding to dimension is "city".

After entity recognition, "yesterday city payment" may be segmented to obtain query words such as "yesterday", "city payment", and "payment".

Step 206, using the query words as current query words, querying dictionary trees corresponding to different entity categories to obtain candidate words corresponding to multiple entity categories.

The plurality of dictionary trees may be pre-constructed from data queries for the target data. These data queries are also referred to as historical queries, and the corresponding historical natural language query content may include entity words corresponding to the public categories and/or entity words corresponding to the private categories.

Take as an example a first dictionary tree corresponding to a first entity category (any one of the public categories or any one of the private categories) among the plurality of dictionary trees. The first dictionary tree may include a plurality of branches, each of which represents one entity word of the first entity category in the historical natural language query content. Each inter-node path in a branch corresponds to one of at least part of the characters of the represented entity word, and the value of the leaf node is the represented entity word. The value of a branch node is the string formed by combining the characters corresponding to the inter-node paths from the root node to that branch node.

The query process for the first dictionary tree specifically includes matching the current query word against each branch of the first dictionary tree in turn; if the characters covered by any branch (referred to as a first branch) contain the current query word, the value of the leaf node of the first branch is taken as a candidate word of the first entity category for the current query word.

Taking one branch of the first dictionary tree as an example, the matching may specifically include matching each character of the current query word, character by character, against the characters corresponding to the inter-node paths in the branch. If the characters corresponding to the inter-node paths in the branch contain every character of the current query word, the matching succeeds; otherwise, the matching fails.
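The branch structure and the character-by-character matching described above can be sketched as a small trie. This is only an illustrative sketch, not the implementation of this specification: the names `TrieNode` and `EntityTrie` are hypothetical, English entity words stand in for the original examples, and "contains" is interpreted here as matching from the start of a branch, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrieNode:
    # One outgoing edge per character (one inter-node path per character).
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    # Set on the node where a branch ends: the entity word the branch represents.
    entity_word: Optional[str] = None


class EntityTrie:
    """Dictionary tree for a single entity category (hypothetical helper)."""

    def __init__(self, category: str) -> None:
        self.category = category
        self.root = TrieNode()

    def insert(self, entity_word: str) -> None:
        node = self.root
        for ch in entity_word:
            node = node.children.setdefault(ch, TrieNode())
        node.entity_word = entity_word

    def query(self, query_word: str) -> List[str]:
        """Return the entity words whose branch starts with query_word."""
        node = self.root
        for ch in query_word:  # character-by-character matching
            node = node.children.get(ch)
            if node is None:
                return []  # matching failed on this character
        # Matching succeeded: collect the values stored at the ends of all
        # branches that pass through the matched node.
        found: List[str] = []
        stack = [node]
        while stack:
            current = stack.pop()
            if current.entity_word is not None:
                found.append(current.entity_word)
            stack.extend(current.children.values())
        return sorted(found)


trie = EntityTrie("dimension")
for word in ("payment amount", "payment count", "transaction count"):
    trie.insert(word)

print(trie.query("pay"))    # ['payment amount', 'payment count']
print(trie.query("trans"))  # ['transaction count']
print(trie.query("city"))   # []
```

Keeping one trie per entity category means every candidate returned by `query` can be tagged with `trie.category`, which is the information the later selection step relies on.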

In one example, the first dictionary tree may include a prefix tree and a suffix tree. The prefix tree may be constructed based on at least part of the characters, counted from the beginning, of each entity word of the first entity category in the historical natural language query content. The suffix tree may be constructed based on at least part of the characters, counted from the end, of each entity word of the first entity category in the historical natural language query content.

Fig. 3a shows a schematic diagram of a prefix tree according to one embodiment. In Fig. 3a, the prefix tree may include a plurality of branches. The entity word represented by the leftmost branch is "payment amount" (支付金额), and the corresponding entity category is dimension. The characters corresponding to the inter-node paths in this branch are "支", "付", "金" and "额", respectively (i.e., the inter-node paths in the branch correspond to all characters of the represented entity word), and the value of the leaf node is "支付金额" (payment amount). The values of the three branch nodes are "支", "支付" and "支付金", respectively. Similarly, the entity word represented by the second branch from the left is "number of transactions" (交易笔数), and the corresponding entity category is dimension. The characters corresponding to the inter-node paths in this branch are "交", "易", "笔" and "数", respectively, and the value of the leaf node is "交易笔数" (number of transactions). The values of the three branch nodes are "交", "交易" and "交易笔", respectively.

It can be seen that the entity words represented by the branches in Fig. 3a all have the same entity category, namely the entity category of the prefix tree.

Fig. 3b shows a schematic diagram of a suffix tree according to one embodiment. In Fig. 3b, the suffix tree may include a plurality of branches. The entity word represented by the leftmost branch is "payment amount" (支付金额), and the corresponding entity category is dimension. The characters corresponding to the inter-node paths in this branch are "付", "金" and "额", respectively (i.e., the inter-node paths in the branch correspond to part of the characters of the represented entity word), and the value of the leaf node is "支付金额" (payment amount). The values of the two branch nodes are "付" and "付金", respectively. Similarly, the entity word represented by the second branch from the left is "payment amount" (支付金额), and the corresponding entity category is dimension. The characters corresponding to the inter-node paths in this branch are "金" and "额", respectively, and the value of the leaf node is "支付金额" (payment amount). The value of the single branch node is "金".

It can be seen that the entity words represented by the branches in Fig. 3b all have the same entity category, namely the entity category of the suffix tree. The prefix tree and the suffix tree also have the same entity category. Similarly, the plurality of dictionary trees described above may also include prefix trees and suffix trees corresponding to other entity categories.

When the first dictionary tree includes a prefix tree and a suffix tree, the query process for the first dictionary tree may specifically include querying the prefix tree with a current query word as a prefix word to obtain a first entity word of a first entity class of the current query word, and querying the suffix tree with the current query word as a suffix word to obtain a second entity word of the first entity class of the current query word. The first entity word and the second entity word constitute respective candidate words of the first entity class of the current query word.

The above query process for the prefix tree and the suffix tree is similar, and the detailed query process may refer to the above description of the query process for the first dictionary tree, only by replacing the first dictionary tree with the prefix tree or the suffix tree.

Taking the prefix tree shown in Fig. 3a as an example, if the current query word is "payment", the candidate word obtained by the query may be "payment amount", and the corresponding entity category is dimension. Taking the suffix tree shown in Fig. 3b as an example, if the current query word is "amount", the candidate word obtained by the query may be "payment amount", and the corresponding entity category is dimension.
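The combined prefix-tree and suffix-tree lookup can be sketched as follows. This is a simplified illustration under stated assumptions: the suffix tree is modeled as a trie over every proper suffix of each entity word, with the full entity word stored at the end of each branch; the helper names (`build_trie`, `walk`, `candidates`) and the English entity words are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

END = "$words"  # sentinel key; edges are single characters, so it cannot clash


def build_trie(pairs: Iterable[Tuple[str, str]]) -> dict:
    """Build a nested-dict trie from (key, entity_word) pairs."""
    root: dict = {}
    for key, word in pairs:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node.setdefault(END, set()).add(word)
    return root


def walk(trie: dict, query: str) -> Set[str]:
    """Match query character by character, then collect all leaf values below."""
    node = trie
    for ch in query:
        if ch not in node:
            return set()
        node = node[ch]
    found: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for key, value in current.items():
            if key == END:
                found |= value
            else:
                stack.append(value)
    return found


def build_prefix_tree(words: List[str]) -> dict:
    return build_trie((w, w) for w in words)


def build_suffix_tree(words: List[str]) -> dict:
    # Each proper suffix becomes a branch whose leaf value is the full word.
    return build_trie((w[i:], w) for w in words for i in range(1, len(w)))


def candidates(query: str, prefix_tree: dict, suffix_tree: dict) -> List[str]:
    first = walk(prefix_tree, query)    # query word used as a prefix word
    second = walk(suffix_tree, query)   # query word used as a suffix word
    return sorted(first | second)


words = ["payment amount", "transaction count"]
prefix_tree = build_prefix_tree(words)
suffix_tree = build_suffix_tree(words)

print(candidates("pay", prefix_tree, suffix_tree))     # ['payment amount']
print(candidates("amount", prefix_tree, suffix_tree))  # ['payment amount']
```

Taking the union of both lookups mirrors the statement that the first entity word and the second entity word together constitute the candidate words of the entity category.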

In the foregoing example where the natural language query content is "yesterday city payment", the obtained candidate words may be, for example, "payment amount", "payment count", "payment number", and "payment date", etc.

Furthermore, in practical applications, some words may correspond to multiple entity categories at the same time. For example, the word "Beijing" may correspond to both the entity category dimension and the entity category dimension value. Such words are referred to as confusion words.

When the candidate words of the query words include a confusion word, the multiple entity categories corresponding to the confusion word can be displayed to the user, and the final entity category of the confusion word is then determined according to the user's selection instruction.

Step 208, selecting each target candidate word from the candidate words at least based on the entity class corresponding to each candidate word of each query word.

Specifically, the candidate words may first be deduplicated. Then, for any first candidate word among the deduplicated candidate words, an entity category sequence is formed based on the basic entity categories and the target entity category of the first candidate word. The entity category sequence is checked using a regular expression, and if the check passes, the first candidate word is taken as a target candidate word.

Taking the foregoing natural language query content "yesterday cities" as an example, as described above, the basic entity categories obtained through entity recognition are time and dimension. Assuming that the first candidate word is "payment amount" and its corresponding entity category is dimension, the entity category sequence formed may be {time, dimension}.

A regular expression is a pattern that describes the features of a set of strings and is used to match particular strings. The pattern is described using special characters and ordinary characters, thereby achieving text matching.

The special characters may include, but are not limited to "\", "", "x" and "{ }", and the ordinary characters may be English characters, each representing one entity category.
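As a rough illustration of checking an entity category sequence with a regular expression, the sketch below encodes each entity category as a single English letter and uses Python's `re` module. The letter codes, the pattern `t?d+f?m`, and the choice to append the candidate's category to the basic categories are all assumptions made for illustration; this specification does not give the actual expressions.

```python
import re
from typing import List

# Hypothetical single-letter codes, one per entity category.
CATEGORY_CODE = {
    "time": "t",
    "dimension": "d",
    "dimension value": "v",
    "metric": "m",
    "operator": "o",
    "function": "f",
}

# Hypothetical "conventional combination": an optional time, one or more
# dimensions, an optional function, then exactly one metric.
PATTERN = re.compile(r"t?d+f?m")


def passes_check(base_categories: List[str], candidate_category: str) -> bool:
    """Form the category sequence and verify it against the pattern."""
    sequence = "".join(
        CATEGORY_CODE[c] for c in [*base_categories, candidate_category]
    )
    # fullmatch requires the whole sequence to match, not just a prefix.
    return PATTERN.fullmatch(sequence) is not None


print(passes_check(["time", "dimension"], "metric"))    # True
print(passes_check(["time", "dimension"], "operator"))  # False
```

Candidates whose category would produce an unconventional sequence are rejected before ranking, which is how category-based screening filters out unrelated completions.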

In one example, checking the entity category sequence using the regular expression may include inputting the entity category sequence into a state machine corresponding to the regular expression and performing state transitions. A state transition includes comparing the current entity category in the entity category sequence with the labeled entity category of a transition edge of the current state; if the two are consistent, the state machine moves to the next state and the current entity category is updated; if they are not consistent, the state transitions end. After the state transitions end, the check passes if the state machine is in the matching state; otherwise, the check fails.

FIG. 4a illustrates a schematic diagram of a state machine according to one embodiment. In Fig. 4a, the state machine may be converted from the regular expression "a(bb)+a", where a and b represent two different entity categories. S₀ to S₄ in Fig. 4a are the five states of the state machine, and S₄ is the matching state. A one-way arrow leaving a state represents a transition edge of that state, and the character above or below the arrow represents the labeled entity category of that transition edge. For example, the labeled entity category of the transition edge of state S₀ is "a".

The state transition process is described below with reference to Fig. 4a.

Assume the entity category sequence (hereinafter, the sequence) is abbbba. The first a in the sequence is taken as the current entity category and state S₀ as the current state. The first a matches the labeled entity category of the transition edge of state S₀, so the state machine moves to the next state S₁; that is, state S₁ becomes the updated current state and the first b in the sequence becomes the updated current entity category. The first b is then matched against the labeled entity category "b" of the transition edge of state S₁, and so on until a transition end condition is met. The transition end conditions include, but are not limited to, a failed match or the completion of matching for every entity category in the sequence.

In this example, after every entity category in the sequence has been matched, state S₄ is reached, so the sequence passes the check.
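The state transition procedure for the expression "a(bb)+a" can be sketched with an explicit transition table. The table below is one possible deterministic automaton for that expression with five states and S4 as the matching state; it is an assumption and is not claimed to reproduce the exact layout of Fig. 4a. The names `TRANSITIONS` and `check_sequence` are hypothetical.

```python
from typing import Dict, Tuple

# (current state, labeled entity category on the edge) -> next state
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("S0", "a"): "S1",
    ("S1", "b"): "S2",
    ("S2", "b"): "S3",
    ("S3", "b"): "S2",  # loop back to accept another "bb" pair
    ("S3", "a"): "S4",
}
START_STATE = "S0"
MATCH_STATE = "S4"


def check_sequence(sequence: str) -> bool:
    """Run the entity category sequence through the state machine."""
    state = START_STATE
    for entity_category in sequence:
        next_state = TRANSITIONS.get((state, entity_category))
        if next_state is None:
            return False  # no edge carries this category: matching failed
        state = next_state  # move on and take the next category
    # The check passes only if the machine stops in the matching state.
    return state == MATCH_STATE


print(check_sequence("abbbba"))  # True: S0-S1-S2-S3-S2-S3-S4
print(check_sequence("abbba"))   # False: an odd number of b's
print(check_sequence("abb"))     # False: ends in S3, not the matching state
```

Ending the loop early on a missing edge corresponds to the "failed match" end condition, while falling out of the loop corresponds to every entity category in the sequence having been matched.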

It should be understood that Fig. 4a is only an exemplary illustration; in practical applications, a state may have multiple transition edges. For example, the state machine described in the embodiments of this specification may also be as shown in Fig. 4b.

In the foregoing example where the natural language query content is "yesterday city payment", the selected target candidate words may be, for example, "payment amount", "payment count" and "payment number".

It should be noted that the regular expressions described in this specification may be written based on the conventional combinations of entity categories. The target candidate words screened using the regular expression are therefore more strongly related to the user's natural language query content, which addresses the problem of completion content unrelated to the natural language query content and can also save computing resources.

Step 210, determining the completion content of the natural language query content according to the target candidate words.

In one example, the target candidate words may first be ranked according to a ranking algorithm. The ranked target candidate words are then determined as the completion content of the natural language query content.

The ranking algorithm may include any one of a longest-match algorithm, a state priority algorithm, a dictionary base algorithm, a phrase popularity algorithm, a custom priority algorithm, and a word usage frequency algorithm.
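The listed ranking algorithms are not specified in detail. As one illustration only, ranking by word usage frequency could look like the sketch below; the function name `rank_by_usage`, the usage counts, and the alphabetical tie-break are assumptions, not part of this specification.

```python
from typing import Dict, List


def rank_by_usage(targets: List[str], usage_count: Dict[str, int]) -> List[str]:
    """Order target candidate words by descending usage count.

    Words missing from usage_count are treated as never used; ties are
    broken alphabetically so the result is deterministic.
    """
    return sorted(targets, key=lambda word: (-usage_count.get(word, 0), word))


# Hypothetical usage counts gathered from historical queries.
usage = {"payment amount": 42, "payment count": 17}
ranked = rank_by_usage(
    ["payment number", "payment count", "payment amount"], usage
)
print(ranked)  # ['payment amount', 'payment count', 'payment number']
```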

In addition, it should be noted that the completion in this embodiment may change as the cursor moves. For example, when the cursor is detected at a middle position of the natural language query content, the content of the natural language query content up to that middle position is taken as the updated natural language query content, and the updated natural language query content is completed. The completion method of this scheme is therefore more flexible.

The updated natural language query content may likewise be completed through steps 202-210, which is not repeated here.

For example, assume that the natural language query content currently input by the user is "pay pens for each city". This content is completed first. Then, when the cursor moves to the position between "pay" and "pens", the content up to the cursor, "pay for each city", is completed.
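The cursor-dependent behavior amounts to truncating the input at the cursor before running steps 202-210 again. A minimal sketch, assuming the cursor is given as a character offset into the input box; the function name `content_to_complete` and the example string are hypothetical:

```python
def content_to_complete(text: str, cursor: int) -> str:
    """Return the part of the input that lies before the cursor."""
    cursor = max(0, min(cursor, len(text)))  # clamp to a valid offset
    return text[:cursor]


text = "yesterday city payment"

# Cursor at the end: the whole input is completed.
print(content_to_complete(text, len(text)))              # yesterday city payment

# Cursor moved back to just after "city": only that part is completed.
print(content_to_complete(text, len("yesterday city")))  # yesterday city
```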

In summary, the method for automatically completing query content provided in the embodiments of this specification obtains candidate words as the completion content based on entity categories. Because entity categories combine in conventional patterns, selecting candidate words based on entity categories addresses the problem of completion content that is unrelated to the natural language query content. In addition, because candidate words serve as the completion content, the natural language query content is completed at word granularity, which can improve the accuracy of the completion content and further improve the user experience.

Corresponding to the above method for automatically completing query contents, an embodiment of the present disclosure further provides an apparatus for automatically completing query contents, as shown in fig. 5, where the apparatus may include:

The acquiring unit 502 is configured to acquire natural language query content for target data currently input by a user.

The segmentation unit 504 is configured to segment the natural language query content to obtain a plurality of query words.

A query unit 506, configured to query a plurality of dictionary trees corresponding to different entity categories with a plurality of query words as current query words, to obtain candidate words corresponding to a plurality of entity categories for each query word, where the plurality of dictionary trees are pre-constructed according to a data query for target data.

Wherein the entity category comprises at least one of time, operator, unit, function, intention, dimension value, metric and the like.

Optionally, the plurality of dictionary trees includes a first dictionary tree corresponding to a first entity category, the first dictionary tree including a prefix tree and a suffix tree. The prefix tree is constructed based on at least part of the characters, counted from the beginning, of each entity word of the first entity category, and the suffix tree is constructed based on at least part of the characters, counted from the end, of each entity word of the first entity category;

The query unit 506 is specifically configured to:

inquiring a prefix tree by taking the current query word as a prefix word to obtain a first entity word of a first entity class of the current query word, and inquiring a suffix tree by taking the current query word as a suffix word to obtain a second entity word of the first entity class of the current query word;

the first entity word and the second entity word form candidate words of the first entity class of the current query word.

Optionally, the plurality of dictionary trees include a first dictionary tree corresponding to a first entity category, the first dictionary tree includes a plurality of branches, each inter-node path in each branch corresponds to at least part of characters in the represented entity word, and the value of a leaf node is the represented entity word;

The query unit 506 is specifically configured to:

match the current query word against each branch of the first dictionary tree in turn, and if the characters covered by any first branch contain the current query word, take the value of the leaf node of the first branch as a candidate word of the first entity category for the current query word.

A selecting unit 508, configured to select target candidate words from the candidate words based at least on the entity category corresponding to each candidate word of each query word.

A determining unit 510, configured to determine the completion content of the natural language query content according to the target candidate words.

Optionally, the apparatus further comprises:

the identifying unit 512 is configured to perform entity identification on the natural language query content, so as to obtain a corresponding basic entity category.

The selection unit 508 includes:

A forming module 5082 is configured to form, for any first candidate word of the candidate words, an entity class sequence based on the base entity class and the target entity class of the first candidate word.

And the verification module 5084 is configured to verify the entity class sequence by using the regular expression, and if the verification passes, use the first candidate word as a target candidate word.

The verification module 5084 specifically is configured to:

Inputting an entity class sequence into a state machine corresponding to the regular expression, and performing state migration, wherein the state migration comprises the steps of comparing a current entity class in the entity class sequence with a labeling entity class corresponding to a migration edge of the current state, and migrating to the next state and updating the current entity class if the current entity class is consistent with the labeling entity class;

After the state transitions are finished, if the state machine is in a matching state, the verification passes; otherwise, the verification fails.
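The verification described above can be sketched as a small deterministic state machine. The sketch assumes a hypothetical pattern over entity categories, "an optional time, any number of dimensions, then one measure", hand-compiled into a transition table. The state names, the pattern and the function `verify_sequence` are illustrative assumptions. Treating a missing transition edge as a failed verification is also an assumption of this sketch.

```python
# Hypothetical pattern over entity categories: time? dimension* measure
# Each state maps a labeled entity category to the next state.
TRANSITIONS = {
    "S0": {"time": "S1", "dimension": "S1", "measure": "S2"},
    "S1": {"dimension": "S1", "measure": "S2"},
    "S2": {},
}
START_STATE = "S0"
MATCHING_STATES = {"S2"}


def verify_sequence(sequence, transitions=TRANSITIONS,
                    start=START_STATE, matching=MATCHING_STATES):
    """Return True if the entity category sequence drives the state machine
    from the start state into a matching state."""
    state = start
    for category in sequence:
        edges = transitions.get(state, {})
        if category not in edges:
            # Assumption: no consistent edge means the verification fails.
            return False
        state = edges[category]  # move to the next state
    return state in matching


# Basic entity categories recognized from the query, plus the target entity
# category of one candidate word, form the sequence to verify.
base_categories = ["time", "dimension"]
print(verify_sequence(base_categories + ["measure"]))
print(verify_sequence(base_categories + ["time"]))
```

With this table, a candidate word of category "measure" would be kept as a target candidate word after a time and a dimension, while a second "time" would be rejected.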

Optionally, the apparatus further comprises:

A sorting unit 514, configured to sort the target candidate words according to a sorting algorithm.

The determining unit 510 is specifically configured to:

Determine the sorted target candidate words as the completion content of the natural language query content;

The sorting algorithm comprises any one of the following: a longest match algorithm, a state priority algorithm, a dictionary cardinality algorithm, a word combination heat algorithm, a custom priority algorithm and a word usage frequency algorithm.
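This passage names the sorting algorithms without defining them, so the following is only one possible reading of a word usage frequency ordering. It assumes a hypothetical table of usage counts, and the function name `rank_by_usage_frequency` is likewise an assumption. More frequently used target candidate words come first, and ties are broken alphabetically so that the order is deterministic.

```python
def rank_by_usage_frequency(target_candidates, usage_counts):
    """Sort target candidate words by descending usage count.

    usage_counts is a hypothetical mapping from word to how often users
    selected it; unseen words count as 0. Ties are broken alphabetically.
    """
    return sorted(target_candidates,
                  key=lambda word: (-usage_counts.get(word, 0), word))


# Hypothetical usage statistics for illustration only.
counts = {"sales amount": 12, "sales volume": 3}
candidates = ["order amount", "sales volume", "sales amount"]
print(rank_by_usage_frequency(candidates, counts))
```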

Optionally, the apparatus further comprises a complementing unit 516;

The obtaining unit 502 is further configured to, when it is detected that the cursor is located at a middle position of the natural language query content, take the content of the natural language query content up to that middle position as the updated natural language query content;

The complementing unit 516 is configured to complete the updated natural language query content.
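The cursor handling above amounts to truncating the query at the cursor before completion. A minimal sketch follows, assuming the cursor is reported as a character offset into the query string. The function name `content_up_to_cursor` and the sample query are hypothetical.

```python
def content_up_to_cursor(query, cursor):
    """Return the part of the query that precedes the cursor when the cursor
    sits strictly inside the query; otherwise return the query unchanged."""
    if 0 < cursor < len(query):
        return query[:cursor]
    return query


# The cursor sits after "sales amount", in the middle of the query.
updated = content_up_to_cursor("sales amount last week", 12)
print(updated)
```

The updated content would then go through the same segmentation, dictionary tree query and selection steps as a query typed from scratch.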

The functions of the functional modules of the apparatus in the foregoing embodiments of the present disclosure may be implemented by the steps of the foregoing method embodiments, so that the specific working process of the apparatus provided in one embodiment of the present disclosure is not repeated herein.

The device for automatically completing query content provided by the embodiments of this specification can improve the accuracy of the completion content.

According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.

According to an embodiment of yet another aspect, there is also provided a computing device including a memory having executable code stored therein and a processor that, when executing the executable code, implements the method described in connection with fig. 2.

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.

The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware, or may be embodied in software instructions executed by a processor. The software instructions may be comprised of corresponding software modules that may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. In addition, the ASIC may reside in a server. Alternatively, the processor and the storage medium may reside as discrete components in a server.

Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on, or transmitted as one or more instructions or code on, a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.

The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The foregoing detailed description of the embodiments has further described the objects, technical solutions and advantages of the present specification, and it should be understood that the foregoing description is only a detailed description of the embodiments of the present specification, and is not intended to limit the scope of the present specification, but any modifications, equivalents, improvements, etc. made on the basis of the technical solutions of the present specification should be included in the scope of the present specification.

Claims

1. A method for automatically completing query content, comprising:

Obtaining the natural language query content currently input by the user for the target data;

Segmenting the natural language query content to obtain a number of query words;

Taking the plurality of query words respectively as current query words, and querying, for each current query word, multiple dictionary trees respectively corresponding to multiple entity categories, to obtain candidate words of the current query word corresponding to the multiple entity categories; the multiple dictionary trees are pre-built according to historical natural language queries for the target data;

Selecting target candidate words from the candidate words based at least on the entity categories corresponding to the candidate words of the query words;

Determining, according to the target candidate words, the completion content of the natural language query content, the completion content being used to prompt the user with content that the user may want to input next.

2. The method according to claim 1, wherein the plurality of dictionary trees include a first dictionary tree corresponding to a first entity category; the first dictionary tree includes a prefix tree and a suffix tree; the prefix tree is constructed based on at least part of the characters of each entity word of the first entity category starting from the beginning; the suffix tree is constructed based on at least part of the characters of each entity word of the first entity category ending at the end;

The querying of the multiple dictionary trees respectively corresponding to the multiple entity categories comprises:

Taking the current query word as a prefix word, querying the prefix tree, and obtaining a first entity word of a first entity category of the current query word; and taking the current query word as a suffix word, querying the suffix tree, and obtaining a second entity word of the first entity category of the current query word;

The first entity word and the second entity word constitute candidate words of the first entity category of the current query word.

3. The method according to claim 1, wherein the plurality of dictionary trees include a first dictionary tree corresponding to a first entity category, the first dictionary tree includes a plurality of branches, each node path in each branch corresponds to at least part of the characters in the represented entity word, and the value of the leaf node is the represented entity word;

The current query word is matched with each branch in the first dictionary tree word by word in turn. If the word covered by any first branch contains the current query word, the value of the leaf node of the first branch is used as a candidate word of the first entity category of the current query word.

4. The method according to claim 1, wherein before segmenting the natural language query content, it further comprises:

Performing entity recognition on the natural language query content to obtain corresponding basic entity categories;

The step of selecting a target candidate word from the candidate words includes:

For any first candidate word among the candidate words, forming an entity category sequence based on the basic entity category and the target entity category of the first candidate word;

The entity category sequence is verified using a regular expression, and if the verification passes, the first candidate word is used as a target candidate word.

5. The method according to claim 4, wherein the verifying the entity category sequence comprises:

Inputting the entity category sequence into the state machine corresponding to the regular expression, and performing state migration; the state migration includes: comparing the current entity category in the entity category sequence with the labeled entity category corresponding to the migration edge of the current state and, if they are consistent, migrating to the next state and updating the current entity category; otherwise, ending;

After the state migration is completed, if the state of the state machine is a matching state, the verification passes, otherwise the verification fails.

6. The method according to claim 1, wherein, before determining each target candidate word as a completion content of the natural language query content, the method further comprises:

Sorting the target candidate words according to a sorting algorithm;

Determining the sorted target candidate words as the completion content of the natural language query content;

The sorting algorithm includes any one of the following: longest match algorithm, state priority algorithm, dictionary cardinality algorithm, word combination heat algorithm, custom priority algorithm and word usage frequency algorithm.

7. The method according to claim 1, further comprising:

When it is detected that the cursor is located at the middle position of the natural language query content, the content up to the middle position in the natural language query content is used as the updated natural language query content;

The updated natural language query content is completed.

8. The method of claim 1, wherein the entity category comprises at least one of: time, operator, unit, function, intent, dimension, dimension value, and measure.

9. A device for automatically completing query content, comprising:

An acquisition unit, used to acquire natural language query content currently input by the user for target data;

A segmentation unit, used to segment the natural language query content to obtain a number of query words;

A query unit, configured to use the plurality of query words as current query words respectively, and query a plurality of dictionary trees corresponding to a plurality of entity categories respectively for the current query words, so as to obtain candidate words corresponding to the plurality of entity categories for the current query words; the plurality of dictionary trees are pre-constructed according to historical natural language queries for the target data;

A selection unit, configured to select target candidate words from the candidate words based at least on the entity categories corresponding to the candidate words of the query words;

A determination unit, used to determine the completion content of the natural language query content according to the target candidate words, the completion content being used to prompt the user with content that the user may want to input next.

10. The apparatus according to claim 9, wherein the plurality of dictionary trees include a first dictionary tree corresponding to a first entity category; the first dictionary tree includes a prefix tree and a suffix tree; the prefix tree is constructed based on at least part of the characters of each entity word of the first entity category starting from the beginning; the suffix tree is constructed based on at least part of the characters of each entity word of the first entity category ending at the end;

The query unit is specifically used for:

11. The device according to claim 9, wherein the plurality of dictionary trees include a first dictionary tree corresponding to a first entity category, the first dictionary tree includes a plurality of branches, each node path in each branch corresponds to at least part of the characters in the represented entity word, and the value of the leaf node is the represented entity word;

The query unit is specifically used for:

12. The apparatus according to claim 9, further comprising:

An identification unit, configured to perform entity identification on the natural language query content to obtain a corresponding basic entity category;

The selection unit comprises:

A forming module, configured to form an entity category sequence for any first candidate word among the candidate words based on the basic entity category and the target entity category of the first candidate word;

The verification module is used to verify the entity category sequence using a regular expression, and if the verification passes, the first candidate word is used as a target candidate word.

13. The device according to claim 12, wherein the verification module is specifically used for:

14. The apparatus according to claim 9, further comprising:

A sorting unit, used to sort the target candidate words according to a sorting algorithm;

The determining unit is specifically used for:

15. The apparatus according to claim 9, further comprising: a completion unit;

The acquisition unit is further configured to, when detecting that the cursor is located at a middle position of the natural language query content, use the content of the natural language query content up to the middle position as the updated natural language query content;

The completion unit is used to complete the updated natural language query content.

16. The apparatus of claim 9, wherein the entity category comprises at least one of: time, operator, unit, function, intent, dimension, dimension value, and measure.

17. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed in a computer, the computer is caused to execute the method according to any one of claims 1 to 8.

18. A computing device comprising a memory and a processor, wherein the memory stores executable codes, and when the processor executes the executable codes, the method according to any one of claims 1 to 8 is implemented.