CN117874065A

CN117874065A - Data acquisition method and device based on service database

Info

Publication number: CN117874065A
Application number: CN202311782678.5A
Authority: CN
Inventors: 叶炜康; 杨自闯; 鲁学昆; 张荣涛; 郑小锋; 具学圆; 韦立雷
Original assignee: Beijing Washi Intelligent Technology Co ltd
Current assignee: Beijing Washi Intelligent Technology Co ltd
Priority date: 2023-12-22
Filing date: 2023-12-22
Publication date: 2024-04-12

Abstract

The present invention discloses a data acquisition method and device based on a business database. The method comprises: performing slot initialization and question initialization based on the business database to obtain a slot dictionary and a normalized question list; performing entity search on an acquired query sentence according to the slot dictionary to obtain a preprocessed query sentence; determining, according to the slot dictionary, the slot fields in the preprocessed query sentence and their corresponding slots; normalizing the slot fields in the preprocessed query sentence according to their slots to obtain a normalized query sentence; performing keyword recall and semantic recall on the normalized query sentence against the normalized question list, and obtaining an optimal intent from the keyword recall results and the semantic recall results; acquiring the configured SQL template corresponding to the optimal intent; and generating a query statement from the configured SQL template and the slot fields of the preprocessed query sentence, then executing the query statement to obtain query data.

Description

A data acquisition method and device based on a business database

Technical Field

The invention relates to a data acquisition method and device based on a business database.

Background

With the rapid development of information technology, enterprises and organizations accumulate ever more data, and their databases hold a large amount of important information. Business database question answering systems have emerged to extract useful insights and answers from this data. These systems let users ask questions in natural language instead of writing complex SQL query statements. As a result, they play an important role in business decision-making, data analysis, and report generation.

Traditional database question answering systems rely on exact keyword matching. This makes data retrieval and the processing of structured data convenient.

Summary of the Invention

To obtain more accurate query data from natural-language questions, embodiments of the present invention provide a data acquisition method and device based on a business database.

In a first aspect, an embodiment of the present invention provides a data acquisition method based on a business database, the method comprising:

performing slot initialization and question initialization based on the business database to obtain a slot dictionary and a normalized question list, wherein the slot dictionary includes multiple slots and, for each slot, multiple slot fields;

performing entity search on an acquired query sentence according to the slot dictionary to obtain a preprocessed query sentence;

determining, according to the slot dictionary, the slot fields in the preprocessed query sentence and their corresponding slots;

normalizing the slot fields in the preprocessed query sentence according to their slots to obtain a normalized query sentence;

performing keyword recall and semantic recall on the normalized query sentence against the normalized question list, and obtaining an optimal intent from the keyword recall results and the semantic recall results;

acquiring the configured SQL template corresponding to the optimal intent;

generating a query statement from the configured SQL template and the slot fields of the preprocessed query sentence, and executing the query statement in the business database to obtain query data.

In one or more optional implementations of the embodiments of the present application, performing slot initialization and question initialization based on the business database to obtain the slot dictionary and the normalized question list includes:

obtaining the SQL-type slots and enumeration-type slots of the business database;

obtaining, from the business database, all fields corresponding to each SQL-type slot to form an SQL-type slot dictionary;

obtaining, from the business database, all fields corresponding to each enumeration-type slot to form an enumeration-type slot dictionary;

combining the SQL-type slot dictionary and the enumeration-type slot dictionary to obtain the slot dictionary;

performing question initialization based on the business database to obtain the normalized question list.

In one or more optional implementations of the embodiments of the present application, obtaining the normalized question list based on the business database includes:

obtaining a standard question list based on the business database;

replacing the SQL-type slots in the standard questions with words unrelated to the business database;

replacing each enumeration-type slot in the standard questions with any one of the values of that slot, to obtain the normalized question list.

In one or more optional implementations of the embodiments of the present application, normalizing the slot fields in the preprocessed query sentence according to their slots to obtain the normalized query sentence includes:

replacing each slot field of the preprocessed query sentence that belongs to the SQL-type slot dictionary with its corresponding slot, to obtain a replaced query sentence;

replacing the slots in the replaced query sentence with words unrelated to the business database.
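The two normalization steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes the preprocessed query sentence has already been segmented into tokens (for example by jieba). The slot dictionary contents, the placeholder word, and the names `SQL_SLOT_DICT`, `PLACEHOLDER_WORDS`, and `normalize_query` are all hypothetical. The placeholder "苹果" (apple) is taken from the description's own example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical SQL-type slot dictionary: slot identifier -> slot fields (entity values).
SQL_SLOT_DICT: Dict[str, Set[str]] = {
    "station": {"01电站", "02电站", "03电站"},
}

# Hypothetical database-independent word for each SQL-type slot.
PLACEHOLDER_WORDS: Dict[str, str] = {"station": "苹果"}


def normalize_query(tokens: List[str]) -> Tuple[str, Dict[str, List[str]]]:
    """Normalize a segmented, preprocessed query sentence.

    A token that is a field of an SQL-type slot is mapped to its slot and then
    replaced by that slot's placeholder word. The extracted fields are returned so
    they can later fill the SQL template. Enumeration-type fields stay in place.
    """
    slot_fields: Dict[str, List[str]] = {}
    out: List[str] = []
    for tok in tokens:
        for slot, fields in SQL_SLOT_DICT.items():
            if tok in fields:
                slot_fields.setdefault(slot, []).append(tok)
                out.append(PLACEHOLDER_WORDS[slot])
                break
        else:  # the token is not a field of any SQL-type slot
            out.append(tok)
    return "".join(out), slot_fields
```

For example, `normalize_query(["02电站", "是否", "有", "告警"])` returns `("苹果是否有告警", {"station": ["02电站"]})`. The bare concept word "电站" (power station) is not a slot field, so it is left untouched.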

In one or more optional implementations of the embodiments of the present application, performing pinyin error correction and entity search on the acquired query sentence according to the slot dictionary to obtain the preprocessed query sentence includes:

performing pinyin error correction on the query sentence based on the slot dictionary to obtain a corrected query sentence;

splitting the corrected query sentence at stop words to obtain multiple clauses;

segmenting each clause into a word list and generating an N-gram list from it;

normalizing digits and the letter case of English text in the N-gram list;

obtaining, according to the slot dictionary and the N-gram list, multiple candidate fields and their similarity scores for each N-gram;

taking the candidate field with the highest similarity score as the optimal candidate field of the corresponding N-gram, thereby obtaining the optimal candidate field of every item in the N-gram list;

removing overlapping optimal candidate fields from the N-gram list to obtain the final result of the clause;

replacing each clause in the corrected query sentence with its final result to obtain the preprocessed query sentence.
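The N-gram matching, best-candidate selection, and overlap removal steps above can be sketched for a single clause that has already been split at stop words and segmented into words. Several parts are assumptions, not taken from the description:

- `difflib.SequenceMatcher.ratio()` stands in for the similarity scorer; the embodiment imports the slot fields into Elasticsearch for this purpose.
- The 0.8 acceptance threshold is hypothetical.
- The greedy overlap rule (higher score first, then longer span) is hypothetical.
- The names `FIELDS`, `ngrams`, `best_candidate`, and `entity_search` are illustrative.

```python
import difflib
from typing import List, Tuple

# Hypothetical slot-dictionary fields, already normalized for digits and case.
FIELDS: List[str] = ["01电站", "02电站", "预警", "中断", "告警"]


def ngrams(words: List[str], max_n: int = 3) -> List[Tuple[int, int, str]]:
    """All contiguous word N-grams up to max_n as (start, end, text); end is exclusive."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            grams.append((i, i + n, "".join(words[i:i + n]).lower()))
    return grams


def best_candidate(text: str) -> Tuple[str, float]:
    """Optimal candidate field for one N-gram, scored with difflib's similarity ratio."""
    scored = [(f, difflib.SequenceMatcher(None, text, f.lower()).ratio()) for f in FIELDS]
    return max(scored, key=lambda pair: pair[1])


def entity_search(words: List[str], min_score: float = 0.8) -> Tuple[str, List[str]]:
    """Replace recognized spans of one clause with their optimal candidate fields."""
    hits = []
    for start, end, text in ngrams(words):
        field, score = best_candidate(text)
        if score >= min_score:  # hypothetical acceptance threshold
            hits.append((score, end - start, start, end, field))
    # Overlap removal (greedy): higher score first, then longer span.
    hits.sort(key=lambda h: (h[0], h[1]), reverse=True)
    taken = [False] * len(words)
    chosen = []
    for _, _, start, end, field in hits:
        if not any(taken[start:end]):
            taken[start:end] = [True] * (end - start)
            chosen.append((start, end, field))
    chosen.sort()
    result, pos = [], 0
    for start, end, field in chosen:
        result.extend(words[pos:start])  # keep the words between recognized spans
        result.append(field)
        pos = end
    result.extend(words[pos:])
    return "".join(result), [field for _, _, field in chosen]
```

For the clause `["02", "电站", "是否", "有", "告警"]`, both the bigram "02电站" and the unigram "告警" score 1.0. The weaker overlapping N-grams "02电站是否" and "有告警" (0.8 each) are therefore discarded.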

In one or more optional implementations of the embodiments of the present application, performing keyword recall and semantic recall on the normalized query sentence against the normalized question list, and obtaining the optimal intent from the two sets of recall results, includes:

performing keyword search for the normalized query sentence in the normalized question list, and taking the retrieved questions, sorted by similarity score in descending order, as the keyword recall result;

converting the normalized question list into a normalized question semantic list (semantic vectors);

performing semantic retrieval for the normalized query sentence in the normalized question semantic list, and taking the retrieved questions, sorted by similarity score in descending order, as the semantic recall result; both the keyword recall result and the semantic recall result consist of multiple questions with their similarity scores;

for every question that appears in both the keyword recall result and the semantic recall result, calculating the average of its similarity scores and using that average as its similarity score;

selecting the question with the highest similarity score as the candidate intent;

calculating the edit distance between the candidate intent and the normalized query sentence, and determining whether the edit distance is less than a preset threshold:

if so, taking the candidate intent as the optimal intent;

if not, returning an error message indicating that no match could be found.
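The score fusion and edit-distance gate above can be sketched as follows, under these assumptions:

- Both recalls return `(question, score)` pairs on a comparable scale, for example scores normalized to [0, 1]. Raw Elasticsearch relevance scores are not bounded this way.
- The description only specifies averaging for questions found by both recalls. This sketch lets a question found by only one recall keep its single score.
- The threshold value 5 and the names `levenshtein` and `fuse_and_select` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuse_and_select(query: str,
                    keyword_hits: List[Tuple[str, float]],
                    semantic_hits: List[Tuple[str, float]],
                    max_distance: int = 5) -> Optional[str]:
    """Merge both recall results, pick the best question, then gate it by edit distance."""
    kw: Dict[str, float] = dict(keyword_hits)
    sem: Dict[str, float] = dict(semantic_hits)
    fused: Dict[str, float] = {}
    for q in kw.keys() | sem.keys():
        if q in kw and q in sem:
            fused[q] = (kw[q] + sem[q]) / 2  # average for questions found by both recalls
        else:
            # Assumption: a question found by only one recall keeps its single score.
            fused[q] = kw[q] if q in kw else sem[q]
    if not fused:
        return None
    candidate = max(fused, key=fused.get)
    if levenshtein(candidate, query) < max_distance:
        return candidate  # optimal intent
    return None  # caller reports that no match could be found
```

Returning `None` instead of raising leaves the caller free to try the follow-up-question check before finally reporting the error.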

In one or more optional implementations of the embodiments of the present application, before the error message indicating that no match could be found is returned, the method further includes:

obtaining the query sentence that preceded the current query sentence;

determining, according to the slot dictionary, the slot fields in the previous query sentence and their corresponding slots;

determining whether every slot in the preprocessed query sentence also appears among the slots of the previous query sentence:

if so, substituting the slot fields of the normalized query sentence into the corresponding positions of the previous query sentence, and using the resulting sentence as the updated preprocessed query sentence;

if not, returning information indicating that the query sentence is not a follow-up question.
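A minimal sketch of the follow-up check above, assuming each slot carries a single slot field. The function name `resolve_follow_up` is hypothetical. The sketch also treats a query with no slots as "not a follow-up", which is an assumption: the description does not cover that case.

An example exchange: the user first asks "02电站是否有告警" ("Does power station 02 have an alarm?") and then "03电站呢" ("What about power station 03?").

```python
from typing import Dict, Optional


def resolve_follow_up(current_slots: Dict[str, str],
                      previous_query: str,
                      previous_slots: Dict[str, str]) -> Optional[str]:
    """Rewrite the previous query with the current slot fields if this is a follow-up.

    current_slots / previous_slots map slot identifier -> slot field found via the
    slot dictionary (one field per slot is an assumption of this sketch).
    """
    # Assumption: a query without any slots is not treated as a follow-up.
    if not current_slots or not set(current_slots) <= set(previous_slots):
        return None  # not a follow-up question
    updated = previous_query
    for slot, new_field in current_slots.items():
        updated = updated.replace(previous_slots[slot], new_field)
    return updated
```

Here `resolve_follow_up({"station": "03电站"}, "02电站是否有告警", {"station": "02电站", "alerttype": "告警"})` yields "03电站是否有告警". That sentence then becomes the updated preprocessed query sentence.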

In one or more optional implementations of the embodiments of the present application, after keyword recall and semantic recall are performed on the normalized query sentence against the normalized question list and their results are merged to obtain the optimal intent, the method further includes:

determining whether the slots of the preprocessed query sentence include all slots required by the optimal intent:

if not, determining the missing slots from the slots of the preprocessed query sentence and the slots required by the optimal intent, and returning a request to supply the missing slots;

receiving the values of the missing slots to obtain the completed slot fields of the preprocessed query sentence.
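The missing-slot check above reduces to a set difference. This sketch assumes the slots required by each intent are known from its configuration, for example the `<...>` placeholders of its SQL template. `ask` is a hypothetical callback that requests a value from the user, and `missing_slots` and `complete_slots` are illustrative names.

```python
from typing import Callable, Dict, List, Set


def missing_slots(found: Dict[str, List[str]], required: Set[str]) -> List[str]:
    """Slots required by the optimal intent but absent from the query (sorted for stable prompts)."""
    return sorted(required - set(found))


def complete_slots(found: Dict[str, List[str]],
                   required: Set[str],
                   ask: Callable[[str], str]) -> Dict[str, List[str]]:
    """Request each missing slot through `ask` and merge the answers into the slot fields."""
    completed = {slot: list(values) for slot, values in found.items()}
    for slot in missing_slots(found, required):
        completed[slot] = [ask(slot)]  # e.g. prompt the user "Which power station?"
    return completed
```

For the question "Does <power station> have <alarm type>?", a query that only mentions an alarm type produces a single request, for the power station slot.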

In a second aspect, an embodiment of the present invention provides a data acquisition device based on a business database, the device comprising:

a configuration initialization module, configured to perform slot initialization and question initialization based on the business database to obtain a slot dictionary and a normalized question list, wherein the slot dictionary includes multiple slots and, for each slot, multiple slot fields;

a text preprocessing module, configured to perform entity search on an acquired query sentence according to the slot dictionary to obtain a preprocessed query sentence;

an intent recognition module, configured to determine, according to the slot dictionary, the slot fields in the preprocessed query sentence and their corresponding slots; normalize the slot fields in the preprocessed query sentence according to their slots to obtain a normalized query sentence; and perform keyword recall and semantic recall on the normalized query sentence against the normalized question list, obtaining the optimal intent from the keyword recall results and the semantic recall results;

a backend query module, configured to acquire the configured SQL template corresponding to the optimal intent, generate a query statement from the configured SQL template and the slot fields of the preprocessed query sentence, and execute the query statement in the business database to obtain query data.

In a third aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above data acquisition method based on a business database.

In a fourth aspect, an embodiment of the present invention provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the above data acquisition method based on a business database.

The beneficial effects of the above technical solutions provided by the embodiments of the present invention include at least the following.

Traditional database question answering systems hard-code the correspondence between intents and SQL statements in advance. The method provided by the embodiments of the present invention instead converts natural language into an intent plus slot data. It then obtains the SQL statement by filling the configured SQL template for that intent with the slot data. Only the intents and their configured SQL templates need to be defined in advance. When a new question-answering requirement arises, it suffices to add a new intent and its configured SQL template, with no repeated coding. Compared with NL2SQL techniques, which convert natural language directly into SQL, the method can maintain high accuracy, interpretability, and performance. It gives technical staff an effective way to handle changing requirements and reduces development cost and complexity. It also lets business database question answering systems adapt better to diverse business scenarios.

Other features and advantages of the present invention will be set forth in the following description, and will in part be apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the drawings.

The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.

Brief Description of the Drawings

The drawings provide a further understanding of the present invention and constitute a part of the specification. Together with the embodiments, they serve to explain the invention and do not limit it. In the drawings:

FIG. 1 is a schematic diagram of the steps of data acquisition based on a business database according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the steps of slot initialization according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the steps of question initialization according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the steps of pinyin error correction according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the steps of entity search according to an embodiment of the present invention;

FIG. 6 is a schematic flowchart of obtaining the optimal intent from a query sentence according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of the data acquisition device based on a business database according to an embodiment of the present application.

Detailed Description of the Embodiments

In the following description, specific details such as particular system structures and techniques are set forth for illustration rather than limitation, to provide a thorough understanding of the embodiments of the present application. However, it will be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present application.

It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof.

It should also be understood that the term "and/or" as used in this specification and the appended claims refers to, and includes, any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".

In addition, in the description of this specification and the appended claims, the terms "first", "second", "third", etc. are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.

References in this specification to "one embodiment", "some embodiments", and the like mean that a particular feature, structure, or characteristic described in connection with that embodiment is included in one or more embodiments of the present application. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", etc., appearing in different places in this specification do not necessarily all refer to the same embodiment. Unless specifically emphasized otherwise, they mean "one or more but not all embodiments". Likewise, unless specifically emphasized otherwise, the terms "including", "comprising", "having", and their variations all mean "including but not limited to".

It should be understood that the sequence numbers of the steps in the following embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic and does not limit the implementation of the embodiments of the present application.

To illustrate the technical solution of the present application, specific embodiments are described below.

The inventors found that traditional database question answering systems in the prior art, although they make data retrieval convenient, still have obvious limitations. During the design phase, before the system is used, the questions or intents that users may ask must be clearly defined and each intent mapped to a specific SQL query template. Code must be written for each intent and the inputs and outputs of each template defined. The SQL statement is then completed by slot filling to query the database according to the preset intent. When question-answering requirements keep changing, this type of system needs customized development and code modification for each new requirement. Development cost and complexity are therefore high, and flexibility is insufficient.

Based on this finding, and after further research and development, the inventors made the present invention, which provides a data acquisition method and device based on a business database.

Embodiment 1

An embodiment of the present invention provides a data acquisition method based on a business database. As shown in FIG. 1, the method includes:

S101: Perform slot initialization and question initialization based on the business database to obtain a slot dictionary and a normalized question list; the slot dictionary includes multiple slots and, for each slot, multiple slot fields.

In the embodiments of the present application, step S101 consists of two parts. Slot initialization produces the slot dictionary, and question initialization produces the normalized question list.

In the embodiments of the present application, the slots may include SQL-type slots and enumeration-type slots. The slot initialization in step S101 includes the following steps:
- obtaining fields from the business database and dividing them into SQL-type slots and enumeration-type slots;
- obtaining from the business database all fields corresponding to each SQL-type slot to form an SQL-type slot dictionary;
- obtaining from the business database all fields corresponding to each enumeration-type slot to form an enumeration-type slot dictionary;
- combining the two dictionaries to obtain the slot dictionary.

In a specific embodiment, slot initialization proceeds as shown in FIG. 2. Fields are obtained from the business database and used to build the SQL-type slot dictionary and the enumeration-type slot dictionary, which correspond respectively to the SQL-type slots and enumeration-type slots in FIG. 2. After the slot dictionary is obtained, all of its fields are preprocessed to normalize digits and English letter case. They are then imported into a word segmentation system (e.g., jieba) and into Elasticsearch to support the subsequent entity search step. All fields in the slot dictionary are also converted into pinyin to form a pinyin dictionary for the subsequent pinyin error correction step.
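The field normalization and pinyin dictionary described above can be sketched as follows. The sketch makes these assumptions:

- "Normalizing digits and letter case" is read as folding full-width characters to ASCII with Unicode NFKC and lowercasing. Chinese numerals are not handled.
- The pinyin converter is injected as a parameter. In practice this could be `lazy_pinyin` from the third-party pypinyin package, which returns a list of syllables.
- The names `normalize_text`, `build_pinyin_index`, and `correct_by_pinyin` are illustrative.

```python
import unicodedata
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

PinyinFn = Callable[[str], List[str]]  # text -> list of pinyin syllables


def normalize_text(text: str) -> str:
    """Fold full-width digits and letters to ASCII (NFKC) and lowercase English letters."""
    return unicodedata.normalize("NFKC", text).lower()


def build_pinyin_index(fields: Iterable[str], to_pinyin: PinyinFn) -> Dict[str, List[str]]:
    """Map the pinyin spelling of every normalized slot field to the field(s) producing it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for field in fields:
        norm = normalize_text(field)
        index[" ".join(to_pinyin(norm))].append(norm)
    return dict(index)


def correct_by_pinyin(word: str, index: Dict[str, List[str]], to_pinyin: PinyinFn) -> List[str]:
    """Slot fields that sound the same as `word` (homophone error correction)."""
    return index.get(" ".join(to_pinyin(normalize_text(word))), [])
```

For example, a user who types the homophone "高警" (gao jing) can be corrected to the slot field "告警" (gao jing, "alarm"). The example assumes both strings map to the same pinyin key.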

In the embodiments of the present application, question initialization includes the following steps:
- obtaining a standard question list based on the business database;
- removing stop words and punctuation from each standard question in the list;
- replacing the SQL-type slots in the standard questions with words unrelated to the business database;
- replacing each enumeration-type slot in the standard questions with any one of that slot's values, to obtain the normalized question list.

In a specific embodiment, question initialization proceeds as shown in FIG. 3. All question-related fields are obtained from the business database and manually curated into a standard question list, which is the intent question library in FIG. 3. Stop words and punctuation are removed, and the questions are normalized. SQL-type slots in the standard questions are replaced with words unrelated to the business database, and each enumeration-type slot is replaced with any one of its values, yielding the normalized question list. The normalized question list is then stored in two storage engines. One copy goes to Elasticsearch for the subsequent keyword recall. The other is converted into semantic vectors by a semantic model and stored in a vector retrieval engine (e.g., Milvus) for the subsequent semantic recall.

In a specific embodiment, the slot sources are chosen to improve configurability. SQL-type slots obtain their fields from the database by executing SQL statements, while enumeration-type slots are defined by a JSON dictionary in the configuration. A specific example of an SQL-type slot obtained by executing an SQL statement is:

Identifier: station

Name: Power Station

Value range:

Query database: A

Query statement: select distinct station_id, station_name from station

Executing the query statement of the "Power Station" slot against business database A returns all fields corresponding to that slot. These fields are exported to a dictionary file named station_dict.txt in the jieba custom dictionary format. Each line has the form "word frequency part-of-speech", for example "01电站 1000 station": the word "01电站" (Power Station 01), frequency 1000, and part-of-speech tag station.
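The export described above can be sketched as follows, under these assumptions:

- Python's built-in `sqlite3` stands in for business database A, whose engine the description does not name.
- The `station_name` column supplies the dictionary word.
- Spaces inside names are removed because jieba separates the three columns with spaces.
- The names `fetch_slot_rows` and `to_jieba_lines` are illustrative.

```python
import sqlite3
from typing import Iterable, List, Tuple

# The slot's configured query statement, copied from the example above.
STATION_SLOT_QUERY = "select distinct station_id, station_name from station"


def fetch_slot_rows(conn: sqlite3.Connection, query: str) -> List[Tuple[object, str]]:
    """Run the slot's query statement (sqlite3 stands in for business database A)."""
    return conn.execute(query).fetchall()


def to_jieba_lines(rows: Iterable[Tuple[object, str]], tag: str, freq: int = 1000) -> List[str]:
    """Format each distinct name as a jieba user-dictionary line: 'word freq tag'."""
    lines: List[str] = []
    seen = set()
    for _station_id, name in rows:
        # Assumption: spaces are dropped because jieba splits the columns on spaces.
        word = name.replace(" ", "")
        if word and word not in seen:
            seen.add(word)
            lines.append(f"{word} {freq} {tag}")
    return lines
```

The resulting lines can be written, UTF-8 encoded and one per line, to station_dict.txt. The file can then be loaded with `jieba.load_userdict("station_dict.txt")` so that names such as "01电站" are kept as single tokens during segmentation.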

A specific example of an enumeration-type slot, defined by a JSON dictionary in the configuration, is:

Identifier: alerttype

Name: Alarm Type

Value range:

Dictionary: {"预警": 1, "中断": 2, "告警": 3} (early warning: 1, interruption: 2, alarm: 3)

Each key in the enumeration-type slot dictionary is a slot field of that slot. The numeric value after it is the parameter value that the Chinese name maps to in the database table.

A specific example of a standard question (intent) configuration that uses these slots is:

Standard question: Does <power station> have <alarm type>?

Answer:

Query database: A

Query statement:
SELECT ad.rule_id
FROM alert_data ad
INNER JOIN (
    SELECT SUBSTRING_INDEX(group_concat(id ORDER BY collected_time DESC), ',', 1) mdid
    FROM alert_data
    WHERE alert_status != 1 AND deleted_flag = 1
    GROUP BY station_id, object_code, rule_name, alert_type
) md ON ad.id = md.mdid
    AND ad.deleted_flag = 1
    AND ad.continued_time IS NULL
    AND ad.alert_type IN (<alarm type>)
    AND ad.station_id IN (<power station>)

In a standard question, "<>" marks a slot, i.e., a variable parameter of the question. <power station> is the power station slot in the slot dictionary and can take any of the fields under that slot; the same applies to <alarm type>. The example question asks whether a specific power station has alerts of a given type, for example "Does power station 02 have an alarm?" or "Does power station 03 have an early warning?"
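Filling a configured SQL template like the one above can be sketched with bound parameters rather than string splicing. Bound parameters avoid quoting problems and SQL injection. Assumptions:

- Slot fields have already been mapped to database values, for example through the enumeration dictionary for alarm types, or from a station name to its `station_id`.
- The marker defaults to sqlite3's `?`. MySQL drivers such as PyMySQL use `%s`.
- `fill_template` and `ALERT_TYPE_CODES` are illustrative names.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Enumeration-type slot "alarm type": slot field -> parameter value in the database table.
ALERT_TYPE_CODES: Dict[str, int] = {"预警": 1, "中断": 2, "告警": 3}

SLOT_PATTERN = re.compile(r"<([^<>]+)>")


def fill_template(template: str,
                  slot_values: Dict[str, Sequence[object]],
                  marker: str = "?") -> Tuple[str, List[object]]:
    """Replace each <slot> with bound-parameter markers and collect the values in order."""
    params: List[object] = []

    def _sub(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in slot_values:
            return match.group(0)  # not a known slot (e.g. a '<' comparison): leave as-is
        values = list(slot_values[name])
        if not values:
            raise ValueError(f"slot {name!r} has no value")
        params.extend(values)  # re.sub scans left to right, so order matches the markers
        return ", ".join(marker for _ in values)

    return SLOT_PATTERN.sub(_sub, template), params
```

For the example intent, `fill_template(sql, {"alarm type": [ALERT_TYPE_CODES["告警"]], "power station": [...]})` returns the statement with `IN (?)` markers plus the parameter list. Both can be passed directly to the driver's `execute`.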

In an embodiment of the present application, to improve intent recognition accuracy, the SQL-type slots in the standard questions may be replaced with words unrelated to the business database. For example, "Does <power station> have an alarm" and "Does the power station have an alarm" are different questions: in the first, "<power station>" is a variable parameter, i.e. one specific power station under that slot, while in the second, "power station" is a fixed parameter denoting the concept of a power station, i.e. all power stations. To distinguish the two, variable parameters are converted into words from another domain that neither overlap nor conflict with the current business database, and fixed parameters are left unchanged. For example, "Does <power station> have an alarm" is normalized to "Does <apple> have an alarm", while "Does the power station have an alarm" stays as it is; "apple" is a word unrelated to the business database.

In an embodiment of the present application, also to improve intent recognition accuracy, an enumeration slot in a standard question may be replaced with any one of its values. For example, "Does <power station> have <alarm type>" is normalized to "Does <power station> have an alarm".
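The two normalization rules for standard questions can be sketched in Python as a single regular-expression pass. This is a minimal sketch under assumptions: slots are written as `<name>`, the placeholder table `SQL_SLOT_PLACEHOLDERS` and value table `ENUM_SLOT_VALUES` are hypothetical configuration, and any enumeration value may be chosen (here the last one by default).

```python
import re

# Matches a bracketed slot such as "<power station>".
SLOT_PATTERN = re.compile(r"<([^<>]+)>")

# Hypothetical configuration: SQL-type slots map to a placeholder word from an
# unrelated domain; enumeration slots map to the list of their values.
SQL_SLOT_PLACEHOLDERS = {"power station": "apple"}
ENUM_SLOT_VALUES = {"alarm type": ["early warning", "interruption", "alarm"]}


def normalize_standard_question(question: str, enum_pick: int = -1) -> str:
    """Normalize a standard question before it is indexed for intent recall.

    <SQL-type slot>  -> <placeholder word>   (still marked as a variable)
    <enum slot>      -> one of its values    (index enum_pick; any value works)
    bare concept     -> unchanged            (e.g. "power station" without <>)
    """
    def replace(match: re.Match) -> str:
        slot = match.group(1)
        if slot in SQL_SLOT_PLACEHOLDERS:
            return f"<{SQL_SLOT_PLACEHOLDERS[slot]}>"
        if slot in ENUM_SLOT_VALUES:
            return ENUM_SLOT_VALUES[slot][enum_pick]
        return match.group(0)  # unknown slot: leave it untouched

    return SLOT_PATTERN.sub(replace, question)


print(normalize_standard_question("Does <power station> have any <alarm type>?"))
# Does <apple> have any alarm?
print(normalize_standard_question("Does the power station have any alarm?"))
# Does the power station have any alarm?
```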

In an embodiment of the present application, step S101 lets slots and questions be defined flexibly through configuration parameters. This configurability allows the method to be adapted to different scenarios and business requirements, including different database structures, question sets, and business rules, without modifying code.

S102: Perform pinyin correction and entity search on the acquired query sentence according to the slot dictionary to obtain a preprocessed query sentence.

In an embodiment of the present application, step S102 consists of two steps: pinyin correction, which yields a corrected query sentence, followed by entity search on the corrected sentence, which yields the preprocessed query sentence.

The pinyin correction in step S102 is shown in FIG. 4 and may specifically include: first, segmenting the query sentence into words; second, concatenating adjacent segments into n-gram strings, where n is a preset maximum string length; third, converting each string into pinyin and looking it up in a pinyin dictionary. If the lookup returns a single result, that result is used directly as the correction; if it returns several, the edit distance between each result and the string is computed and the result with the smallest edit distance is taken as the correction. The correction then replaces the original string. The first to third steps are repeated, reducing the string length by one each time, until the whole query sentence has been corrected.
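A minimal Python sketch of the longest-first pinyin correction loop follows. It assumes the query is already segmented, treats n as a number of adjacent segments, and uses a tiny hypothetical character-to-pinyin table `CHAR_PINYIN` in place of a real pinyin dictionary. A matched string is merged into one token so that shorter n-grams do not re-correct inside it, a simplification of re-segmenting on every pass.

```python
# Toy character-to-pinyin table (an assumption: a real system would use a full
# pinyin dictionary). Characters not listed map to themselves, lower-cased.
CHAR_PINYIN = {
    "智": "zhi", "知": "zhi", "慧": "hui", "会": "hui", "惠": "hui",
    "变": "bian", "电": "dian", "站": "zhan", "有": "you",
    "告": "gao", "警": "jing", "井": "jing", "吗": "ma",
}


def to_pinyin(text: str) -> str:
    return "".join(CHAR_PINYIN.get(ch, ch.lower()) for ch in text)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_pinyin_index(vocabulary):
    """Pinyin dictionary: pinyin string -> words with that pronunciation."""
    index = {}
    for word in vocabulary:
        index.setdefault(to_pinyin(word), []).append(word)
    return index


def pinyin_correct(tokens, index, max_n):
    """Correct n-gram strings from length max_n (in segments) down to 1."""
    tokens = list(tokens)
    for n in range(max_n, 0, -1):
        i = 0
        while i + n <= len(tokens):
            gram = "".join(tokens[i:i + n])
            candidates = index.get(to_pinyin(gram))
            if candidates:
                # One hit is used directly; several are ranked by edit distance.
                best = min(candidates, key=lambda word: edit_distance(word, gram))
                tokens[i:i + n] = [best]  # merge so shorter n-grams skip it
            i += 1
    return "".join(tokens)


index = build_pinyin_index(["知惠", "智慧", "变电站", "告警"])
print(pinyin_correct(["智会", "变电", "站", "有", "告井", "吗"], index, max_n=3))
# 智慧变电站有告警吗
```

Here "智会" has two homophone candidates, and edit distance selects "智慧" (distance 1) over "知惠" (distance 2).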

The entity search in step S102 is shown in FIG. 5 and may specifically include: splitting the corrected query sentence at stop words into clauses, where stop words are words of little value in text analysis, such as "有" (have), "和" (and), and "是" (is); segmenting each clause into a word list and concatenating adjacent words into strings from 1-gram to n-gram, i.e. an N-gram list; preprocessing the N-gram list by unifying numerals and English letter case; with the slot dictionary and the N-gram list stored in Elasticsearch, searching in Elasticsearch to obtain several candidate fields and their similarity scores for each N-gram; taking the highest-scoring candidate field as the optimal candidate field of that N-gram, so that every item in the N-gram list has an optimal candidate field; mapping each optimal candidate field back to the corresponding original text in the query sentence for later processing; removing overlapping optimal candidate fields from the N-gram list to obtain the final result of the clause; and replacing each clause in the corrected query sentence with its final result to restore the sentence, which yields the preprocessed query sentence (the sentence with standardized entities in FIG. 5).
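The N-gram search, best-candidate selection, overlap removal, and restoration steps can be sketched in Python as follows. Elasticsearch scoring is swapped out here for a crude, order-insensitive shared-character count, so the scores are not comparable to the BM25-style scores in the example below. The numeral table, the `min_coverage` threshold, and the function names are assumptions, and splitting into clauses at stop words is taken as already done.

```python
from collections import Counter

# Toy numeral table (an assumption); real preprocessing would cover all numerals.
CN_NUMERALS = {"十": "10"}


def preprocess(text: str) -> str:
    """Unify numerals and English letter case."""
    for numeral, digits in CN_NUMERALS.items():
        text = text.replace(numeral, digits)
    return text.lower()


def overlap(a: str, b: str) -> int:
    """Order-insensitive shared-character count: a crude stand-in for an ES score."""
    return sum((Counter(a) & Counter(b)).values())


def find_entities(words, slot_fields, max_n, min_coverage=0.8):
    fields = [(field, preprocess(field)) for field in slot_fields]
    hits = []
    for n in range(1, max_n + 1):                 # 1-gram .. n-gram
        for i in range(len(words) - n + 1):
            gram = "".join(words[i:i + n])
            query = preprocess(gram)
            # Optimal candidate: most shared characters, then best coverage.
            field, norm = max(fields, key=lambda fp: (overlap(query, fp[1]),
                                                      overlap(query, fp[1]) / len(fp[1])))
            score = overlap(query, norm)
            if score >= min_coverage * len(norm):
                hits.append({"ori": gram, "candidate": field,
                             "score": score, "span": (i, i + n)})
    # Remove overlapping hits: higher score first, then the longer span.
    kept, used = [], set()
    for hit in sorted(hits, key=lambda h: (-h["score"], h["span"][0] - h["span"][1])):
        cells = set(range(*hit["span"]))
        if not cells & used:
            kept.append(hit)
            used |= cells
    return sorted(kept, key=lambda h: h["span"])


def restore(words, kept):
    """Rebuild the clause, replacing each kept span by its candidate field."""
    starts = {hit["span"][0]: hit for hit in kept}
    out, i = [], 0
    while i < len(words):
        if i in starts:
            out.append(starts[i]["candidate"])
            i = starts[i]["span"][1]
        else:
            out.append(words[i])
            i += 1
    return "".join(out)


words = ["十", "Kv", "智慧", "变电", "b", "站"]
hits = find_entities(words, ["智慧10kV变电B站", "智慧B变电站"], max_n=6)
print(hits[0]["ori"], "->", restore(words, hits))
# 十Kv智慧变电b站 -> 智慧10kV变电B站
```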

In a specific embodiment, an example of the entity search process is given below; the strings are shown in the original Chinese, with English glosses where helpful:

Query sentence after error correction: 十Kv智慧变电b站有告警吗 ("Is there any alarm at ten-Kv smart substation b?")

Clauses: 十Kv智慧变电b站 ("ten-Kv smart substation b"), 告警 ("alarm")

Segmentation of the first clause: ['十', 'Kv', '智慧', '变电', 'b', '站']

N-gram list: ['智慧变电b站', 'Kv智慧变电', '十Kv智慧变电', '智慧变电', '十Kv智慧', 'Kv智慧变电b站', '智慧变电b', '十Kv智慧变电b', 'Kv智慧', '十Kv智慧变电b站', 'Kv智慧变电b']

After preprocessing: ['智慧变电b', 'kv智慧变电b', 'kv智慧变电', 'kv智慧', '10kv智慧变电b站', '智慧变电', '智慧变电b站', 'kv智慧变电b站', '10kv智慧变电b', '10kv智慧', '10kv智慧变电']

Candidate fields: {'10kv智慧变电b@@十Kv智慧变电b': {'智慧10kv变电b站': 21.14124}, '智慧变电b@@智慧变电b': {'智慧10kv变电b站': 17.298553, '智慧b变电站': 16.139145}, ...}

Here, the text before "@@" ('10kv智慧变电b') is the preprocessed string sent to Elasticsearch, the text after "@@" ('十Kv智慧变电b') is the same string before preprocessing, '智慧10kv变电b站' is a candidate field, and 21.14124 is its similarity score.

Optimal candidate fields: {'智慧10kv变电b站': {'score': 23.621014, 'gram': '10kv智慧变电b站@@十Kv智慧变电b站'}, '智慧b变电站': {'score': 19.018782, 'gram': '智慧变电b站@@智慧变电b站'}, ...}

Here, '智慧10kv变电b站' is an optimal candidate field, 'score': 23.621014 is its similarity score, and 'gram' gives the corresponding N-gram: the text before "@@" ('10kv智慧变电b站') is the preprocessed string sent to Elasticsearch, and the text after "@@" ('十Kv智慧变电b站') is the original string before preprocessing.

Field restoration: [{'ori': '十Kv智慧变电b站', 'candidate': '智慧10kV变电B站', 'score': 23.621014}, {'ori': '智慧变电b站', 'candidate': '智慧B变电站', 'score': 19.018782}, ...]

Here, 'ori' is the original string before preprocessing ('十Kv智慧变电b站'), 'candidate' is the optimal candidate field in its dictionary form ('智慧10kV变电B站'), and 'score' (23.621014) is its similarity score.

Overlap removal: {'ori': '十Kv智慧变电b站', 'candidate': '智慧10kV变电B站', 'score': 23.621014}

Sentence restoration: 智慧10kV变电B站有告警吗 ("Is there any alarm at Smart 10kV Substation B?")

Through the steps above, entity search converts the corrected query sentence "十Kv智慧变电b站有告警吗" into the standardized "智慧10kV变电B站有告警吗".

In an embodiment of the present application, the entity search step unifies numerals and English letter case in the corrected query sentence and removes stop words and punctuation, so that the input text is clean and standardized. It also tolerates missing characters, extra characters, and reversed word order in the question, and yields the preprocessed query sentence.

S103: Determine the slot fields and their corresponding slots in the preprocessed query sentence according to the slot dictionary.

In an embodiment of the present application, in step S103 the slot dictionary is loaded into the jieba word segmentation system, which then segments the preprocessed query sentence; this yields the slot fields in the sentence and the slot to which each slot field belongs.
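With jieba, slot fields would typically be registered as custom words (for example through `jieba.load_userdict` or `jieba.add_word`) before the query is segmented. To keep the example self-contained, the Python sketch below substitutes forward maximum matching over the slot dictionary for jieba's segmenter; `extract_slots` and the dictionary layout are hypothetical.

```python
def extract_slots(query: str, slot_dict: dict) -> list:
    """Forward maximum matching of slot fields: a stand-in for jieba with a user dictionary.

    slot_dict maps a slot identifier to its slot fields; the result lists
    (slot_field, slot) pairs in the order they occur in the query.
    """
    field_to_slot = {field: slot for slot, fields in slot_dict.items() for field in fields}
    max_len = max(len(field) for field in field_to_slot)
    found, i = [], 0
    while i < len(query):
        # Try the longest possible slot field at position i first.
        for length in range(min(max_len, len(query) - i), 0, -1):
            piece = query[i:i + length]
            if piece in field_to_slot:
                found.append((piece, field_to_slot[piece]))
                i += length
                break
        else:
            i += 1  # no slot field starts here
    return found


SLOT_DICT = {
    "station": ["智慧10kV变电A站", "智慧10kV变电B站"],  # power station slot
    "alerttype": ["预警", "中断", "告警"],              # alarm type slot
}
print(extract_slots("智慧10kV变电A站有告警吗", SLOT_DICT))
# [('智慧10kV变电A站', 'station'), ('告警', 'alerttype')]
```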

S104: Normalize the slot fields in the preprocessed query sentence according to its slots to obtain a normalized query sentence.

In an embodiment of the present application, the normalization in step S104 includes: replacing each slot field of the preprocessed query sentence that belongs to the SQL-type slot dictionary with its slot, and then changing that slot in the resulting sentence into a word unrelated to the business database. As in the question initialization of step S101, variable parameters are converted into words from another domain, giving a normalized query sentence that is better suited to the subsequent intent recognition.
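A minimal Python sketch of this normalization follows. It assumes the (slot field, slot) pairs from step S103 are available and collapses the two replacements (slot field to slot, slot to unrelated word) into one lookup; the placeholder table is hypothetical.

```python
# Hypothetical mapping from SQL-type slot identifiers to placeholder words.
SQL_SLOT_PLACEHOLDERS = {"station": "<apple>"}


def normalize_query(query: str, extracted: list) -> str:
    """Replace SQL-type slot fields by a placeholder; enumeration fields stay as they are.

    extracted holds (slot_field, slot) pairs, e.g. from step S103.
    """
    for field, slot in extracted:
        placeholder = SQL_SLOT_PLACEHOLDERS.get(slot)
        if placeholder is not None:
            query = query.replace(field, placeholder)
    return query


print(normalize_query("Is there any alarm at Smart 10kV Substation A?",
                      [("Smart 10kV Substation A", "station"), ("alarm", "alerttype")]))
# Is there any alarm at <apple>?
```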

In a specific embodiment, jieba processes the preprocessed query sentence "Is there any alarm at Smart 10kV Substation A?" and obtains the slot field "Smart 10kV Substation A" with the slot "<power station>"; after normalization the sentence becomes "Is there any alarm at <apple>?".

S105: Perform keyword recall and semantic recall on the normalized query sentence according to the standard question list, and obtain the optimal intent from the results of the keyword recall and the semantic recall.

In an embodiment of the present application, step S105 performs intent recognition on the normalized query sentence to obtain the optimal intent. Intent recognition may comprise keyword recall and semantic recall. For keyword recall, the normalized query sentence is searched by keyword against the standard question list indexed in Elasticsearch; with a similarity score threshold set, the retrieved questions whose similarity scores exceed the threshold, sorted in descending order of score, together with their scores, form the keyword recall result.

For semantic recall, the standard question list is converted into a standard question semantic list: each standard question is turned into a continuous semantic vector by query embedding (QueryEmbedding), and the vectors are stored in a vector retrieval engine such as Milvus. The normalized query sentence is then searched semantically against this list; again with a similarity score threshold, the retrieved questions whose scores exceed the threshold, sorted in descending order, together with their scores, form the semantic recall result.

Next, the top n questions of the keyword recall result form a first candidate set and the top m questions of the semantic recall result form a second candidate set. Each set is normalized separately, so that the similarity scores of the n questions in the first set sum to 1 and those of the m questions in the second set sum to 1. For each question, identified by its unique identifier (such as the question ID), the scores it receives in the two sets are averaged, and the average is used as that question's similarity score. The question with the highest similarity score is selected as the candidate intent. Finally, the edit distance between the candidate intent and the normalized query sentence is computed and compared with a preset threshold: if it is smaller, the candidate intent is taken as the optimal intent; otherwise an error message indicating that no match was found is returned, i.e. the query sentence is considered to differ in intent from every question in the standard question list.
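The score fusion and edit-distance check can be sketched in Python as below. Assumptions: recall results arrive as (question ID, question, score) tuples, a question present in only one candidate set scores 0 in the other, and the threshold `max_distance=5` is illustrative. Run on only the three items per list that are visible in the worked example of this step, the sketch gives a fused score near 0.359 rather than the 0.2701 reported there, presumably because the elided items ("...") also enter the normalization.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def normalize(hits):
    """(id, question, score) tuples -> {id: (question, score / total)}."""
    total = sum(score for _, _, score in hits)
    if total == 0:
        return {}
    return {qid: (question, score / total) for qid, question, score in hits}


def fuse(keyword_hits, semantic_hits, n, m):
    first = normalize(keyword_hits[:n])    # first candidate set, sums to 1
    second = normalize(semantic_hits[:m])  # second candidate set, sums to 1
    fused = {}
    for qid in first.keys() | second.keys():
        question = (first.get(qid) or second.get(qid))[0]
        # Assumption: a question missing from one set contributes 0 for that set.
        kw = first.get(qid, (question, 0.0))[1]
        sem = second.get(qid, (question, 0.0))[1]
        fused[qid] = (question, (kw + sem) / 2)
    return sorted(fused.items(), key=lambda item: item[1][1], reverse=True)


def optimal_intent(query, keyword_hits, semantic_hits, n=3, m=3, max_distance=5):
    ranked = fuse(keyword_hits, semantic_hits, n, m)
    if not ranked:
        return None
    qid, (question, score) = ranked[0]          # candidate intent
    if edit_distance(question, query) < max_distance:
        return qid, question, score             # optimal intent
    return None  # no match: report an error (or try context processing first)


KEYWORD = [(16417929980417, "Is there any alarm at <apple>?", 0.2148),
           (16418097314507, "What alarms are there at <apple>?", 0.2036),
           (16418046495959374, "Which devices at <apple> have alarms?", 0.1935)]
SEMANTIC = [(16417929980417, "Is there any alarm at <apple>?", 0.2493),
            (16417929980459, "Does <apple> have an alarm?", 0.2344),
            (16417929985741, "Has any alarm occurred at <apple>?", 0.196)]
print(optimal_intent("Is there any alarm at <apple>?", KEYWORD, SEMANTIC))
# top id 16417929980417 with a fused score of about 0.359
```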

In an embodiment of the present application, the framework of steps S102 to S105 is shown in FIG. 6. First, in step S102, pinyin correction and entity search are applied to the query sentence to obtain the preprocessed query sentence; this corresponds to the sentence preprocessing (pinyin correction and entity search) applied to the user query (Query) in FIG. 6. Then, in steps S103 and S104, the slot fields and their slots in the preprocessed query sentence are determined and normalized to obtain the normalized query sentence; this corresponds to element extraction and normalization conversion in FIG. 6. Finally, step S105 obtains the optimal intent from the normalized query sentence through keyword recall and semantic recall, corresponding to the keyword recall and semantic recall in FIG. 6. Keyword recall is ES (Elasticsearch) recall over the ES text knowledge base and the intent question library, using BM25Search as the recall algorithm; the intent question library in the figure is the standard question list described above. Semantic recall converts the normalized query sentence into a continuous semantic vector by query embedding and performs Milvus recall, using Dense Vector Search as the recall algorithm.

The n highest-scoring questions from keyword recall and the m highest-scoring questions from semantic recall are then merged into a TopK intent candidate set, their scores are summed and averaged, and the set is sorted; the question with the highest similarity score is selected as the candidate intent, which corresponds to the optimal intent acquisition in FIG. 6. Lastly, the edit distance is checked: if the edit distance between the candidate intent and the normalized query sentence is smaller than the preset threshold, the candidate intent is the optimal intent; otherwise the query sentence is considered to differ in intent from every question in the standard question list, and an error message indicating that no match was found is returned.

In a specific embodiment, keyword recall and semantic recall are performed on the normalized query sentence to obtain the optimal intent as follows:

Normalized query: Is there any alarm at <apple>?

Keyword recall:

Ids: [16417929980417, 16418097314507, 16418046495959374, ...]

Questions: ['Is there any alarm at <apple>?', 'What alarms are there at <apple>?', 'Which devices at <apple> have alarms?', ...]

Scores: [0.2148, 0.2036, 0.1935, ...]

Semantic recall:

Ids: [16417929980417, 16417929980459, 16417929985741, ...]

Questions: ['Is there any alarm at <apple>?', 'Does <apple> have an alarm?', 'Has any alarm occurred at <apple>?', ...]

Scores: [0.2493, 0.2344, 0.196, ...]

Merged and sorted: {'id': '16417929980417', 'question': 'Is there any alarm at <apple>?', 'score': 0.2701}

Optimal intent: Is there any alarm at <apple>?

Here, Ids are the question IDs, and Questions and Scores are the recalled questions and their similarity scores.

In an embodiment of the present application, if the edit distance between the candidate intent and the normalized query sentence is not smaller than the preset threshold, context processing may be performed before the no-match error message is returned, to support multi-turn question answering. Context processing specifically includes: obtaining the query sentence preceding the current one; determining the slot fields and their slots in that previous query sentence according to the slot dictionary; and checking whether every slot of the current preprocessed query sentence also appears among the slots of the previous query sentence. If so, the corresponding slot fields of the previous query sentence are replaced with the slot fields of the current query sentence, and the resulting sentence is used as the updated preprocessed query sentence; if not, information indicating that the query sentence is not a follow-up question is returned. Take the following questions at time T-1 and time T as an example:

Question at time T-1: Is there any alarm at Smart 10kV Substation B?

Question at time T: What about Smart 10kV Substation A?

Updated preprocessed query sentence: Is there any alarm at Smart 10kV Substation A?

With this context processing, the question at the current time T is turned into an updated preprocessed query sentence, enabling multi-turn question answering.
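A minimal Python sketch of this follow-up handling is shown below. It assumes slot extraction has already produced (slot field, slot) pairs for both turns; treating a turn with no slots as not a follow-up is an added assumption, and the function name is hypothetical.

```python
def merge_follow_up(prev_query, prev_slots, cur_slots):
    """Rewrite a follow-up question by substituting its slot fields into the previous one.

    prev_slots / cur_slots are (slot_field, slot) pairs for each query.
    Returns the updated query, or None if the question is not a follow-up.
    """
    prev_field_by_slot = {slot: field for field, slot in prev_slots}
    # Assumption: a question with no slots at all is not treated as a follow-up.
    if not cur_slots or any(slot not in prev_field_by_slot for _, slot in cur_slots):
        return None
    updated = prev_query
    for field, slot in cur_slots:
        updated = updated.replace(prev_field_by_slot[slot], field)
    return updated


print(merge_follow_up(
    "Is there any alarm at Smart 10kV Substation B?",
    [("Smart 10kV Substation B", "station"), ("alarm", "alerttype")],
    [("Smart 10kV Substation A", "station")],
))
# Is there any alarm at Smart 10kV Substation A?
```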

In an embodiment of the present application, after the optimal intent is obtained, parameter verification and counter-questioning can be used to check whether the query sentence contains all the parameters the optimal intent requires. Specifically: determine whether the slots of the preprocessed query sentence include all slots required by the optimal intent; if not, determine the missing slots from the slots of the preprocessed query sentence and those required by the optimal intent, and return a request to supply them; then receive the values of the missing slots to obtain the completed slot fields of the preprocessed query sentence. For example, the normalized query sentence "Is there any alarm at <apple>?" matches the intent "Does <power station> have <alarm type>?", which requires two parameters, <power station> and <alarm type>. The <power station> parameter is obtained by entity recognition with the jieba word segmentation system; the <alarm type> parameter is obtained by keyword matching, i.e. by matching the slot fields of the enumeration slot <alarm type> against the query sentence.

If the slots of the preprocessed query sentence do not include all the slots required by the optimal intent, a counter-question is triggered and a request to supply the missing slot information is returned. For example, for the query sentence "Is there an alarm?", the method recognizes the matching intent "Does <power station> have <alarm type>?", detects that the <power station> parameter is missing, and triggers the counter-question module, which asks "Which power station are you asking about?". It then waits for the user's next interaction, receives the value of the missing slot, obtains the completed slot fields of the preprocessed query sentence, and thus completes the question-answering process.
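Parameter verification and the counter-question can be sketched in Python as follows. The prompt table `PROMPTS` and the helper names are hypothetical, and slots are keyed by the identifiers used elsewhere in this document (`station`, `alerttype`).

```python
# Hypothetical counter-question templates keyed by slot identifier.
PROMPTS = {
    "station": "Which power station are you asking about?",
    "alerttype": "Which alarm type are you asking about?",
}


def missing_slots(found: dict, required: list) -> list:
    """Slots the optimal intent needs but the query did not supply."""
    return [slot for slot in required if not found.get(slot)]


def counter_question(found: dict, required: list):
    """Return a counter-question for the missing slots, or None if nothing is missing."""
    missing = missing_slots(found, required)
    if not missing:
        return None
    return " ".join(PROMPTS.get(slot, f"Please provide a value for {slot}.") for slot in missing)


def supply(found: dict, slot: str, values: list) -> dict:
    """Merge the user's answer into the slot fields without mutating the input."""
    merged = {key: list(vals) for key, vals in found.items()}
    merged.setdefault(slot, []).extend(values)
    return merged


found = {"alerttype": ["alarm"]}          # from "Is there an alarm?"
required = ["station", "alerttype"]      # slots of "Does <power station> have <alarm type>?"
print(counter_question(found, required))  # Which power station are you asking about?
found = supply(found, "station", ["Smart 10kV Substation A"])
print(counter_question(found, required))  # None
```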

S106: Obtain the corresponding configured SQL template based on the optimal intent.

S107: Generate a query statement from the configured SQL template and the slot fields of the preprocessed query sentence, and execute it in the business database to obtain query data.

In an embodiment of the present application, in steps S106 and S107 the configured SQL template corresponding to the optimal intent is obtained, the slot fields in the preprocessed query sentence are converted into a general parameter format, and the parameters are filled into the template to obtain the query statement. The query statement is then executed to retrieve the corresponding data, which is returned.

In a specific embodiment, the general parameter format has two parts, "name" and "code": "name" records the slot fields in the query sentence, and "code" records the parameter value corresponding to each slot field. Example:

"args": {"name": {"station": ["Smart 10kV Substation A"], "alerttype": ["alarm"]},
"code": {"station": ["bdz001"], "alerttype": [3]}}

The corresponding generated query statement:

SELECT ad.rule_id FROM alert_data ad INNER JOIN (SELECT SUBSTRING_INDEX(group_concat(id ORDER BY collected_time DESC), ',', 1) mdid FROM alert_data WHERE alert_status != 1 AND deleted_flag = 1 GROUP BY station_id, object_code, rule_name, alert_type) md ON ad.id = md.mdid AND ad.deleted_flag = 1 AND ad.continued_time IS NULL AND ad.alert_type IN (3) AND ad.station_id IN ("bdz001")
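The template-filling step can be sketched in Python as follows. Unlike the generated statement above, which has the values written into the SQL text, this sketch binds them through driver placeholders, a design choice to avoid SQL injection. It assumes templates name slots by identifier (`<station>`, `<alerttype>`) and uses the `?` placeholder style of Python's `sqlite3`; MySQL drivers such as PyMySQL use `%s` instead. The simplified table exists only for the demonstration.

```python
import re
import sqlite3

# A slot placeholder in a configured template, e.g. "<station>".
SLOT = re.compile(r"<(\w+)>")


def fill_template(template: str, codes: dict) -> tuple:
    """Turn each <slot> into driver placeholders and collect the bind values in order."""
    params = []

    def expand(match: re.Match) -> str:
        values = codes.get(match.group(1))
        if not values:
            raise ValueError(f"no value for slot {match.group(1)}")
        params.extend(values)
        return ", ".join(["?"] * len(values))

    return SLOT.sub(expand, template), params


template = ("SELECT rule_id FROM alert_data "
            "WHERE alert_type IN (<alerttype>) AND station_id IN (<station>)")
sql, params = fill_template(template, {"station": ["bdz001"], "alerttype": [3]})
print(sql)     # SELECT rule_id FROM alert_data WHERE alert_type IN (?) AND station_id IN (?)
print(params)  # [3, 'bdz001']

# Demonstration against a throwaway in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alert_data (rule_id INTEGER, alert_type INTEGER, station_id TEXT)")
conn.executemany("INSERT INTO alert_data VALUES (?, ?, ?)",
                 [(101, 3, "bdz001"), (102, 1, "bdz001"), (103, 3, "bdz002")])
print(conn.execute(sql, params).fetchall())  # [(101,)]
```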

Compared with traditional database question-answering systems, in which the correspondence between intents and SQL statements is hard-coded in advance, the data acquisition method based on a business database provided by the embodiments of the present invention converts natural language into intent and slot data, accurately obtains the optimal intent for a query sentence, and then derives the SQL statement from the configured SQL template of that intent. Only the intents and their configured SQL templates need to be defined in advance; when a new question-answering requirement arises, a new intent and its configured SQL template are added, with no further coding. The method maintains high accuracy, interpretability, and performance, gives technical personnel an effective way to handle changing requirements, reduces development cost and complexity, and lets business database question-answering systems adapt better to diverse business scenarios.

Embodiment 2

Based on the same inventive concept, an embodiment of the present invention further provides a data acquisition device based on a business database. As shown in FIG. 7, the device includes:

a configuration initialization module 101, configured to perform slot initialization and question initialization based on the business database to obtain a slot dictionary and a standard question list, where the slot dictionary includes a plurality of slots and a plurality of slot fields corresponding to each slot;

a text preprocessing module 102, configured to perform entity search on the acquired query sentence according to the slot dictionary to obtain a preprocessed query sentence;

an intent recognition module 103, configured to determine the slot fields and their slots in the preprocessed query sentence according to the slot dictionary; normalize the slot fields in the preprocessed query sentence according to its slots to obtain a normalized query sentence; and perform keyword recall and semantic recall on the normalized query sentence according to the standard question list, obtaining the optimal intent from the keyword recall and semantic recall results;

a background query module 104, configured to generate a query statement based on the optimal intent and the slot fields of the preprocessed query sentence, and to execute the query statement in the business database to obtain query data.

Embodiment 3

Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data acquisition method based on a business database described in Embodiment 1.

Embodiment 4

Based on the same inventive concept, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the data acquisition method based on a business database described in Embodiment 1.

Those skilled in the art will appreciate that embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by that processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to include them as well.

Claims

1. A data acquisition method based on a business database, comprising:

performing slot initialization and question initialization based on the business database to obtain a slot dictionary and a canonical question list, wherein the slot dictionary comprises a plurality of slots and a plurality of slot fields corresponding to each slot;

performing entity search on an acquired query question according to the slot dictionary to obtain a preprocessed query question;

determining the slot fields and corresponding slots in the preprocessed query question according to the slot dictionary;

normalizing the slot fields in the preprocessed query question according to the slots of the preprocessed query question to obtain a normalized query question;

performing keyword recall and semantic recall on the normalized query question according to the canonical question list, and obtaining an optimal intent according to the keyword recall result and the semantic recall result;

acquiring a corresponding configured SQL template based on the optimal intent;

generating a query statement according to the configured SQL template and the slot fields of the preprocessed query question, and executing the query statement in the business database to obtain query data.
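The last two steps of claim 1 (template lookup and query execution) can be sketched as follows, assuming SQLite and templates that use named parameters. The intent identifier `branch_revenue`, the `sales` table and its columns, and the function `query_data` are all hypothetical. Binding slot fields as parameters, rather than splicing them into the SQL text, keeps user-supplied strings out of the statement itself.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical configuration: optimal intent -> configured SQL template.
# Named placeholders (":branch", ":year") correspond to slot names.
SQL_TEMPLATES: Dict[str, str] = {
    "branch_revenue": "SELECT revenue FROM sales WHERE branch = :branch AND year = :year",
}


def query_data(conn: sqlite3.Connection, optimal_intent: str,
               slot_fields: Dict[str, str]) -> List[Tuple]:
    template = SQL_TEMPLATES[optimal_intent]  # acquire the configured SQL template
    # sqlite3 binds each named placeholder from the mapping, then executes the statement.
    return conn.execute(template, slot_fields).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (branch TEXT, year TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("Beijing Branch", "2023", 120), ("Shanghai Branch", "2023", 95)])
print(query_data(conn, "branch_revenue", {"branch": "Beijing Branch", "year": "2023"}))
# → [(120,)]
```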

2. The method of claim 1, wherein performing slot initialization and question initialization based on the business database to obtain the slot dictionary and the canonical question list comprises:

acquiring the SQL-class slots and enumeration-class slots in the business database;

acquiring, from the business database, all fields corresponding to each SQL-class slot to form an SQL-class slot dictionary;

acquiring, from the business database, all fields corresponding to each enumeration-class slot to form an enumeration-class slot dictionary;

combining the SQL-class slot dictionary and the enumeration-class slot dictionary to obtain the slot dictionary;

and performing question initialization based on the business database to obtain the canonical question list.
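Claim 2 states that the fields of both slot types are read from the business database but not how. The sketch below assumes each slot is defined by a hypothetical SQL query whose first column lists that slot's fields; the `branch` and `metric` tables and the function names are illustrative only. The two per-type dictionaries are returned alongside the combined one because the later question-initialization and normalization steps treat SQL-class and enumeration-class slots differently.

```python
import sqlite3
from typing import Dict, List, Tuple

SlotDict = Dict[str, List[str]]


def load_slot_fields(conn: sqlite3.Connection, slot_queries: Dict[str, str]) -> SlotDict:
    """Run one query per slot; the first column of each row is a slot field."""
    return {slot: [row[0] for row in conn.execute(sql)] for slot, sql in slot_queries.items()}


def init_slots(conn: sqlite3.Connection, sql_slot_queries: Dict[str, str],
               enum_slot_queries: Dict[str, str]) -> Tuple[SlotDict, SlotDict, SlotDict]:
    sql_dict = load_slot_fields(conn, sql_slot_queries)    # SQL-class slot dictionary
    enum_dict = load_slot_fields(conn, enum_slot_queries)  # enumeration-class slot dictionary
    slot_dict = {**sql_dict, **enum_dict}                   # combined slot dictionary
    return sql_dict, enum_dict, slot_dict


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (name TEXT)")
conn.execute("CREATE TABLE metric (name TEXT)")
conn.executemany("INSERT INTO branch VALUES (?)",
                 [("Shanghai Branch",), ("Beijing Branch",), ("Beijing Branch",)])
conn.executemany("INSERT INTO metric VALUES (?)", [("revenue",), ("profit",)])
sql_dict, enum_dict, slot_dict = init_slots(
    conn,
    {"branch": "SELECT DISTINCT name FROM branch ORDER BY name"},
    {"metric": "SELECT name FROM metric ORDER BY name"},
)
print(slot_dict)  # → {'branch': ['Beijing Branch', 'Shanghai Branch'], 'metric': ['profit', 'revenue']}
```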

3. The method of claim 2, wherein performing question initialization based on the business database to obtain the canonical question list comprises:

acquiring a list of standard questions based on the business database;

replacing the SQL-class slots in the standard questions with fields independent of the business database;

and replacing each enumeration-class slot in the standard questions with any one of the values corresponding to that enumeration-class slot, to obtain the canonical question list.
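A minimal sketch of claim 3, assuming standard questions mark slots with a hypothetical `[slot]` notation and that the database-independent field is a fixed placeholder `XX` (also hypothetical). For enumeration-class slots, the claim allows any one of the values; this sketch simply takes the first. Each canonical question is kept paired with its standard question so that a matched canonical question can be traced back to its intent.

```python
import re
from typing import Dict, List, Set, Tuple

PLACEHOLDER = re.compile(r"\[(\w+)\]")  # hypothetical "[slot]" notation in standard questions
GENERIC_TOKEN = "XX"                    # hypothetical database-independent field


def build_canonical_list(standard_questions: List[str], sql_slots: Set[str],
                         enum_dict: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (canonical_question, standard_question) pairs."""
    def fill(match: re.Match) -> str:
        slot = match.group(1)
        if slot in sql_slots:
            return GENERIC_TOKEN             # SQL-class slot -> database-independent field
        if enum_dict.get(slot):
            return enum_dict[slot][0]        # enumeration-class slot -> one of its values
        return match.group(0)                # unknown slot: left unchanged
    return [(PLACEHOLDER.sub(fill, q), q) for q in standard_questions]


standard = ["What was the [metric] of [branch] in 2023?", "How many employees does [branch] have?"]
for canonical, _ in build_canonical_list(standard, {"branch"}, {"metric": ["revenue", "profit"]}):
    print(canonical)
# → What was the revenue of XX in 2023?
# → How many employees does XX have?
```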

4. The method of claim 2, wherein normalizing the slot fields in the preprocessed query question according to the slots of the preprocessed query question to obtain the normalized query question comprises:

replacing the slot fields in the preprocessed query question that belong to the SQL-class slot dictionary with their corresponding slots to obtain a replaced query question;

and replacing the slots in the replaced query question with fields independent of the business database.
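The two normalization steps of claim 4 can be sketched as below, assuming the same hypothetical `[slot]` notation and `XX` placeholder as in question initialization, and assuming entity search has already produced (slot field, slot) pairs. Only SQL-class slot fields are abstracted; enumeration-class fields stay concrete, so the normalized query still lines up with canonical questions in which enumeration slots were filled with real values.

```python
from typing import Dict, List, Tuple

GENERIC_TOKEN = "XX"  # hypothetical database-independent field (same one used for the canonical list)


def normalize_query(query: str, matches: List[Tuple[str, str]],
                    sql_dict: Dict[str, List[str]]) -> str:
    """`matches` holds the (slot_field, slot) pairs found during entity search."""
    replaced = query
    # Step 1: replace slot fields belonging to the SQL-class slot dictionary with their slot.
    for field, slot in matches:
        if slot in sql_dict:
            replaced = replaced.replace(field, f"[{slot}]", 1)
    # Step 2: replace those slots with the database-independent field.
    for slot in sql_dict:
        replaced = replaced.replace(f"[{slot}]", GENERIC_TOKEN)
    return replaced


matches = [("revenue", "metric"), ("Beijing Branch", "branch"), ("2023", "year")]
print(normalize_query("What was the revenue of Beijing Branch in 2023?", matches,
                      {"branch": ["Beijing Branch", "Shanghai Branch"]}))
# → What was the revenue of XX in 2023?
```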

5. The method of claim 1, wherein performing keyword recall and semantic recall on the normalized query question according to the canonical question list, and obtaining the optimal intent according to the keyword recall result and the semantic recall result, comprises:

performing keyword retrieval for the normalized query question over the canonical question list to obtain a plurality of questions ranked by similarity score in descending order as the keyword recall result;

extracting the canonical question list as a canonical question semantic list;

performing semantic retrieval for the normalized query question over the canonical question semantic list to obtain a plurality of questions ranked by similarity score in descending order as the semantic recall result, wherein the keyword recall result and the semantic recall result each comprise a plurality of questions and their corresponding similarity scores;

calculating, for each question that appears in both the keyword recall result and the semantic recall result, the average of its similarity scores, and taking the average as the similarity score of that question;

selecting the question with the highest similarity score as a candidate intent;

calculating the edit distance between the candidate intent and the normalized query question, and determining whether the edit distance is smaller than a preset threshold:

if yes, taking the candidate intent as the optimal intent;

if not, returning error information indicating that no match can be found.
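The fusion and verification steps of claim 5 can be sketched as follows. The claim does not say how keyword or semantic similarity is scored, nor how a question found by only one recall path is scored; this sketch assumes both recall lists use comparable scores and averages each question over whichever lists it appears in. Edit distance is taken to be character-level Levenshtein distance, and the threshold value is a hypothetical parameter. Returning `None` stands in for the "no match" error.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Recall = List[Tuple[str, float]]  # (canonical question, similarity score), best first


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (dynamic programming, one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free when equal)
        prev = cur
    return prev[-1]


def optimal_intent(normalized_query: str, keyword_recall: Recall,
                   semantic_recall: Recall, threshold: int) -> Optional[str]:
    scores: Dict[str, List[float]] = defaultdict(list)
    for question, score in keyword_recall + semantic_recall:
        scores[question].append(score)
    # Average each question's scores across the recall lists it appears in.
    fused = {q: sum(s) / len(s) for q, s in scores.items()}
    if not fused:
        return None
    candidate = max(fused, key=fused.get)  # highest fused score -> candidate intent
    if edit_distance(candidate, normalized_query) < threshold:
        return candidate
    return None  # caller reports that no match can be found


q = "What was the revenue of XX in 2023?"
kw = [(q, 0.9), ("What was the profit of XX in 2023?", 0.6)]
sem = [(q, 0.8), ("How many employees does XX have?", 0.7)]
print(optimal_intent(q, kw, sem, threshold=5))  # → What was the revenue of XX in 2023?
```

The edit-distance check acts as a final guard: even the best-scoring candidate is rejected if its wording is too far from the normalized query.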

6. The method of claim 5, further comprising, before returning the error information indicating that no match can be found:

acquiring the query question preceding the current query question;

determining the slot fields and corresponding slots in the previous query question according to the slot dictionary;

determining whether all slots in the preprocessed query question are among the slots of the previous query question:

if yes, substituting the slot fields of the normalized query question into the corresponding positions of the previous query question, and taking the resulting previous query question as the updated preprocessed query question;

if not, returning information indicating that the query question is not a follow-up question.
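A minimal sketch of the follow-up handling in claim 6, assuming slot fields for both questions were found by entity search as (slot field, slot) pairs. The function name is hypothetical. One added assumption: a question with no slots at all is treated here as not a follow-up, since the subset test would otherwise pass trivially and return the previous question unchanged.

```python
from typing import List, Optional, Tuple

Match = Tuple[str, str]  # (slot_field, slot)


def rewrite_follow_up(prev_query: str, prev_matches: List[Match],
                      cur_matches: List[Match]) -> Optional[str]:
    prev_field_by_slot = {slot: field for field, slot in prev_matches}
    cur_slots = {slot for _, slot in cur_matches}
    # Follow-up only if every slot of the current question also occurs in the previous one
    # (an empty slot set is treated as "not a follow-up" in this sketch).
    if not cur_slots or not cur_slots.issubset(prev_field_by_slot):
        return None  # caller reports "not a follow-up question"
    updated = prev_query
    for field, slot in cur_matches:
        # Put the current slot field where the previous question had that slot's field.
        updated = updated.replace(prev_field_by_slot[slot], field, 1)
    return updated


prev = "What was the revenue of Beijing Branch in 2023?"
prev_matches = [("Beijing Branch", "branch"), ("2023", "year")]
print(rewrite_follow_up(prev, prev_matches, [("Shanghai Branch", "branch")]))
# → What was the revenue of Shanghai Branch in 2023?
```

The rewritten question then re-enters the pipeline as the updated preprocessed query question.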

7. The method of claim 1, further comprising, after performing keyword recall and semantic recall on the normalized query question according to the canonical question list and merging the keyword recall result and the semantic recall result to obtain the optimal intent:

determining whether the slots of the preprocessed query question include all slots required by the optimal intent:

if not, determining the missing slots from the slots of the preprocessed query question and all slots required by the optimal intent, and returning a request to supply the missing slots;

and receiving the values of the missing slots to obtain completed slot fields of the preprocessed query question.
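The slot-completeness check of claim 7 reduces to a set difference between the slots an intent requires and the slots already found. The sketch assumes a hypothetical `REQUIRED_SLOTS` configuration per intent, and the `ask` callback stands in for returning the supplement request to the user and receiving the reply; neither is specified by the claims.

```python
from typing import Callable, Dict, List

# Hypothetical configuration: slots each intent needs to fill its SQL template.
REQUIRED_SLOTS: Dict[str, List[str]] = {"branch_revenue": ["branch", "year"]}


def missing_slots(slot_fields: Dict[str, str], intent: str) -> List[str]:
    """Slots required by the intent that the query question does not yet provide."""
    return [slot for slot in REQUIRED_SLOTS[intent] if slot not in slot_fields]


def complete_slots(slot_fields: Dict[str, str], intent: str,
                   ask: Callable[[str], str]) -> Dict[str, str]:
    completed = dict(slot_fields)  # leave the caller's mapping untouched
    for slot in missing_slots(completed, intent):
        completed[slot] = ask(slot)  # request the missing slot and receive its value
    return completed


print(missing_slots({"branch": "Beijing Branch"}, "branch_revenue"))  # → ['year']
print(complete_slots({"branch": "Beijing Branch"}, "branch_revenue", lambda slot: "2023"))
# → {'branch': 'Beijing Branch', 'year': '2023'}
```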

8. A data acquisition device based on a business database, comprising:

a configuration initialization module, configured to perform slot initialization and question initialization based on the business database to obtain a slot dictionary and a canonical question list, wherein the slot dictionary comprises a plurality of slots and a plurality of slot fields corresponding to each slot;

a text preprocessing module, configured to perform entity search on an acquired query question according to the slot dictionary to obtain a preprocessed query question;

an intent recognition module, configured to determine the slot fields and corresponding slots in the preprocessed query question according to the slot dictionary; normalize the slot fields in the preprocessed query question according to the slots of the preprocessed query question to obtain a normalized query question; and perform keyword recall and semantic recall on the normalized query question according to the canonical question list, and obtain an optimal intent according to the keyword recall result and the semantic recall result;

and a background query module, configured to acquire a corresponding configured SQL template based on the optimal intent, generate a query statement according to the configured SQL template and the slot fields of the preprocessed query question, and execute the query statement in the business database to obtain query data.

9. A computer-readable storage medium having instructions stored therein which, when run on a terminal, cause the terminal to perform the business-database-based data acquisition method according to any one of claims 1 to 7.

10. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the business-database-based data acquisition method according to any one of claims 1 to 7.