CN117149951A - An intelligent retrieval method, device, electronic equipment and storage medium

Info

Publication number: CN117149951A
Application number: CN202311093756.0A
Authority: CN
Inventors: 王宇琪; 唐焱; 张译; 谷鹏; 段海斌; 朱占生; 宋肖翔; 白汶鑫
Original assignee: Xinjiang Lianhai Ina Int Information Technology Ltd
Current assignee: Xinjiang Lianhai Ina Int Information Technology Ltd
Priority date: 2023-08-25
Filing date: 2023-08-25
Publication date: 2023-12-01

Abstract

The application discloses an intelligent retrieval method, system, electronic equipment and storage medium. Based on the text input by the user, the text identifiers in the document library that correspond to that text are read from a pre-established inverted index list and stored in a two-dimensional array. During retrieval, the matching degree is therefore calculated only for the texts whose identifiers are stored in the two-dimensional array, rather than for every text in the document library, which greatly improves retrieval efficiency.

Description

An intelligent retrieval method, device, electronic equipment and storage medium

Technical field

This application relates to the field of data processing technology, and in particular to an intelligent retrieval method, device, electronic equipment and storage medium.

Background

With the development of intelligent retrieval technology, more and more retrieval systems have come into public view. In the public security system in particular, searching laws, regulations, cases and related scenarios is of great help in improving the efficiency of police case handling.

However, for searches of laws, regulations, cases and related scenarios, traditional retrieval methods obtain the text input by the user and then compare it, word by word and sentence by sentence, with every text saved in the retrieval library. A result is output only once a text matching the user's input has been found. As a result, a great deal of time is spent on retrieval during case handling, which reduces the efficiency of the police.

Summary of the invention

In view of the above problems, this application provides an intelligent retrieval method, device, electronic equipment and storage medium. The specific solutions are as follows:

An intelligent retrieval method, including:

obtaining the text input by the user;

according to the text input by the user, reading, from a pre-established inverted index list, the text identifiers in the document library that correspond to the text input by the user, and storing the text identifiers read from the inverted index list in a two-dimensional array;

calculating, one by one, the matching degree between the text input by the user and the text corresponding to each text identifier stored in the two-dimensional array;

determining the text whose matching degree meets a preset requirement, and outputting it as the retrieval result corresponding to the text input by the user.

Optionally, the process of pre-establishing the inverted index list includes:

extracting key information from each text in the document library, the key information including key characters or keywords;

generating the inverted index list corresponding to the texts in the document library, the inverted index list containing the key information and the text identifiers corresponding to the key information.

Optionally, storing in a two-dimensional array the text identifiers in the document library that were read from the inverted index list and correspond to the text input by the user includes:

storing the texts of the document library read from the inverted index in the row information of the two-dimensional array, each row of the two-dimensional array representing one text in the document library;

storing the length information of the texts of the document library read from the inverted index in the column information of the two-dimensional array, each column of the two-dimensional array representing the length information of one text in the document library.

Optionally, calculating, one by one, the matching degree between the text input by the user and the text corresponding to each text identifier stored in the two-dimensional array includes:

calculating the maximum common substring length between the text input by the user and the text corresponding to each text identifier stored in the two-dimensional array, the maximum common substring length being the length of the identical and contiguous run of characters shared by the two texts;

using the maximum common substring length, the length of the text input by the user, and the length of the text corresponding to the text identifier stored in the two-dimensional array, calculating the matching degree between the text input by the user and that text.

Optionally, using the maximum common substring length, the length of the text input by the user, and the length of the text corresponding to the text identifier stored in the two-dimensional array to calculate the matching degree includes:

calculating the matching degree by a matching-degree formula, the formula being:

where length(lcs) is the maximum common substring length, length(a) is the length of the text input by the user, and length(b) is the length of the text corresponding to the text identifier stored in the two-dimensional array.

Optionally, determining the text whose matching degree meets the preset requirement includes:

determining the matching degree between the text input by the user and the text corresponding to every text identifier stored in the two-dimensional array, and taking the text with the highest matching degree as the text whose matching degree meets the preset requirement.

judging whether the matching degree of the text with the highest matching degree exceeds a set threshold;

if the matching degree of the text with the highest matching degree exceeds the set threshold, taking that text as the text whose matching degree meets the preset requirement;

if the matching degree of the text with the highest matching degree does not exceed the set threshold, returning failure for this retrieval.
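The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: the function name `select_result`, the dictionary of scores and the use of `None` to signal a failed retrieval are all assumptions made for the example.

```python
def select_result(scores, threshold):
    """Pick the best-matching text, or None if the retrieval fails.

    `scores` is a hypothetical dict mapping text identifier -> matching degree.
    "Exceeds the set threshold" is read here as strictly greater than.
    """
    if not scores:
        return None  # no candidates were read from the inverted index
    best_id = max(scores, key=scores.get)  # text with the highest matching degree
    return best_id if scores[best_id] > threshold else None


hit = select_result({1: 0.2, 2: 0.6}, threshold=0.5)
miss = select_result({1: 0.2, 2: 0.4}, threshold=0.5)
```

Here `hit` is the identifier of the best candidate, while `miss` is `None` because even the best candidate stays below the threshold.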

Optionally, the method further includes:

obtaining the document name list information corresponding to the texts of the document library read from the inverted index, obtaining, for each row of the two-dimensional array, the positional relationship of the last text in that row that matches the text input by the user, and storing the document name list information and the positional relationship in the two-dimensional array.

An intelligent retrieval system, including:

an acquisition module, configured to obtain the text input by the user;

a reading module, configured to read, according to the text input by the user, the text identifiers in the document library that correspond to the text input by the user from a pre-established inverted index list, and to store the text identifiers read from the inverted index list in a two-dimensional array;

a calculation module, configured to calculate, one by one, the matching degree between the text input by the user and the text corresponding to each text identifier stored in the two-dimensional array;

an output module, configured to determine the text whose matching degree meets a preset requirement and output it as the retrieval result corresponding to the text input by the user.

An electronic device, including at least one processor and a memory connected to the processor, wherein:

the memory is configured to store a computer program;

the processor is configured to execute the computer program so that the electronic device implements the aforementioned intelligent retrieval method.

A computer storage medium carrying one or more computer programs which, when executed by an electronic device, cause the electronic device to implement the aforementioned intelligent retrieval method.

With the above technical solution, in the intelligent retrieval method provided by this application, the text identifiers in the document library that correspond to the text input by the user are read from the pre-established inverted index list and stored in a two-dimensional array. During retrieval, the matching degree therefore needs to be calculated only for the texts whose identifiers are stored in the two-dimensional array, not for all texts in the document library, which greatly improves retrieval efficiency.

Description of the drawings

The above and other features, advantages and aspects of the embodiments of this application will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.

Figure 1 is a flow chart of an intelligent retrieval method provided by this application;

Figure 2 is a schematic diagram of the interface of an intelligent retrieval system provided by this application;

Figure 3 is a schematic diagram of the scenario search interface in an intelligent retrieval system provided by this application;

Figure 4 is a schematic structural diagram of an intelligent retrieval device provided by this application;

Figure 5 is a schematic structural diagram of an electronic device provided by this application.

Detailed description

Embodiments of this application are described in more detail below with reference to the accompanying drawings. Although certain embodiments of this application are shown in the drawings, it should be understood that this application may be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that this application can be understood more thoroughly and completely. It should be understood that the drawings and embodiments of this application are for illustrative purposes only and are not intended to limit its scope of protection.

As used herein, the term "include" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on"; the term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.

It should be noted that concepts such as "first" and "second" mentioned in this application are used only to distinguish different devices, modules or units, and are not used to limit the order or interdependence of the functions performed by these devices, modules or units.

It should be noted that the modifiers "one" and "multiple" mentioned in this application are illustrative rather than restrictive. Those skilled in the art will understand that, unless the context clearly indicates otherwise, they should be read as "one or more".

In the prior art, for searches of laws, regulations, cases and related scenarios, traditional retrieval methods obtain the text input by the user and then compare it, word by word and sentence by sentence, with every text saved in the retrieval library. A result is output only once a text matching the user's input has been found, so the police spend a great deal of time on retrieval during case handling, which reduces their efficiency.

To solve the above problems, this application provides an intelligent retrieval method, device, electronic device and storage medium, which are described in detail below with reference to specific embodiments.

Figure 1 shows a flow chart of an intelligent retrieval method provided by an embodiment of this application. The method includes the following steps:

S10. Obtain the text input by the user.

Figure 2 shows a schematic diagram of the interface of an intelligent retrieval system provided by an embodiment of this application. The user can select a search category by clicking the menu bar; the categories are regulation search 202, case search 203 and scenario search 204. Each corresponds to a different search scenario: regulation search 202 finds the laws and regulations applicable to different complex case-handling scenarios; case search 203 finds cases similar to the one the officer is currently handling, for the officer's reference; scenario search 204 finds scenarios similar to the case the officer is currently handling, also for reference.

For example, the user can click regulation search 202 in the menu bar and enter keywords for the relevant laws and regulations in the input field 201 to search for those laws and regulations.

The laws and regulations library in the intelligent retrieval system provided by the embodiment of this application covers national laws and regulations, local laws and regulations, international treaties and international practices, judicial interpretations and so on, spanning all areas of social life. With the keywords input by the user, the system can find the relevant legal provisions in this massive library. The user can also click a search result to view two to three cases related to that provision, which improves search efficiency.

Similarly, the user can click case search 203 in the menu bar and enter keywords for the relevant cases in the input field 201 to search for related cases. Specifically, the case database in the intelligent retrieval system provided by the embodiment of this application covers the case types commonly handled by the various business departments of the public security police and by police stations. By entering keywords, the user can find, in this massive case database, the cases related to the one entered.

It should be noted that the intelligent retrieval system provided by the embodiment of this application also supports searching for related scenarios. The user can click scenario search 204 in the menu bar and enter a brief description of the case scenario in the input field 201. The description should be kept within 500 characters. The system then searches for scenarios of related cases.

Specifically, a scenario search returns results in three parts. The first part contains the laws and regulations applicable to the scenario searched for; from these, the user can select those most relevant to the scenario actually encountered. The second part contains the content of two to three cases involving scenarios similar to the one the user has encountered; the user can look through these for content relevant to the current scenario and use it to make choices and judgments. The third part contains standardized law enforcement tips applicable to the scenario; the user can take the corresponding measures based on these tips.

For example, Figure 3 is a schematic diagram of the scenario search interface in an intelligent retrieval system provided by this application. The user clicks scenario search 204 in the menu bar and enters a brief description of the case scenario, within 500 characters, in the input field 201. The system searches for scenarios of related cases and opens the scenario search interface shown in Figure 3.

As shown in Figure 3, the scenario search interface includes a search box 301, similar cases 302, an information prompt box 303, and applicable laws and regulations 304. The search box 301 lets the user continue searching: the user can enter a more detailed description of the case scenario, and the system searches for related case scenarios based on that description. Similar cases 302 shows brief descriptions of case scenarios related to the user's search; the user can click through to the search result page of a specific case to view it in detail. The information prompt box 303 offers the officer handling the case some ways of dealing with related cases; for example, the officer can click to view first aid measures and related practices for various types of injury. Applicable laws and regulations 304 shows the laws and regulations related to the scenario searched for, so that the user can handle the case reasonably and lawfully; the user can also click through to the search result page of a specific law or regulation.

It should also be noted that the intelligent retrieval system provided by the embodiment of this application supports users entering their own cases or standardized law enforcement tips. The user can click My Resource Library 205 in the system interface to open the user's resource library, where the cases and tips the user has entered can be viewed. Information that a user enters into the resource library can be set to be visible only to that user. Users with advanced permissions can likewise enter cases or standardized law enforcement tips, and can choose the visibility scope: whether an entry is visible only to themselves or to everyone is up to the user.

In addition, the intelligent retrieval system provided by the embodiment of this application supports synchronization of the user's personal data: the user can log in to the same account on different terminal devices, and personal information is synchronized between them. The system also supports bookmarking frequently used laws, regulations and cases, as well as a notes function through which users can view the case information they have entered.

S20. According to the text input by the user, read, from the pre-established inverted index list, the text identifiers in the document library that correspond to the text input by the user, and store the text identifiers read from the inverted index in a two-dimensional array.

When the intelligent retrieval system provided by the embodiment of this application searches for the text input by the user, it reads, from the pre-established inverted index list, the text identifiers in the document library that correspond to the text input by the user, and stores the text identifiers read from the inverted index list in a two-dimensional array.

It should be noted that the core of the intelligent retrieval system provided by the embodiment of this application is the use of an inverted index to process all texts in the document library. An inverted index is a data structure used for full-text search. It splits each text in the document library into key characters or keywords, groups all documents containing a given key character or keyword into a list, and uses these lists in place of the full texts of the document library.

Specifically, the data structure of an inverted index usually consists of two parts: a dictionary and inverted lists. The dictionary stores all the key characters or keywords obtained by splitting the texts of the document library, typically in a fixed order such as by first letter or by hash value, and each entry in the dictionary corresponds to a document list in the inverted lists. The inverted list is the core data structure of the inverted index: it records which documents in the document library each key character or keyword appears in, together with related statistics such as how often and at which positions it occurs in each document.
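The dictionary and inverted-list structure described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `build_inverted_index`, the sample documents and the choice to split texts into single characters are all hypothetical, and a production system would more likely use a word segmenter to extract keywords.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each key character to {text_id: [positions]}.

    `documents` is a hypothetical dict of text identifier -> text.
    Splitting into single characters is an assumption of this sketch.
    """
    index = defaultdict(dict)
    for text_id, text in documents.items():
        for position, char in enumerate(text):
            if char.isspace():
                continue  # whitespace carries no key information
            index[char].setdefault(text_id, []).append(position)
    # The "dictionary" part: keys kept in a fixed (sorted) order.
    return {key: index[key] for key in sorted(index)}


documents = {1: "theft case", 2: "traffic case"}
index = build_inverted_index(documents)

postings = index["f"]  # which texts contain "f", and at which positions
frequency = {text_id: len(pos) for text_id, pos in postings.items()}
```

The posting list for each key carries the positions directly, so the per-document frequency mentioned in the text can be derived as the length of the position list.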

It should also be noted that the intelligent retrieval system provided by the embodiment of this application reads, according to the text input by the user, the text identifiers in the document library that correspond to that text from the inverted list, and stores the text identifiers read from the inverted list in a two-dimensional array. Specifically, the system creates a two-dimensional array X*Y, where X is the number of rows and Y is the number of columns. The rows store the document lists corresponding to the texts read from the inverted list, so the number of document lists read equals the number of rows. The columns store the document position information that the inverted index holds for those texts.
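One possible reading of this layout is sketched below in Python. The source does not fix the exact column layout, so the choice of putting the text identifier in the first column followed by its positions is an assumption, as are the function name `candidates_to_rows` and the sample index. Rows may have different lengths here; a fixed X*Y array would need padding.

```python
def candidates_to_rows(index, query_keys):
    """Build one row per candidate text: [text_id, position, position, ...].

    `index` is a hypothetical inverted index of the form
    key -> {text_id: [positions]}; `query_keys` are the keys taken from the
    user's input. The column layout is an assumption of this sketch.
    """
    merged = {}
    for key in query_keys:
        # Keys absent from the index simply contribute no candidates.
        for text_id, positions in index.get(key, {}).items():
            merged.setdefault(text_id, []).extend(positions)
    return [[text_id, *sorted(positions)]
            for text_id, positions in sorted(merged.items())]


index = {
    "a": {1: [0], 3: [2, 5]},
    "b": {3: [1]},
}
rows = candidates_to_rows(index, ["a", "b", "z"])
```

Only texts 1 and 3 appear in `rows`, so only those two would go on to the matching-degree calculation.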

S30. Calculate, one by one, the matching degree between the text input by the user and the text corresponding to each text identifier stored in the two-dimensional array.

In the embodiment of this application, the matching degree between the text input by the user and the text corresponding to each text identifier stored in the two-dimensional array is calculated one by one.

It should be noted that, when calculating the matching degree between the text input by the user and the texts in the document library, the intelligent retrieval system provided by the embodiment of this application only needs to consider the texts stored in the two-dimensional array. It does not need to calculate the matching degree between the user's input and every text in the document library, which greatly improves retrieval efficiency. The formula for calculating the matching degree between the text input by the user and a text in the document library is:

In the formula, length denotes the length of a text: length(a) is the length of text a, length(b) is the length of text b, and length(lcs) is the maximum common substring length of text a and text b. In the intelligent retrieval system provided by the embodiment of this application, length(a) can correspond to the length of the text input by the user, length(b) to the length of a text in the document library, and length(lcs) to the maximum common substring length between the text input by the user and that text.
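The formula itself is not reproduced in this text. One form that uses exactly the three quantities defined above and that agrees with the 0.11 result of the worked example in this description is shown here as a hedged reconstruction; it is an assumption, not a form confirmed by the source, and other expressions in the same quantities could round to the same value.

```latex
\[
\operatorname{match}(a, b)
  = \frac{\operatorname{length}(lcs)^{2}}
         {\operatorname{length}(a)\cdot\operatorname{length}(b)}
\]
```

For the strings "ascde" and "axcxdde", with length(lcs) = 2, length(a) = 5 and length(b) = 7, this gives 4/35 ≈ 0.114, which rounds to 0.11.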

需要说明的是，此处的lcs指的是最大公共子串长度，最大公共子串长度不同于最长公共子序列，最长公共子序列不要求子序列中的文字连续，而最大公共子串长度要求子串中的文字连续。It should be noted that lcs here refers to the maximum common substring length. The maximum common substring length is different from the longest common subsequence. The longest common subsequence does not require the text in the subsequence to be continuous, while the maximum common substring length The length requires that the text in the substring be continuous.

For example, the longest common substring of the strings "ascde" and "axcxdde" is "de", because "de" is contiguous in both strings. Their longest common subsequence is "acde", because a common subsequence does not require its characters to be contiguous, only that they appear in the same order in both strings. With length(lcs) = 2, length(a) = 5 and length(b) = 7, the matching-degree formula above gives a matching degree of 0.11 for "ascde" and "axcxdde".
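The distinction between substring and subsequence, and the resulting matching degree, can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, and the formula used, length(lcs)² / (length(a) × length(b)), is an assumption inferred from the worked values given in this description (0.11, 0.33 and 0.67).

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest run of characters that is contiguous in both a and b."""
    best = 0
    # prev[j] is the length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def longest_common_subsequence_length(a: str, b: str) -> int:
    """Length of the longest sequence appearing in order (not necessarily contiguously) in both."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(b)]


def matching_degree(a: str, b: str) -> float:
    """Assumed formula: (lcs / len(a)) * (lcs / len(b)), with lcs the longest common substring."""
    if not a or not b:
        return 0.0
    lcs = longest_common_substring_length(a, b)
    return (lcs / len(a)) * (lcs / len(b))


print(longest_common_substring_length("ascde", "axcxdde"))    # substring "de"
print(longest_common_subsequence_length("ascde", "axcxdde"))  # subsequence "acde"
print(round(matching_degree("ascde", "axcxdde"), 2))
```

The dynamic-programming table is kept to two rows because only the previous row is ever read, which keeps memory proportional to the length of the second string.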

The intelligent retrieval system provided by the embodiments of the present application calculates the matching degree between the text input by the user and each document-library text stored in the two-dimensional array, and thereby obtains the document-library text that best matches the user's input.

It should be noted that, in the intelligent retrieval system provided by the embodiments of the present application, the inverted index processes the texts in the document library by splitting each text into key characters and, for each key character, collecting the documents that contain it into a list, which forms the inverted list. Accordingly, when processing the text input by the user, the system likewise splits the input into key characters in order to match it against the texts in the document library.

For example, suppose the text input by the user is “无斗殴” (no brawling). By analyzing this input, the system finds in the inverted index the documents that match it, namely the document named 250, “打架斗殴” (fighting and brawling), and the document named 888, “斗殴” (brawling), and stores both in the two-dimensional array.

Specifically, the input “无斗殴” can be split into single key characters, that is, “无”, “斗” and “殴”, or into a combination of key characters and keywords, that is, “无” and “斗殴”. When calculating the matching degree, the intelligent retrieval system provided by the embodiments of the present application uses the latter form and splits the input into “无” and “斗殴”.

First, the intelligent retrieval system processes the character “无” (no) in “无斗殴”. The inverted index stores only the document named 250, “打架斗殴”, and the document named 888, “斗殴”, and “无” appears in neither document name. Therefore no matching degree is calculated for “无” against the document-library texts stored in the two-dimensional array, and the data in the two-dimensional array is left unchanged.

Next, the intelligent retrieval system processes the keyword “斗殴” in “无斗殴”. Both the document named 250, “打架斗殴”, and the document named 888, “斗殴”, in the inverted index hit this keyword, so the matching degree must be calculated against each of these two documents.
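The lookup described above can be sketched as follows. This is an illustration under stated assumptions rather than the system's actual code: the function names, the `vocabulary` list of key terms, and the row layout `[document name, text, text length]` for the two-dimensional array are all hypothetical, and the splitting of the user's input into “无” and “斗殴” is taken as given.

```python
from collections import defaultdict


def build_inverted_index(documents, vocabulary):
    """Map each key term to the sorted list of document names whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in vocabulary:
            if term in text:
                index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def collect_candidates(query_terms, index, documents):
    """Build the two-dimensional array: one row per document hit by any query term."""
    rows = []
    seen = set()
    for term in query_terms:
        # A term that is absent from the index (such as "无") contributes no rows.
        for doc_id in index.get(term, []):
            if doc_id not in seen:
                seen.add(doc_id)
                text = documents[doc_id]
                rows.append([doc_id, text, len(text)])
    return rows


documents = {250: "打架斗殴", 888: "斗殴"}
vocabulary = ["打架", "斗殴"]  # hypothetical key terms extracted from the library
index = build_inverted_index(documents, vocabulary)
candidates = collect_candidates(["无", "斗殴"], index, documents)
print(candidates)
```

Only the rows in `candidates` are passed on to the matching-degree calculation, which is what keeps the rest of the document library out of the comparison.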

Specifically, to calculate the matching degree between the keyword “斗殴” in the user's input “无斗殴” and the document named 250, “打架斗殴”, the length of the longest common substring of “无斗殴” and “打架斗殴” must first be determined. Because the longest common substring requires its characters to be contiguous, this length is 2, so length(lcs) = 2. Next, the text lengths are determined: the user's input “无斗殴” has length 3 and the document “打架斗殴” has length 4, so length(a) = 3 and length(b) = 4. Substituting these values into the matching-degree formula gives a matching degree of 0.33 between the user's input “无斗殴” and the document named 250, “打架斗殴”.

Because the inverted index holds two documents that match the user's input, the matching degree between the keyword “斗殴” in “无斗殴” and the document named 888, “斗殴”, must also be calculated. In the same way, the length of the longest common substring of “无斗殴” and “斗殴” is 2, so length(lcs) = 2. The user's input “无斗殴” has length 3 and the document “斗殴” has length 2, so length(a) = 3 and length(b) = 2. Substituting these values into the matching-degree formula gives a matching degree of 0.67 between the user's input “无斗殴” and the document named 888, “斗殴”.
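The two substitutions can be written out explicitly. The form used here, the product of length(lcs)/length(a) and length(lcs)/length(b), is an assumption inferred from the stated results; it reproduces both 0.33 and 0.67 as well as the 0.11 of the "ascde" example.

```latex
\begin{align*}
\text{document 250 (length}(b) = 4\text{):} \quad & \frac{2}{3} \times \frac{2}{4} = \frac{4}{12} \approx 0.33 \\
\text{document 888 (length}(b) = 2\text{):} \quad & \frac{2}{3} \times \frac{2}{2} = \frac{4}{6} \approx 0.67
\end{align*}
```

The shorter document scores higher because the same two-character match covers all of it, whereas it covers only half of “打架斗殴”.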

S40. Determine the text whose matching degree meets the preset requirement, take it as the retrieval result corresponding to the text input by the user, and output it.

In the embodiments of the present application, after step S30 has calculated the matching degree between each document-library text stored in the two-dimensional array and the text input by the user, the calculated matching degrees can be sorted from high to low and the text with the highest matching degree output.

For example, step S30 above yields a matching degree of 0.33 between the user's input “无斗殴” and the document named 250, “打架斗殴”, and a matching degree of 0.67 between “无斗殴” and the document named 888, “斗殴”. Comparing the two shows that the document named 888, “斗殴”, best matches the user's input “无斗殴”, so the text corresponding to the document named 888 is output.

In some other embodiments, a threshold can be set before the best-matching text is output. The best-matching text is output only if its matching degree exceeds the preset threshold. Otherwise, the search result informs the user that no text in the document library matches the input, or that no match reaches the preset threshold, and prompts the user to search again with different key characters or keywords.

For example, the embodiments of the present application can set the threshold so that only a text with a matching degree greater than 0.8 is output. Otherwise the user is told that the search is invalid and is prompted to search again with different key characters or keywords until the input retrieves a text that meets the threshold. Specifically, step S30 above yields a matching degree of 0.33 for the document named 250, “打架斗殴”, and 0.67 for the document named 888, “斗殴”. Neither of the two documents hit in the inverted index reaches the threshold preset by the user, so this search fails and the user is prompted to search again with different key characters or keywords.
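The two output policies of step S40, best match only and best match subject to a threshold, can be sketched together. The function name `select_result` and the use of `None` to signal a failed search are illustrative assumptions; the scores are the values computed in step S30.

```python
from typing import Dict, Optional


def select_result(scores: Dict[int, float], threshold: Optional[float] = None) -> Optional[int]:
    """Return the document name with the highest matching degree, or None if the search fails."""
    if not scores:
        return None  # nothing was hit in the inverted index
    # Sort candidates from the highest matching degree to the lowest.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_id, best_score = ranked[0]
    # With a threshold, the best match must strictly exceed it to be output.
    if threshold is not None and best_score <= threshold:
        return None
    return best_id


scores = {250: 0.33, 888: 0.67}
print(select_result(scores))       # best match without a threshold
print(select_result(scores, 0.8))  # search fails: no score exceeds 0.8
```

A caller that receives `None` would prompt the user to try different key characters or keywords, as described above.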

Based on the text input by the user, the embodiments of the present application read from the inverted index the document-library texts that correspond to that input and store them in a two-dimensional array. During retrieval, the matching degree therefore only needs to be calculated for the texts stored in the two-dimensional array rather than for every text in the document library, which greatly improves retrieval efficiency.

It should be noted that the intelligent retrieval method provided by the embodiments of the present application must traverse the entire two-dimensional array during retrieval, and the array can be traversed in many different ways. To improve the accuracy and speed of the intelligent retrieval system, some embodiments therefore further optimize the retrieval algorithm, as follows:

When the embodiments of the present application traverse the two-dimensional array to calculate the matching degree between the stored document-library texts and the text input by the user, two additional columns can be added to the array to hold further information about the document-library texts and so speed up the traversal.

For example, two additional columns are added to the two-dimensional array. One column stores the list of document names corresponding to all document-library texts read from the inverted index. With this list stored in the array, the retrieval system does not need to traverse every row and every column; it only needs to traverse the column that holds the document-name list, which greatly reduces retrieval time. The other column stores, for the document-library texts read from the inverted index, the position of the last text in each row of the two-dimensional array that matches the text input by the user. With this position stored, the retrieval system does not need to traverse all the text stored in every row; it only needs to read up to the last matching text in each row and can skip whatever is stored after it, which again greatly reduces retrieval time.

It should be pointed out that the database used in the intelligent retrieval system provided by the embodiments of the present application is a distributed relational database, which differs from a traditional database. A traditional single-server database can only scale vertically in processing capacity, and once the data volume grows beyond a certain point a single server often cannot meet demand. A distributed relational database, by contrast, is easy to extend: after it has been created, a new kind of data can be added without modifying all existing application software. An application can operate on a distributed relational database transparently, while the data is stored in different local databases, managed by different database management systems, run on different machines, supported by different operating systems, and connected by different communication networks.

As shown in Figure 4, an intelligent retrieval device provided by an embodiment of the present application includes:

an obtaining module 401, configured to obtain the text input by the user;

a reading module 402, configured to read, based on the text input by the user, the text identifiers in the document library that correspond to the text input by the user from a pre-established inverted index list, and to store the text identifiers read from the inverted index list in a two-dimensional array;

a calculation module 403, configured to calculate, one by one, the matching degree between the text corresponding to each text identifier stored in the two-dimensional array and the text input by the user;

an output module 404, configured to determine the text whose matching degree meets the preset requirement, take it as the retrieval result corresponding to the text input by the user, and output it.

Figure 5 is a schematic diagram of an electronic device provided by an embodiment of the present application.

Figure 5 shows a schematic structural diagram of an electronic device suitable for implementing the embodiments of the present application. The electronic device in the embodiments of the present application may include, but is not limited to, terminals such as mobile phones, notebook computers, PDAs (personal digital assistants), PADs (tablet computers), and desktop computers. The electronic device shown in Figure 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in Figure 5, the electronic device may include a processing device 501 (for example, a central processing unit or a graphics processor), which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. While the electronic device is powered on, the RAM 503 also stores various programs and data required for the operation of the electronic device. The processing device 501, the ROM 502 and the RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

Generally, the following devices may be connected to the I/O interface 505: an input device 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, or gyroscope; an output device 507 including, for example, a liquid crystal display (LCD), speaker, or vibrator; a storage device 508 including, for example, a memory card or hard disk; and a communication device 509. The communication device 509 may allow the electronic device to communicate with other devices, wirelessly or by wire, to exchange data. Although Figure 5 shows an electronic device having various devices, it should be understood that not all of the illustrated devices need to be implemented or present. More or fewer devices may alternatively be implemented or provided.

In the context of this application, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

It should be noted that the computer-readable medium mentioned above in this application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device. A computer-readable signal medium, in this application, may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wire, optical cable, RF (radio frequency), or any suitable combination of the above.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Although the discussion above contains several specific implementation details, these should not be construed as limiting the scope of the application. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments, separately or in any suitable subcombination.

The above description covers only preferred embodiments of the present application and the technical principles applied. Persons skilled in the art should understand that the scope of disclosure of this application is not limited to technical solutions formed by the specific combination of the above technical features. It also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the disclosed concept, for example technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in this application.

Claims

1. An intelligent retrieval method, characterized by comprising:

obtaining text input by a user;

reading, according to the text input by the user, the text identifiers in a document library that correspond to the text input by the user from a pre-established inverted index list, and storing the text identifiers read from the inverted index list in a two-dimensional array;

calculating, one by one, the matching degree between the text corresponding to each text identifier stored in the two-dimensional array and the text input by the user;

determining the text whose matching degree meets a preset requirement, taking it as the retrieval result corresponding to the text input by the user, and outputting it.

2. The intelligent retrieval method according to claim 1, characterized in that the process of pre-establishing an inverted index list includes:

Extract key information of each text in the document library, where the key information includes key characters or keywords;

Generate an inverted index list corresponding to each text in the document library, where the inverted index list includes the key information and a text identifier corresponding to the key information.

3. The intelligent retrieval method according to claim 1, characterized in that storing, in a two-dimensional array, the text identifiers in the document library that correspond to the text input by the user and are read from the inverted index list includes:

The text in the document library read from the inverted index is stored in the row information of the two-dimensional array, and each row of information in the two-dimensional array represents each text information in the document library;

The length information of the text in the document library read from the inverted index is stored in the column information of the two-dimensional array, and each column of information in the two-dimensional array represents the length information of a text in the document library.

4. The intelligent retrieval method according to claim 1, wherein calculating the matching degree between the text corresponding to the text identifier in the document library stored in the two-dimensional array and the text input by the user one by one includes:

Calculate, for each text identifier stored in the two-dimensional array, the maximum common substring length of the text input by the user and the text corresponding to that text identifier, where the maximum common substring length is the length of the identical, contiguous field shared by the text input by the user and the text corresponding to the text identifier in the document library stored in the two-dimensional array;

Using the maximum common substring length, the length of the text input by the user, and the length of the text corresponding to the text identifier in the document library stored in the two-dimensional array, calculate the matching degree between the text input by the user and the text corresponding to the text identifier in the document library stored in the two-dimensional array.

5. The intelligent retrieval method according to claim 4, characterized in that using the maximum common substring length, the length of the text input by the user, and the length of the text corresponding to the text identifier in the document library stored in the two-dimensional array to calculate the matching degree between the text corresponding to the text identifier in the document library stored in the two-dimensional array and the text input by the user includes:

The matching degree is calculated by the following formula:

$$\text{matching degree} = \frac{\text{length}(lcs)}{\text{length}(a)} \times \frac{\text{length}(lcs)}{\text{length}(b)}$$

Wherein length(lcs) is the maximum common substring length, length(a) is the length of the text input by the user, and length(b) is the length of the text corresponding to the text identifier in the document library stored in the two-dimensional array.

6. The intelligent retrieval method according to claim 1, wherein determining the text whose matching degree meets the preset requirements includes:

Compare the matching degrees between the texts corresponding to the text identifiers in the document library stored in the two-dimensional array and the text input by the user, and take the text with the highest matching degree as the text whose matching degree meets the preset requirements;

or;

Determine whether the matching degree corresponding to the text with the highest matching degree exceeds a set threshold;

Wherein, if it is determined that the matching degree corresponding to the text with the highest matching degree exceeds the set threshold, then the text with the highest matching degree will be used as the determined text whose matching degree meets the preset requirements;

If it is determined that the matching degree corresponding to the text with the highest matching degree does not exceed the set threshold, then this retrieval will fail.

7. The intelligent retrieval method according to claim 1, further comprising:

Obtaining document name list information corresponding to the texts in the document library read from the inverted index, obtaining the positional relationship corresponding to the last text in each row of the two-dimensional array that matches the text input by the user, and storing both in a two-dimensional array.

8. An intelligent retrieval device, characterized by comprising:

An acquisition module, configured to obtain the text input by the user;

A reading module, configured to read, based on the text input by the user, the text identifiers in the document library corresponding to the text input by the user from the pre-established inverted index list, and to store the text identifiers read from the inverted index list in a two-dimensional array;

A calculation module, configured to calculate, one by one, the matching degree between each text corresponding to a text identifier in the document library stored in the two-dimensional array and the text input by the user;

An output module, configured to determine the text whose matching degree meets the preset requirements as the retrieval result corresponding to the text input by the user, and to output it.
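The four modules of claim 8 can be sketched as methods of one class. Everything beyond the module roles is an assumption made for illustration: the class name `RetrievalDevice`, a character-level inverted index, a two-dimensional array with one row of text identifiers per query character, and a matching degree of twice the longest common substring length divided by the sum of the two text lengths. The longest common substring is obtained with the standard library's `difflib.SequenceMatcher.find_longest_match`.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


class RetrievalDevice:
    def __init__(self, library: Dict[int, str]):
        self.library = library
        # ASSUMPTION: character-level inverted index, character -> text ids.
        self.index: Dict[str, List[int]] = {}
        for text_id, text in library.items():
            for ch in set(text):
                self.index.setdefault(ch, []).append(text_id)

    def read(self, query: str) -> List[List[int]]:
        """Reading module: one row of text identifiers per query character."""
        return [self.index.get(ch, []) for ch in query]

    def calculate(self, query: str, rows: List[List[int]]) -> Dict[int, float]:
        """Calculation module: score only the texts listed in the 2D array."""
        scores: Dict[int, float] = {}
        for row in rows:
            for text_id in row:
                if text_id in scores:
                    continue  # already scored via an earlier row
                text = self.library[text_id]
                matcher = SequenceMatcher(None, query, text, autojunk=False)
                lcs = matcher.find_longest_match(0, len(query), 0, len(text)).size
                # ASSUMED normalization, not taken from the source text.
                scores[text_id] = 2 * lcs / (len(query) + len(text))
        return scores

    def output(self, scores: Dict[int, float]) -> Optional[str]:
        """Output module: return the best-matching text, or None if no candidate."""
        if not scores:
            return None
        return self.library[max(scores, key=scores.get)]

    def retrieve(self, query: str) -> Optional[str]:
        """Acquisition module stand-in: the query is passed in directly."""
        return self.output(self.calculate(query, self.read(query)))
```

Texts that share no indexed character with the query never enter the two-dimensional array and are therefore never scored, which is the efficiency gain the abstract describes.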

9. An electronic device, characterized by comprising at least one processor and a memory connected to the processor, wherein:

The memory is used to store computer programs;

The processor is configured to execute the computer program so that the electronic device can implement the intelligent retrieval method according to any one of claims 1 to 7.

10. A computer storage medium, characterized in that the storage medium carries one or more computer programs, and when the one or more computer programs are executed by an electronic device, the electronic device can implement the intelligent retrieval method according to any one of claims 1 to 7.