WO2017161749A1 - Method and device for information matching - Google Patents
Method and device for information matching Download PDFInfo
- Publication number
- WO2017161749A1 WO2017161749A1 PCT/CN2016/088684 CN2016088684W WO2017161749A1 WO 2017161749 A1 WO2017161749 A1 WO 2017161749A1 CN 2016088684 W CN2016088684 W CN 2016088684W WO 2017161749 A1 WO2017161749 A1 WO 2017161749A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- node
- fuzzy
- nodes
- level
- common
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
Definitions
- the present invention relates to computer technology, and in particular, to an information matching method and apparatus.
- fuzzy matching technology is needed to provide users with the most accurate search results.
- fuzzy matching of search terms can be achieved by using a depth-first search algorithm on the structure of the Trie tree.
- the Trie tree also known as the prefix tree or dictionary tree, is a common method for matching lookups. It can be used to sort and save a large number of strings and so on, so it is often used by search engine systems for text word frequency statistics. Its advantage is: use the common prefix of the string to reduce the query time, minimize the unnecessary string comparison, the query efficiency is higher than the hash tree.
- Depth-first search is a commonly used enumeration search method. The rules are simple. Starting from the root node, each possible branch path can't go deeper, and each node can only access once.
- the inventors have found that the existing retrieval method has too many times of searching and comparing, which results in a slow running speed, low efficiency, and poor user experience.
- the embodiments of the present invention provide an information matching method and apparatus, which improve efficiency in fuzzy retrieval.
- an information matching method comprising:
- the data in the preset database is stored in a Trie tree manner, and the Trie
- the tree includes a fuzzy node in addition to the ordinary node, and the fuzzy node is used to directly reach the child nodes of the common nodes in the same level when crossing the common nodes in the same level when the fuzzy search needs to be performed;
- the preset rules include:
- the data matching the information to be retrieved is searched in the preset database according to the depth-first algorithm.
- the fuzzy search is performed at this time, only the fuzzy node in the next node is selected.
- the fuzzy node is divided into a first-level fuzzy node and a second-level fuzzy node, and each common node of the Trie tree has a first-order fuzzy node, and each first-order fuzzy node has a second-order fuzzy node. All common nodes in the child nodes of the parent node of the first-level fuzzy node are linked to the current level one fuzzy node, and all the ordinary nodes in the child child nodes of the grandparent node of the second-level fuzzy node are linked to the current two Under the fuzzy node, the first-level fuzzy node as the leaf node stores the data stored in all the common nodes of the same parent, and the secondary fuzzy node as the leaf node stores the data stored in all the common nodes of the same grandfather. .
- the method further includes:
- determine whether it is a fuzzy search by:
- the information to be retrieved includes a fuzzy wildcard, when the search proceeds to the position of the fuzzy wildcard, it is determined that the fuzzy search is performed at this time.
- the data that is found by the output and matched with the information to be retrieved includes:
- an information matching apparatus comprising:
- An obtaining unit configured to obtain input information to be retrieved
- a matching unit configured to search for the information to be retrieved according to a preset rule in a preset database Matching data
- An output unit configured to output the found data that matches the information to be retrieved
- the data in the preset database is stored in a Trie tree manner, and the Trie tree includes a fuzzy node in addition to the common node, and the fuzzy node is used to cross each of the same levels when the fuzzy search needs to be performed.
- the brothers are ordinary nodes and directly reach the child nodes of the common nodes of the brothers in the same level;
- the preset rules include:
- the data matching the information to be retrieved is searched in the preset database according to the depth-first algorithm.
- the fuzzy search is performed at this time, only the fuzzy node in the next node is selected.
- the fuzzy node is divided into a first-level fuzzy node and a second-level fuzzy node, and each common node of the Trie tree has a first-order fuzzy node, and each first-order fuzzy node has a second-order fuzzy node. All common nodes in the child nodes of the parent node of the first-level fuzzy node are linked to the current level one fuzzy node, and all the ordinary nodes in the child child nodes of the grandparent node of the second-level fuzzy node are linked to the current two Under the fuzzy node, the first-level fuzzy node as the leaf node stores the data stored in all the common nodes of the same parent, and the secondary fuzzy node as the leaf node stores the data stored in all the common nodes of the same grandfather. .
- the device further includes:
- a default database generation unit for:
- Obtaining data for generating the preset database storing the data for generating the preset database according to a Trie tree manner; adding a first-level fuzzy node under each common node of the Trie tree, at each Adding a second-level fuzzy node under the first-level fuzzy node, linking all the common nodes in the child nodes of the parent node of the first-level fuzzy node to the current first-level fuzzy node, and sub-nodes of the grandparent node of the second-level fuzzy node All the common nodes in the child nodes are linked to the current secondary fuzzy nodes, and the first-level fuzzy nodes that are leaf nodes store the data stored by all the brothers of the same parent, and store them as the secondary fuzzy nodes of the leaf nodes. There is data stored by all the brothers' common nodes of the grandfather to generate the preset database.
- the matching unit determines whether the fuzzy search is performed by:
- the information to be retrieved includes a fuzzy wildcard, when the search proceeds to the position of the fuzzy wildcard, it is determined that the fuzzy search is performed at this time.
- the output unit is configured to:
- a computer storage medium is further provided, wherein the computer storage medium can store a program, and when the program is executed, the information matching method provided by the first aspect of the embodiment of the present invention can be implemented. Some or all of the steps in each implementation.
- the structure of the Trie tree is improved, and the fuzzy node is added to the common node of the Trie tree to make it become a data structure more suitable for fuzzy matching, and the number of times of searching and comparing can be greatly reduced when used. This improves the speed of fuzzy lookups.
- FIG. 1 is a schematic diagram of a Trie tree in the prior art
- FIG. 2 is a schematic diagram of a search sequence in a prior art search in a Trie tree
- FIG. 3 is a flowchart of an information matching method according to an exemplary embodiment of the present invention.
- FIG. 4 is a schematic diagram of a Trie tree with a fuzzy node, according to an exemplary embodiment of the present invention.
- FIG. 5 is a schematic diagram showing a fuzzy node link according to an exemplary embodiment of the present invention.
- FIG. 7 is a schematic diagram showing performing fuzzy search according to an exemplary embodiment of the present invention.
- FIG. 8 is a flowchart of an information matching method according to an exemplary embodiment of the present invention.
- FIG. 9 is a schematic diagram of an information matching apparatus according to an exemplary embodiment of the present invention.
- FIG. 10 is a schematic diagram of an information matching apparatus according to an exemplary embodiment of the present invention.
- the retrieval system will divide the "Brave Legend” into four single characters 'yong', ' ⁇ ', 'pass', 'say', and then start from the root node, retrieve the word 'yong', enter the node, from ' Starting from the byte point, search for the word 'holder', enter the node, and continue until you find all the words, hit the "Brave Legend", the search ends.
- the process can be seen in Figure 2, in Figure 2, next to the node
- the numbers represent the order in which the nodes are searched, the same below.
- FIG. 3 is a flowchart of an information matching method according to an exemplary embodiment.
- the method can be used for terminal devices such as mobile phones, tablet computers, desktop computers, and the like, and can also be used for servers and the like.
- the method may include:
- Step S301 Acquire the input information to be retrieved.
- Step S302 Search for data matching the to-be-retrieved information according to a preset rule in a preset database.
- the data in the preset database is stored in a Trie tree manner, and the Trie tree includes a fuzzy node in addition to the common node, and the fuzzy node is used to cross each of the same levels when the fuzzy search needs to be performed.
- the brothers are ordinary nodes and directly reach the child nodes of the common nodes of the brothers in the same level;
- the preset rules include:
- the data matching the information to be retrieved is searched in the preset database according to the depth-first algorithm.
- the fuzzy search is performed at this time, only the fuzzy node in the next node is selected.
- the common nodes of the siblings of the same level may be children of the same parent, or children of the same grandfather.
- the fuzzy node is divided into a first-level fuzzy node and a second-level fuzzy node, and each common node of the Trie tree has a first-order fuzzy node, and each first-order fuzzy node has a second-order fuzzy node. All common nodes in the child nodes of the parent node of the first-level fuzzy node are linked to the current level one fuzzy node, and all the ordinary nodes in the child child nodes of the grandparent node of the second-level fuzzy node are linked to the current Under the two-level fuzzy node, the first-level fuzzy node as the leaf node stores the data stored in all the common nodes of the same parent, and the secondary fuzzy node as the leaf node stores the information stored in all the common nodes of the same grandfather. data.
- the F1 node is a first-order fuzzy node
- the F2 node is a second-order fuzzy node.
- all the ordinary nodes in the child nodes of the parent node of the F1 node are linked to the current F1 node and become the child nodes of F1
- all the ordinary nodes in the child child nodes of the grandparent node of the F2 node are linked. Go to the current F2 node and become a child of F2.
- a first-order fuzzy node should be added under each common node.
- the fuzzy node has no meaning, so in FIG. 4
- some common nodes in Figure 5 do not draw fuzzy nodes. Taking the 'punch' byte as an example, there is no child node under this node, so even if an F1 node is added to the node, there is no ordinary node that can be linked under the F1 node, so the F1 node has no meaning. Therefore, the first-order fuzzy nodes are not drawn under the 'punch' byte points in Figures 4 and 5. Similarly, some secondary fuzzy nodes are not drawn under the first-level fuzzy nodes.
- Step S303 outputting the found data that matches the information to be retrieved.
- the data that is found to match the information to be retrieved by the output may include:
- the data stored in the Trie tree is described below in conjunction with an example.
- the storage in the Trie tree can be a string, such as a phrase composed of natural language.
- Data is usually stored primarily at leaf nodes. For example, for the phrase "Brave Legend", there are four common nodes, each of which is used to index 'yong' 'pass' 'pass' 'say', and the actual data is the term "Brave Legend”.
- Stored in the leaf node is the 'say' byte point.
- the ‘user’ byte can also store the word “ ⁇ ”.
- the data stored in the 'speak' byte point can include other related content, such as address, picture, profile, link, etc., in addition to the term “Brave Legend”, so that the user retrieves the "Brave Legend”. "You can further see other related content related to the search results.”
- the information to be retrieved includes a fuzzy wildcard, when the search proceeds to the position of the fuzzy wildcard, it is determined that the fuzzy search is performed at this time.
- the fuzzy wildcard can be "*”, “%”, and the like.
- step 3) Return to step 3) and continue the traversal of the child nodes of the 'yong' node, repeating 3) to 5). There is only one child node below the 'yong' node, so the traversal ends.
- step 2) Go back to step 2) and continue the traversal of the child nodes of the root node, repeating 2) to 6). There are three child nodes below the root node, which are traversed one by one.
- step e Repeat step e to complete the traversal.
- each node will have about 3,000 child nodes.
- the present invention avoids double traversal and optimizes speeds of approximately 3000 x 3000 times. This results in a significant increase in the speed of fuzzy matching, making the algorithm practically usable.
- the Trie tree fuzzy matching before optimization the search takes about 5 to 6 minutes, and the optimized fuzzy matching algorithm can be ignored (less than 1 millisecond).
- the method before acquiring the input information to be retrieved, the method further includes:
- Step S801 acquiring data for generating the preset database.
- Step S802 storing the data used to generate the preset database according to a Trie tree manner.
- Step S803 adding a first-level fuzzy node under the common node of each Trie tree, adding a second-level fuzzy node under each first-level fuzzy node, and all ordinary ones in the child nodes of the parent node of the first-level fuzzy node
- the nodes are all linked to the current first-level fuzzy node, and all the common nodes in the child sub-nodes of the grandfather node of the second-level fuzzy node are linked to the current second-level fuzzy node, and the first-level fuzzy node as the leaf node is stored.
- the data stored by all the siblings of the same parent, as the secondary fuzzy node of the leaf node stores the data stored by all the common nodes of the grandfather to generate the preset database.
- the embodiment of the invention improves the structure of the Trie tree, adds a fuzzy node to the common node of the Trie tree, and makes it become a data structure more suitable for fuzzy matching, which can greatly reduce the number of times of searching and comparing, thereby improving the number of times. Find speed.
- FIG. 9 is a schematic diagram of an information matching apparatus according to an exemplary embodiment.
- the device can be used for terminal devices such as mobile phones, tablet computers, desktop computers, and the like, and can also be used for servers and the like.
- the apparatus can include:
- the obtaining unit 901 is configured to obtain the input information to be retrieved.
- the matching unit 902 is configured to search for data matching the to-be-retrieved information according to a preset rule in a preset database.
- the data in the preset database is stored in a Trie tree manner, and the Trie tree includes a fuzzy node in addition to the common node, and the fuzzy node is used to cross each of the same levels when the fuzzy search needs to be performed.
- the brothers are ordinary nodes and directly reach the child nodes of the common nodes of the brothers in the same level;
- the preset rules include:
- the data matching the information to be retrieved is searched in the preset database according to the depth-first algorithm.
- the fuzzy search is performed at this time, only the fuzzy node in the next node is selected.
- the fuzzy node is divided into a first-level fuzzy node and a second-level fuzzy node, and each common node of the Trie tree has a first-order fuzzy node, and each first-order fuzzy node has a second-order fuzzy node.
- Point all common nodes in the child nodes of the parent node of the first-level fuzzy node are linked to the current level one fuzzy node, and all common nodes in the child child nodes of the grandparent node of the second-level fuzzy node are linked Under the current secondary fuzzy node, the first-level fuzzy node as the leaf node stores the data stored in all the common nodes of the same parent, and the secondary fuzzy node as the leaf node stores all the common nodes of the same grandfather. Stored data.
- the output unit 903 is configured to output the found data that matches the information to be retrieved.
- the output unit 903 can be specifically used to:
- the matching unit 902 can determine whether it is a fuzzy search by:
- the information to be retrieved includes a fuzzy wildcard, when the search proceeds to the position of the fuzzy wildcard, it is determined that the fuzzy search is performed at this time.
- the apparatus may further include:
- the preset database generating unit 904 is configured to:
- Obtaining data for generating the preset database storing the data for generating the preset database according to a Trie tree manner; adding a first-level fuzzy node under each common node of the Trie tree, at each Adding a second-level fuzzy node under the first-level fuzzy node, linking all the common nodes in the child nodes of the parent node of the first-level fuzzy node to the current first-level fuzzy node, and sub-nodes of the grandparent node of the second-level fuzzy node All the common nodes in the child nodes are linked to the current secondary fuzzy nodes, and the first-level fuzzy nodes that are leaf nodes store the data stored by all the brothers of the same parent, and store them as the secondary fuzzy nodes of the leaf nodes. There is data stored by all the brothers' common nodes of the grandfather to generate the preset database.
- the embodiment of the invention improves the structure of the Trie tree, adds a fuzzy node to the common node of the Trie tree, and makes it become a data structure more suitable for fuzzy matching, which can greatly reduce the number of times of searching and comparing, thereby improving the number of times. Find speed.
- the embodiment of the present invention further provides a computer storage medium, wherein the computer storage medium can store a program, and when the program is executed, the implementation manners of an information matching method provided by the embodiments of FIG. 1 to FIG. 8 can be implemented. Part or all of the steps.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
本申请要求于2016年3月21日提交中国专利局、申请号为2016101615591、发明名称为“一种信息匹配方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。The present application claims priority to Chinese Patent Application No. 2016101615591, the entire disclosure of which is hereby incorporated by reference. .
本发明涉及计算机技术,尤其涉及一种信息匹配方法及装置。The present invention relates to computer technology, and in particular, to an information matching method and apparatus.
当前很多网页或APP等都提供检索功能,用户在使用时可能会输入错误的查询词,或者是输入模糊的查询词,此时就需要使用模糊匹配技术,为用户提供尽可能正确的检索结果。在现有技术中,可以通过在Trie树的结构上使用深度优先搜索算法,来实现对检索词的模糊匹配。At present, many web pages or APPs provide search functions. Users may input incorrect query words or input fuzzy query words. In this case, fuzzy matching technology is needed to provide users with the most accurate search results. In the prior art, fuzzy matching of search terms can be achieved by using a depth-first search algorithm on the structure of the Trie tree.
Trie树又可称前缀树或字典树,是一种匹配查找的常用方法,可以用来排序和保存大量的字符串等信息,所以经常被搜索引擎系统用于文本词频统计。它的优点是:利用字符串的公共前缀来减少查询时间,最大限度地减少无谓的字符串比较,查询效率比哈希树高。深度优先搜索则是一种常用的枚举搜索法,其规则简单说来就是从根节点出发,对每一个可能的分支路径深入到不能再深入为止,且每个节点只能访问一次。The Trie tree, also known as the prefix tree or dictionary tree, is a common method for matching lookups. It can be used to sort and save a large number of strings and so on, so it is often used by search engine systems for text word frequency statistics. Its advantage is: use the common prefix of the string to reduce the query time, minimize the unnecessary string comparison, the query efficiency is higher than the hash tree. Depth-first search is a commonly used enumeration search method. The rules are simple. Starting from the root node, each possible branch path can't go deeper, and each node can only access once.
发明人在实现本发明的过程中发现,现有的这种检索方式在使用时查找和比较的次数太多,导致运行速度很慢,效率很低,用户体验不好。In the process of implementing the present invention, the inventors have found that the existing retrieval method has too many times of searching and comparing, which results in a slow running speed, low efficiency, and poor user experience.
发明内容Summary of the invention
为克服现有技术中存在的问题,本发明实施例提供一种信息匹配方法及装置,提高模糊检索时的效率。In order to overcome the problems in the prior art, the embodiments of the present invention provide an information matching method and apparatus, which improve efficiency in fuzzy retrieval.
根据本发明实施例的第一方面,提供一种信息匹配方法,所述方法包括:According to a first aspect of the embodiments of the present invention, an information matching method is provided, the method comprising:
获取输入的待检索信息;Obtain the input information to be retrieved;
在预设数据库中按照预设规则查找与所述待检索信息相匹配的数据;Searching, in the preset database, data matching the information to be retrieved according to a preset rule;
输出所查找到的与所述待检索信息相匹配的数据;Outputting the found data matching the information to be retrieved;
其中,所述预设数据库中的数据按照Trie树方式存储,并且,所述Trie 树中除了普通节点外还包括模糊节点,所述模糊节点用于在需要进行模糊查找时越过同一级中的各兄弟普通节点而直接达到所述同一级中的各兄弟普通节点的子节点;The data in the preset database is stored in a Trie tree manner, and the Trie The tree includes a fuzzy node in addition to the ordinary node, and the fuzzy node is used to directly reach the child nodes of the common nodes in the same level when crossing the common nodes in the same level when the fuzzy search needs to be performed;
所述预设规则包括:The preset rules include:
按照深度优先算法在所述预设数据库查找与所述待检索信息相匹配的数据,当需要选取下一个节点时,如果此时为模糊查找,则只选取下一个节点中的模糊节点。The data matching the information to be retrieved is searched in the preset database according to the depth-first algorithm. When the next node needs to be selected, if the fuzzy search is performed at this time, only the fuzzy node in the next node is selected.
可选的:Optional:
所述模糊节点分为一级模糊节点和二级模糊节点,所述Trie树的每个普通节点下带有一个一级模糊节点,每个一级模糊节点下带有一个二级模糊节点,所述一级模糊节点的父节点的子子节点中的所有普通节点均链接到当前一级模糊节点下,所述二级模糊节点的祖父节点的子子子节点中所有普通节点均链接到当前二级模糊节点下,作为叶子节点的一级模糊节点储存有同父的所有兄弟普通节点中所储存的数据,作为叶子节点的二级模糊节点储存有同祖父的所有兄弟普通节点中所储存的数据。The fuzzy node is divided into a first-level fuzzy node and a second-level fuzzy node, and each common node of the Trie tree has a first-order fuzzy node, and each first-order fuzzy node has a second-order fuzzy node. All common nodes in the child nodes of the parent node of the first-level fuzzy node are linked to the current level one fuzzy node, and all the ordinary nodes in the child child nodes of the grandparent node of the second-level fuzzy node are linked to the current two Under the fuzzy node, the first-level fuzzy node as the leaf node stores the data stored in all the common nodes of the same parent, and the secondary fuzzy node as the leaf node stores the data stored in all the common nodes of the same grandfather. .
可选的,在获取输入的待检索信息之前,所述方法还包括:Optionally, before obtaining the input information to be retrieved, the method further includes:
获取用于生成所述预设数据库的数据;Obtaining data for generating the preset database;
将所述用于生成所述预设数据库的数据按照Trie树方式存储;And storing the data used to generate the preset database according to a Trie tree manner;
在每个Trie树的普通节点下添加一个一级模糊节点,在每个一级模糊节点下添加一个二级模糊节点,将一级模糊节点的父节点的子子节点中的所有普通节点均链接到当前一级模糊节点下,将二级模糊节点的祖父节点的子子子节点中的所有普通节点均链接到当前二级模糊节点下,并令作为叶子节点的一级模糊节点储存有同父的所有兄弟普通节点所储存的数据,作为叶子节点的二级模糊节点储存有同祖父的所有兄弟普通节点所储存的数据,以生成所述预设数据库。Add a first-level fuzzy node under the common node of each Trie tree, add a second-level fuzzy node under each first-level fuzzy node, and link all common nodes in the child nodes of the parent node of the first-level fuzzy node. Under the current level one fuzzy node, all the common nodes in the child sub-nodes of the grandfather node of the second-level fuzzy node are linked to the current second-level fuzzy node, and the first-level fuzzy node as the leaf node is stored with the same parent. The data stored by all the brothers' ordinary nodes, as the secondary fuzzy nodes of the leaf nodes, store the data stored by all the brothers' ordinary nodes of the grandfather to generate the preset database.
可选的,通过如下方式判断是否为模糊查找:Optionally, determine whether it is a fuzzy search by:
如果所述待检索信息中包括模糊通配符,则当所述查找进行到所述模糊通配符的位置处时,判断为此时进行模糊查找。If the information to be retrieved includes a fuzzy wildcard, when the search proceeds to the position of the fuzzy wildcard, it is determined that the fuzzy search is performed at this time.
可选的,所述输出所查找到的与所述待检索信息相匹配的数据,包括:Optionally, the data that is found by the output and matched with the information to be retrieved includes:
输出所查找到的叶子节点中所存储的数据。Outputs the data stored in the leaf nodes found.
根据本发明实施例的第二方面,提供一种信息匹配装置,所述装置包括:According to a second aspect of the embodiments of the present invention, there is provided an information matching apparatus, the apparatus comprising:
获取单元,用于获取输入的待检索信息;An obtaining unit, configured to obtain input information to be retrieved;
匹配单元,用于在预设数据库中按照预设规则查找与所述待检索信息相匹 配的数据;a matching unit, configured to search for the information to be retrieved according to a preset rule in a preset database Matching data;
输出单元,用于输出所查找到的与所述待检索信息相匹配的数据;An output unit, configured to output the found data that matches the information to be retrieved;
其中,所述预设数据库中的数据按照Trie树方式存储,并且,所述Trie树中除了普通节点外还包括模糊节点,所述模糊节点用于在需要进行模糊查找时越过同一级中的各兄弟普通节点而直接达到所述同一级中的各兄弟普通节点的子节点;The data in the preset database is stored in a Trie tree manner, and the Trie tree includes a fuzzy node in addition to the common node, and the fuzzy node is used to cross each of the same levels when the fuzzy search needs to be performed. The brothers are ordinary nodes and directly reach the child nodes of the common nodes of the brothers in the same level;
所述预设规则包括:The preset rules include:
按照深度优先算法在所述预设数据库查找与所述待检索信息相匹配的数据,当需要选取下一个节点时,如果此时为模糊查找,则只选取下一个节点中的模糊节点。The data matching the information to be retrieved is searched in the preset database according to the depth-first algorithm. When the next node needs to be selected, if the fuzzy search is performed at this time, only the fuzzy node in the next node is selected.
可选的:Optional:
所述模糊节点分为一级模糊节点和二级模糊节点,所述Trie树的每个普通节点下带有一个一级模糊节点,每个一级模糊节点下带有一个二级模糊节点,所述一级模糊节点的父节点的子子节点中的所有普通节点均链接到当前一级模糊节点下,所述二级模糊节点的祖父节点的子子子节点中所有普通节点均链接到当前二级模糊节点下,作为叶子节点的一级模糊节点储存有同父的所有兄弟普通节点中所储存的数据,作为叶子节点的二级模糊节点储存有同祖父的所有兄弟普通节点中所储存的数据。The fuzzy node is divided into a first-level fuzzy node and a second-level fuzzy node, and each common node of the Trie tree has a first-order fuzzy node, and each first-order fuzzy node has a second-order fuzzy node. All common nodes in the child nodes of the parent node of the first-level fuzzy node are linked to the current level one fuzzy node, and all the ordinary nodes in the child child nodes of the grandparent node of the second-level fuzzy node are linked to the current two Under the fuzzy node, the first-level fuzzy node as the leaf node stores the data stored in all the common nodes of the same parent, and the secondary fuzzy node as the leaf node stores the data stored in all the common nodes of the same grandfather. .
可选的,所述装置还包括:Optionally, the device further includes:
预设数据库生成单元,用于:A default database generation unit for:
获取用于生成所述预设数据库的数据;将所述用于生成所述预设数据库的数据按照Trie树方式存储;在每个Trie树的普通节点下添加一个一级模糊节点,在每个一级模糊节点下添加一个二级模糊节点,将一级模糊节点的父节点的子子节点中的所有普通节点均链接到当前一级模糊节点下,将二级模糊节点的祖父节点的子子子节点中的所有普通节点均链接到当前二级模糊节点下,并令作为叶子节点的一级模糊节点储存有同父的所有兄弟普通节点所储存的数据,作为叶子节点的二级模糊节点储存有同祖父的所有兄弟普通节点所储存的数据,以生成所述预设数据库。Obtaining data for generating the preset database; storing the data for generating the preset database according to a Trie tree manner; adding a first-level fuzzy node under each common node of the Trie tree, at each Adding a second-level fuzzy node under the first-level fuzzy node, linking all the common nodes in the child nodes of the parent node of the first-level fuzzy node to the current first-level fuzzy node, and sub-nodes of the grandparent node of the second-level fuzzy node All the common nodes in the child nodes are linked to the current secondary fuzzy nodes, and the first-level fuzzy nodes that are leaf nodes store the data stored by all the brothers of the same parent, and store them as the secondary fuzzy nodes of the leaf nodes. There is data stored by all the brothers' common nodes of the grandfather to generate the preset database.
可选的,所述匹配单元通过如下方式判断是否为模糊查找:Optionally, the matching unit determines whether the fuzzy search is performed by:
如果所述待检索信息中包括模糊通配符,则当所述查找进行到所述模糊通配符的位置处时,判断为此时进行模糊查找。If the information to be retrieved includes a fuzzy wildcard, when the search proceeds to the position of the fuzzy wildcard, it is determined that the fuzzy search is performed at this time.
可选的,所述输出单元用于:Optionally, the output unit is configured to:
输出所查找到的叶子节点中所存储的数据。 Outputs the data stored in the leaf nodes found.
根据本发明实施例的第三方面,还提供一种计算机存储介质,其中,该计算机存储介质可存储有程序,该程序执行时可实现本发明实施例第一方面提供的一种信息匹配方法的各实现方式中的部分或全部步骤。According to a third aspect of the present invention, a computer storage medium is further provided, wherein the computer storage medium can store a program, and when the program is executed, the information matching method provided by the first aspect of the embodiment of the present invention can be implemented. Some or all of the steps in each implementation.
本发明的实施例提供的技术方案可以包括以下有益效果:The technical solutions provided by the embodiments of the present invention may include the following beneficial effects:
本发明实施例对Trie树的结构进行了改进,在Trie树的普通节点的基础上添加了模糊节点,使之变为更适合模糊匹配的数据结构,使用时可大大减少查找和比较的次数,进而提高了模糊查找的速度。In the embodiment of the present invention, the structure of the Trie tree is improved, and the fuzzy node is added to the common node of the Trie tree to make it become a data structure more suitable for fuzzy matching, and the number of times of searching and comparing can be greatly reduced when used. This improves the speed of fuzzy lookups.
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本发明。The above general description and the following detailed description are intended to be illustrative and not restrictive.
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本发明的实施例,并与说明书一起用于解释本发明的原理。The accompanying drawings, which are incorporated in the specification of FIG
图1是现有技术中Trie树的示意图;1 is a schematic diagram of a Trie tree in the prior art;
图2是现有技术中在Trie树进行查找时的查找顺序示意图;2 is a schematic diagram of a search sequence in a prior art search in a Trie tree;
图3是根据本发明一示例性实施例示出的一种信息匹配方法的流程图;FIG. 3 is a flowchart of an information matching method according to an exemplary embodiment of the present invention; FIG.
图4是根据本发明一示例性实施例示出的带模糊节点的Trie树示意图;4 is a schematic diagram of a Trie tree with a fuzzy node, according to an exemplary embodiment of the present invention;
图5是根据本发明一示例性实施例示出的模糊节点链接示意图;FIG. 5 is a schematic diagram showing a fuzzy node link according to an exemplary embodiment of the present invention; FIG.
图6是现有技术中进行模糊查找的示意图;6 is a schematic diagram of performing fuzzy search in the prior art;
图7是根据本发明一示例性实施例示出的进行模糊查找的示意图;FIG. 7 is a schematic diagram showing performing fuzzy search according to an exemplary embodiment of the present invention; FIG.
图8是根据本发明一示例性实施例示出的一种信息匹配方法的流程图;FIG. 8 is a flowchart of an information matching method according to an exemplary embodiment of the present invention; FIG.
图9是根据本发明一示例性实施例示出的一种信息匹配装置的示意图;FIG. 9 is a schematic diagram of an information matching apparatus according to an exemplary embodiment of the present invention; FIG.
图10是根据本发明一示例性实施例示出的一种信息匹配装置的示意图。FIG. 10 is a schematic diagram of an information matching apparatus according to an exemplary embodiment of the present invention.
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本发明相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本发明的一些方面相一致的装置和方法的例子。Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. The following description refers to the same or similar elements in the different figures unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Instead, they are merely examples of devices and methods consistent with aspects of the invention as detailed in the appended claims.
首先对Trie树进行一下简单介绍。假设Trie树中存储的数据是“勇者大 冒险”、“勇者大冲关”、“勇者传说”、“圣域传说”、“神话传说”这些词条,参见图1所示。再假设某时刻用户输入的待查询关键词是“勇者传说”,则检索系统会将“勇者传说”分为4个单字‘勇’、‘者’、‘传’、‘说’,然后从根节点出发,检索‘勇’字,进入该节点,从‘勇’字节点出发,检索‘者’字,进入该节点,如此下去,直至发现全部单字,命中“勇者传说”,搜索结束。该过程可参见图2所示,在图2中,节点旁边的数字代表搜索节点时的顺序,下同。First, a brief introduction to the Trie tree. Suppose the data stored in the Trie tree is "the brave The words "Adventure", "The Brave", "The Legend of the Brave", "The Legend of the Sanctuary", and "Myths and Legends" are shown in Figure 1. It is assumed that the keyword to be queried by the user at a certain moment is "The Legend of the Brave" ", the retrieval system will divide the "Brave Legend" into four single characters 'yong', '者', 'pass', 'say', and then start from the root node, retrieve the word 'yong', enter the node, from ' Starting from the byte point, search for the word 'holder', enter the node, and continue until you find all the words, hit the "Brave Legend", the search ends. The process can be seen in Figure 2, in Figure 2, next to the node The numbers represent the order in which the nodes are searched, the same below.
图3是根据一示例性实施例示出的一种信息匹配方法的流程图。该方法可用于手机、平板电脑、台式机等终端设备,也可用于服务器等。FIG. 3 is a flowchart of an information matching method according to an exemplary embodiment. The method can be used for terminal devices such as mobile phones, tablet computers, desktop computers, and the like, and can also be used for servers and the like.
参见图3所示,该方法可以包括:Referring to FIG. 3, the method may include:
步骤S301,获取输入的待检索信息。Step S301: Acquire the input information to be retrieved.
例如当用户需要检索某信息时,可在手机的搜索框中输入待检索信息,然后点击检索按钮。For example, when the user needs to retrieve a certain information, enter the information to be retrieved in the search box of the mobile phone, and then click the search button.
步骤S302,在预设数据库中按照预设规则查找与所述待检索信息相匹配的数据。Step S302: Search for data matching the to-be-retrieved information according to a preset rule in a preset database.
其中,所述预设数据库中的数据按照Trie树方式存储,并且,所述Trie树中除了普通节点外还包括模糊节点,所述模糊节点用于在需要进行模糊查找时越过同一级中的各兄弟普通节点而直接达到所述同一级中的各兄弟普通节点的子节点;The data in the preset database is stored in a Trie tree manner, and the Trie tree includes a fuzzy node in addition to the common node, and the fuzzy node is used to cross each of the same levels when the fuzzy search needs to be performed. The brothers are ordinary nodes and directly reach the child nodes of the common nodes of the brothers in the same level;
所述预设规则包括:The preset rules include:
按照深度优先算法在所述预设数据库查找与所述待检索信息相匹配的数据,当需要选取下一个节点时,如果此时为模糊查找,则只选取下一个节点中的模糊节点。The data matching the information to be retrieved is searched in the preset database according to the depth-first algorithm. When the next node needs to be selected, if the fuzzy search is performed at this time, only the fuzzy node in the next node is selected.
同一级的各兄弟普通节点可以是同父的子节点,也可以是同祖父的子子节点。The common nodes of the siblings of the same level may be children of the same parent, or children of the same grandfather.
具体的,在本实施例或本发明其他某些实施例中:Specifically, in this embodiment or some other embodiments of the invention:
所述模糊节点分为一级模糊节点和二级模糊节点,所述Trie树的每个普通节点下带有一个一级模糊节点,每个一级模糊节点下带有一个二级模糊节点,所述一级模糊节点的父节点的子子节点中的所有普通节点均链接到当前一级模糊节点下,所述二级模糊节点的祖父节点的子子子节点中的所有普通节点均链接到当前二级模糊节点下,作为叶子节点的一级模糊节点储存有同父的所有兄弟普通节点中所储存的数据,作为叶子节点的二级模糊节点储存有同祖父的所有兄弟普通节点中所储存的数据。 The fuzzy node is divided into a first-level fuzzy node and a second-level fuzzy node, and each common node of the Trie tree has a first-order fuzzy node, and each first-order fuzzy node has a second-order fuzzy node. All common nodes in the child nodes of the parent node of the first-level fuzzy node are linked to the current level one fuzzy node, and all the ordinary nodes in the child child nodes of the grandparent node of the second-level fuzzy node are linked to the current Under the two-level fuzzy node, the first-level fuzzy node as the leaf node stores the data stored in all the common nodes of the same parent, and the secondary fuzzy node as the leaf node stores the information stored in all the common nodes of the same grandfather. data.
作为示例可参见图4~图5所示,在图4中,F1节点为一级模糊节点、F2节点为二级模糊节点。在图5中,F1节点的父节点的子子节点中的所有普通节点均链接到当前F1节点下,成为F1的子节点,F2节点的祖父节点的子子子节点中的所有普通节点均链接到当前F2节点下,成为F2的子节点。For example, as shown in FIG. 4 to FIG. 5, in FIG. 4, the F1 node is a first-order fuzzy node, and the F2 node is a second-order fuzzy node. In FIG. 5, all the ordinary nodes in the child nodes of the parent node of the F1 node are linked to the current F1 node and become the child nodes of F1, and all the ordinary nodes in the child child nodes of the grandparent node of the F2 node are linked. Go to the current F2 node and become a child of F2.
对于图4和图5需要说明的是,理论上每个普通节点下都应该增加一个一级模糊节点,然而实际中有时增加了一级模糊节点后,该模糊节点并无意义,所以在图4和图5中有些普通节点下并未绘出模糊节点。以‘冲’字节点为例,该节点下并无子子节点,所以即使为该节点添加了F1节点,该F1节点下也没有可以链接的普通节点,所以该F1节点也就无存在意义了,故图4和图5中‘冲’字节点下未绘出一级模糊节点。同理,有些一级模糊节点下也并未绘出二级模糊节点。It should be noted in FIG. 4 and FIG. 5 that, in theory, a first-order fuzzy node should be added under each common node. However, in practice, after the first-order fuzzy node is added, the fuzzy node has no meaning, so in FIG. 4 And some common nodes in Figure 5 do not draw fuzzy nodes. Taking the 'punch' byte as an example, there is no child node under this node, so even if an F1 node is added to the node, there is no ordinary node that can be linked under the F1 node, so the F1 node has no meaning. Therefore, the first-order fuzzy nodes are not drawn under the 'punch' byte points in Figures 4 and 5. Similarly, some secondary fuzzy nodes are not drawn under the first-level fuzzy nodes.
步骤S303,输出所查找到的与所述待检索信息相匹配的数据。Step S303, outputting the found data that matches the information to be retrieved.
作为示例,所述输出所查找到的与所述待检索信息相匹配的数据,可以包括:As an example, the data that is found to match the information to be retrieved by the output may include:
输出所查找到的叶子节点中所存储的数据。Outputs the data stored in the leaf nodes found.
下面结合示例对Trie树所存储的数据进行说明。Trie树中所述存储可以为字符串,例如由自然语言构成的词组。数据通常主要存储在叶子节点。例如对于“勇者传说”这一词组,对应四个普通节点,各节点分别用于索引‘勇’‘者’‘传’‘说’,而实际的数据即“勇者传说”这一词条,则存储在叶子节点即‘说’字节点中。另外,因为“勇者”一词也可以单独做一个词条,所以‘者’字节点也可以存储“勇者”一词。The data stored in the Trie tree is described below in conjunction with an example. The storage in the Trie tree can be a string, such as a phrase composed of natural language. Data is usually stored primarily at leaf nodes. For example, for the phrase "Brave Legend", there are four common nodes, each of which is used to index 'yong' 'pass' 'pass' 'say', and the actual data is the term "Brave Legend". Stored in the leaf node is the 'say' byte point. In addition, because the word "brave" can also be used as a single term, the ‘user’ byte can also store the word “勇勇”.
此外,‘说’字节点所存储的数据除了包括“勇者传说”这一词条外,还可以进一步包括其他相关内容,如地址、图片、简介、链接等等,这样用户检索到“勇者传说”时就可以进一步看到该检索结果所关联的其他相关内容了。In addition, the data stored in the 'speak' byte point can include other related content, such as address, picture, profile, link, etc., in addition to the term "Brave Legend", so that the user retrieves the "Brave Legend". "You can further see other related content related to the search results."
在本发明或本发明其他某些实施例中,可以通过如下方式判断是否为模糊查找:In the present invention or some other embodiments of the present invention, it may be determined whether it is a fuzzy search by:
如果所述待检索信息中包括模糊通配符,则当所述查找进行到所述模糊通配符的位置处时,判断为此时进行模糊查找。If the information to be retrieved includes a fuzzy wildcard, when the search proceeds to the position of the fuzzy wildcard, it is determined that the fuzzy search is performed at this time.
例如模糊通配符可以为“*”、“%”等。For example, the fuzzy wildcard can be "*", "%", and the like.
下面结合具体示例对使用深度优先算法的模糊检索过程进行进一步说明:The fuzzy retrieval process using the depth-first algorithm is further described below with reference to specific examples:
假设要检索的内容是“**传说”,其中‘*’代表任意字。如果是在普通Trie树上进行模糊检索,参见图6所示,其过程可以如下:Suppose the content to be retrieved is "** Legend", where ‘*’ stands for any word. If the fuzzy search is performed on the ordinary Trie tree, as shown in Figure 6, the process can be as follows:
1)将“**传说”分为4个单字,即‘*’、‘*’、‘传’、‘说’。 1) Divide the "** legend" into four words, namely ‘*’, ‘*’, ‘pass’, ‘say’.
2)从根节点出发,遍历其所有的子节点,首先进入节点‘勇’。2) Starting from the root node, traversing all its child nodes, first entering the node 'yong'.
3)由于第二个字也是‘*’,从‘勇’节点出发,遍历其所有节点,首先进入节点‘者’。3) Since the second word is also ‘*’, starting from the ‘yong’ node, traversing all of its nodes, first entering the node ‘者’.
4)从‘者’字节点出发,检索‘传’字,进入该节点。4) Starting from the ‘user’ byte point, search for the word 'pass' and enter the node.
5)从‘传’字节点出发,检索‘说’字,进入该节点,发现符合条件的词条“勇者传说”,于是记录下来。5) Starting from the ‘transmission’ byte point, search for the word “say”, enter the node, and find the eligible entry “Brave Legend”, and record it.
6)回到第3)步,继续‘勇’节点的子节点的遍历,重复3)~5)。在这里‘勇’节点下面只有一个子节点,所以遍历结束。6) Return to step 3) and continue the traversal of the child nodes of the 'yong' node, repeating 3) to 5). There is only one child node below the 'yong' node, so the traversal ends.
7)回到第2)步,继续根节点的子节点的遍历,重复2)~6)。在这里根节点下面有三个子节点,要一一遍历。7) Go back to step 2) and continue the traversal of the child nodes of the root node, repeating 2) to 6). There are three child nodes below the root node, which are traversed one by one.
8)统计所有记录下的符合条件的结果(“勇者传说”、“圣域传说”、“神话传说”),搜索结束。8) Statistics of all the recorded results ("Brave Legend", "Sanctuary Legend", "Myths and Legends"), the search ends.
而如果使用本发明中添加了模糊节点的Trie树,参见图7所示,其过程可以如下:However, if the Trie tree in which the fuzzy node is added in the present invention is used, as shown in FIG. 7, the process can be as follows:
a)将“**传说”分为4个单字,即‘*’、‘*’、‘传’、‘说’。a) Divide the “** Legend” into four words, namely ‘*’, ‘*’, ‘pass’, ‘say’.
b)从根节点出发,只进入到它的子节点‘F1’。b) Starting from the root node, only enter its child node 'F1'.
c)由于第二个字也是‘*’,从‘F1’节点出发,进入到它的子节点‘F2’。c) Since the second word is also ‘*’, proceed from the ‘F1’ node to its child node 'F2’.
d)从‘F2’节点出发,检索‘传’字,发现多个节点,遍历这些节点。d) Starting from the 'F2' node, the word 'pass' is retrieved and multiple nodes are found, traversing these nodes.
e)从第一个‘传’字节点出发,检索‘说’字,进入该节点,发现符合条件的词条“勇者传说”,记录下来。e) Starting from the first ‘transmission’ byte, search for the word “say”, enter the node, and find the eligible entry “Brave Legend” and record it.
f)重复第e步,完成遍历。f) Repeat step e to complete the traversal.
g)统计所有找到的符合条件的结果(“勇者传说”、“圣域传说”、“神话传说”),搜索结束。g) Count all the matching results found ("Brave Legend", "Sanctuary Legend", "Myths and Legends"), and the search ends.
通过对比图6和图7,可以明确看出本发明检索速度的优势,因为需要比较的次数大大减少,在需要进行模糊查找时会越过同一级中的各兄弟普通节点而直接达到所述同一级中的各兄弟普通节点的子节点。By comparing FIG. 6 and FIG. 7, the advantage of the retrieval speed of the present invention can be clearly seen, because the number of times of comparison needs to be greatly reduced, and when the fuzzy search is needed, the common nodes of the siblings in the same level are directly reached to reach the same level. The child nodes of the common nodes in each of the brothers.
在现实搜索中,服务器上会储存大量的数据,一般每个节点下面将会由3000左右的子节点。本发明避免两重遍历,优化了约3000x3000倍的速度。这样使得模糊匹配的速度有了显著的提升,使得算法变得实际可用。当有80万数据量的时候,优化前的Trie树模糊匹配,一次搜索大约用时5~6分钟,优化后的模糊匹配算法搜索用时可忽略不计(小于1毫秒)。In the real-time search, a large amount of data is stored on the server, and generally each node will have about 3,000 child nodes. The present invention avoids double traversal and optimizes speeds of approximately 3000 x 3000 times. This results in a significant increase in the speed of fuzzy matching, making the algorithm practically usable. When there is 800,000 data, the Trie tree fuzzy matching before optimization, the search takes about 5 to 6 minutes, and the optimized fuzzy matching algorithm can be ignored (less than 1 millisecond).
此外参见图8所示,在获取输入的待检索信息之前,所述方法还包括:In addition, as shown in FIG. 8, before acquiring the input information to be retrieved, the method further includes:
步骤S801,获取用于生成所述预设数据库的数据。 Step S801, acquiring data for generating the preset database.
步骤S802,将所述用于生成所述预设数据库的数据按照Trie树方式存储。Step S802, storing the data used to generate the preset database according to a Trie tree manner.
步骤S803,在每个Trie树的普通节点下添加一个一级模糊节点,在每个一级模糊节点下添加一个二级模糊节点,将一级模糊节点的父节点的子子节点中的所有普通节点均链接到当前一级模糊节点下,将二级模糊节点的祖父节点的子子子节点中所有普通节点均链接到当前二级模糊节点下,并令作为叶子节点的一级模糊节点储存有同父的所有兄弟普通节点所储存的数据,作为叶子节点的二级模糊节点储存有同祖父的所有兄弟普通节点所储存的数据,以生成所述预设数据库。Step S803, adding a first-level fuzzy node under the common node of each Trie tree, adding a second-level fuzzy node under each first-level fuzzy node, and all ordinary ones in the child nodes of the parent node of the first-level fuzzy node The nodes are all linked to the current first-level fuzzy node, and all the common nodes in the child sub-nodes of the grandfather node of the second-level fuzzy node are linked to the current second-level fuzzy node, and the first-level fuzzy node as the leaf node is stored. The data stored by all the siblings of the same parent, as the secondary fuzzy node of the leaf node, stores the data stored by all the common nodes of the grandfather to generate the preset database.
本发明实施例对Trie树的结构进行了改进,为Trie树的普通节点添加了模糊节点,使之变为更适合模糊匹配的数据结构,使用时可大大减少查找和比较的次数,从而提高了查找速度。The embodiment of the invention improves the structure of the Trie tree, adds a fuzzy node to the common node of the Trie tree, and makes it become a data structure more suitable for fuzzy matching, which can greatly reduce the number of times of searching and comparing, thereby improving the number of times. Find speed.
下述为本发明装置实施例,可以用于执行本发明方法实施例。对于本发明装置实施例中未披露的细节,请参照本发明方法实施例。The following is an embodiment of the apparatus of the present invention, which can be used to carry out the method embodiments of the present invention. For details not disclosed in the embodiment of the device of the present invention, please refer to the method embodiment of the present invention.
图9是根据一示例性实施例示出的一种信息匹配装置的示意图。该装置可用于手机、平板电脑、台式机等终端设备,也可用于服务器等。FIG. 9 is a schematic diagram of an information matching apparatus according to an exemplary embodiment. The device can be used for terminal devices such as mobile phones, tablet computers, desktop computers, and the like, and can also be used for servers and the like.
参见图9所示,该装置可以包括:Referring to Figure 9, the apparatus can include:
获取单元901,用于获取输入的待检索信息。The obtaining
例如当用户需要检索某信息时,可在手机的搜索框中输入待检索信息,然后点击检索按钮。For example, when the user needs to retrieve a certain information, enter the information to be retrieved in the search box of the mobile phone, and then click the search button.
匹配单元902,用于在预设数据库中按照预设规则查找与所述待检索信息相匹配的数据。The
其中,所述预设数据库中的数据按照Trie树方式存储,并且,所述Trie树中除了普通节点外还包括模糊节点,所述模糊节点用于在需要进行模糊查找时越过同一级中的各兄弟普通节点而直接达到所述同一级中的各兄弟普通节点的子节点;The data in the preset database is stored in a Trie tree manner, and the Trie tree includes a fuzzy node in addition to the common node, and the fuzzy node is used to cross each of the same levels when the fuzzy search needs to be performed. The brothers are ordinary nodes and directly reach the child nodes of the common nodes of the brothers in the same level;
所述预设规则包括:The preset rules include:
按照深度优先算法在所述预设数据库查找与所述待检索信息相匹配的数据,当需要选取下一个节点时,如果此时为模糊查找,则只选取下一个节点中的模糊节点。The data matching the information to be retrieved is searched in the preset database according to the depth-first algorithm. When the next node needs to be selected, if the fuzzy search is performed at this time, only the fuzzy node in the next node is selected.
具体的,在本实施例或本发明其他某些实施例中:Specifically, in this embodiment or some other embodiments of the invention:
所述模糊节点分为一级模糊节点和二级模糊节点,所述Trie树的每个普通节点下带有一个一级模糊节点,每个一级模糊节点下带有一个二级模糊节 点,所述一级模糊节点的父节点的子子节点中的所有普通节点均链接到当前一级模糊节点下,所述二级模糊节点的祖父节点的子子子节点中所有普通节点均链接到当前二级模糊节点下,作为叶子节点的一级模糊节点储存有同父的所有兄弟普通节点中所储存的数据,作为叶子节点的二级模糊节点储存有同祖父的所有兄弟普通节点中所储存的数据。The fuzzy node is divided into a first-level fuzzy node and a second-level fuzzy node, and each common node of the Trie tree has a first-order fuzzy node, and each first-order fuzzy node has a second-order fuzzy node. Point, all common nodes in the child nodes of the parent node of the first-level fuzzy node are linked to the current level one fuzzy node, and all common nodes in the child child nodes of the grandparent node of the second-level fuzzy node are linked Under the current secondary fuzzy node, the first-level fuzzy node as the leaf node stores the data stored in all the common nodes of the same parent, and the secondary fuzzy node as the leaf node stores all the common nodes of the same grandfather. Stored data.
输出单元903,用于输出所查找到的与所述待检索信息相匹配的数据。The
作为示例,所述输出单元903具体可以用于:As an example, the
输出所查找到的叶子节点中所存储的数据。Outputs the data stored in the leaf nodes found.
在本发明或本发明其他某些实施例中,所述匹配单元902可以通过如下方式判断是否为模糊查找:In the present invention or some other embodiments of the present invention, the
如果所述待检索信息中包括模糊通配符,则当所述查找进行到所述模糊通配符的位置处时,判断为此时进行模糊查找。If the information to be retrieved includes a fuzzy wildcard, when the search proceeds to the position of the fuzzy wildcard, it is determined that the fuzzy search is performed at this time.
此外,参见图10所示,所述装置还可以包括:In addition, referring to FIG. 10, the apparatus may further include:
预设数据库生成单元904,用于:The preset
获取用于生成所述预设数据库的数据;将所述用于生成所述预设数据库的数据按照Trie树方式存储;在每个Trie树的普通节点下添加一个一级模糊节点,在每个一级模糊节点下添加一个二级模糊节点,将一级模糊节点的父节点的子子节点中的所有普通节点均链接到当前一级模糊节点下,将二级模糊节点的祖父节点的子子子节点中的所有普通节点均链接到当前二级模糊节点下,并令作为叶子节点的一级模糊节点储存有同父的所有兄弟普通节点所储存的数据,作为叶子节点的二级模糊节点储存有同祖父的所有兄弟普通节点所储存的数据,以生成所述预设数据库。Obtaining data for generating the preset database; storing the data for generating the preset database according to a Trie tree manner; adding a first-level fuzzy node under each common node of the Trie tree, at each Adding a second-level fuzzy node under the first-level fuzzy node, linking all the common nodes in the child nodes of the parent node of the first-level fuzzy node to the current first-level fuzzy node, and sub-nodes of the grandparent node of the second-level fuzzy node All the common nodes in the child nodes are linked to the current secondary fuzzy nodes, and the first-level fuzzy nodes that are leaf nodes store the data stored by all the brothers of the same parent, and store them as the secondary fuzzy nodes of the leaf nodes. There is data stored by all the brothers' common nodes of the grandfather to generate the preset database.
本发明实施例对Trie树的结构进行了改进,为Trie树的普通节点添加了模糊节点,使之变为更适合模糊匹配的数据结构,使用时可大大减少查找和比较的次数,从而提高了查找速度。The embodiment of the invention improves the structure of the Trie tree, adds a fuzzy node to the common node of the Trie tree, and makes it become a data structure more suitable for fuzzy matching, which can greatly reduce the number of times of searching and comparing, thereby improving the number of times. Find speed.
关于上述实施例中的装置,其中各个单元\模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。With regard to the apparatus in the above embodiment, the specific manner in which each unit\module performs an operation has been described in detail in the embodiment relating to the method, and will not be explained in detail herein.
本发明实施例还提供一种计算机存储介质,其中,该计算机存储介质可存储有程序,该程序执行时可实现图1~图8所涉及实施例提供的一种信息匹配方法的各实现方式中的部分或全部步骤。The embodiment of the present invention further provides a computer storage medium, wherein the computer storage medium can store a program, and when the program is executed, the implementation manners of an information matching method provided by the embodiments of FIG. 1 to FIG. 8 can be implemented. Part or all of the steps.
本领域技术人员在考虑说明书及实践这里公开的发明后,将容易想到本发 明的其它实施方案。本申请旨在涵盖本发明的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本发明的一般性原理并包括本发明未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施例仅被视为示例性的,本发明的真正范围和精神由所附的权利要求指出。Those skilled in the art will readily appreciate the present invention after considering the specification and practicing the invention disclosed herein. Other embodiments of the invention. The present application is intended to cover any variations, uses, or adaptations of the present invention, which are in accordance with the general principles of the present invention and include common general knowledge or conventional technical means in the art that are not disclosed in the present invention. . The specification and examples are to be considered as illustrative only,
应当理解的是,本发明并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本发明的范围仅由所附的权利要求来限制。 It is to be understood that the invention is not limited to the details of the details of The scope of the invention is limited only by the appended claims.
Claims (10)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610161559.1 | 2016-03-21 | ||
| CN201610161559.1A CN105843882A (en) | 2016-03-21 | 2016-03-21 | Information matching method and apparatus |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2017161749A1 true WO2017161749A1 (en) | 2017-09-28 |
Family
ID=56588107
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2016/088684 Ceased WO2017161749A1 (en) | 2016-03-21 | 2016-07-05 | Method and device for information matching |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN105843882A (en) |
| WO (1) | WO2017161749A1 (en) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111160200A (en) * | 2019-12-23 | 2020-05-15 | 浙江大华技术股份有限公司 | Method and device for establishing passerby library |
| CN114758727A (en) * | 2022-04-26 | 2022-07-15 | 安徽农业大学 | Retrieval method based on carried cache Trie tree accelerated biological gene |
| CN114861608A (en) * | 2022-05-27 | 2022-08-05 | 中国第一汽车股份有限公司 | A data processing method, apparatus, device and storage medium |
| CN115238132A (en) * | 2022-06-24 | 2022-10-25 | 新华三信息安全技术有限公司 | Matching method and device of pattern strings |
| CN117150095A (en) * | 2023-09-12 | 2023-12-01 | 北京云枢创新软件技术有限公司 | Hierarchical tree node search methods, electronic devices and media |
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018090253A1 (en) * | 2016-11-16 | 2018-05-24 | 深圳达闼科技控股有限公司 | Message matching method, message matching apparatus, computer program product, and electronic device |
| CN108985672A (en) * | 2017-06-01 | 2018-12-11 | 北京京东尚科信息技术有限公司 | Information output method and device |
| CN108334491B (en) * | 2017-09-08 | 2020-07-31 | 腾讯科技(深圳)有限公司 | Text analysis method and device, computing equipment and storage medium |
| CN109034938B (en) * | 2018-06-11 | 2022-07-05 | 广东因特利信息科技股份有限公司 | Information rapid screening and matching method, device, electronic device and storage medium |
| US20230020080A1 (en) * | 2021-04-12 | 2023-01-19 | Adishesh Kishore | Relationship builder to relate data across multiple entities/nodes |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6594655B2 (en) * | 2001-01-04 | 2003-07-15 | Ezchip Technologies Ltd. | Wildcards in radix- search tree structures |
| CN101083608A (en) * | 2006-05-30 | 2007-12-05 | 华为技术有限公司 | Method for enquiring node information of equipment management tree and its terminal equipment |
| CN101286988A (en) * | 2008-04-18 | 2008-10-15 | 北京启明星辰信息技术股份有限公司 | Parallel multi-mode matching method and system therefor |
| CN101807184A (en) * | 2009-02-16 | 2010-08-18 | 阿尔卡特朗讯 | Method for searching character string with wildcard character and system thereof |
| CN102360385A (en) * | 2011-10-14 | 2012-02-22 | 盛乐信息技术(上海)有限公司 | File searching method and system |
| CN102646115A (en) * | 2012-02-17 | 2012-08-22 | 北京星网锐捷网络技术有限公司 | Method and device for constructing AC (aho-corasick) state machine |
| CN103020052A (en) * | 2011-09-20 | 2013-04-03 | 北京百度网讯科技有限公司 | Method and device for recognizing search demand |
| CN103778124A (en) * | 2012-10-17 | 2014-05-07 | 北大方正集团有限公司 | Tree structure query method and device |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CA2509496A1 (en) * | 2005-06-06 | 2006-12-06 | 3618633 Canada Inc. | Search-enhanced trie-based syntactic pattern recognition of sequences |
| CN102737060B (en) * | 2011-04-14 | 2017-09-12 | 商业对象软件有限公司 | Searching for generally in geocoding application |
| CN103209126B (en) * | 2012-01-11 | 2015-10-28 | 深圳市东进软件开发有限公司 | A kind of number analysis method and system with fuzzy diagnosis function |
-
2016
- 2016-03-21 CN CN201610161559.1A patent/CN105843882A/en active Pending
- 2016-07-05 WO PCT/CN2016/088684 patent/WO2017161749A1/en not_active Ceased
Patent Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6594655B2 (en) * | 2001-01-04 | 2003-07-15 | Ezchip Technologies Ltd. | Wildcards in radix- search tree structures |
| CN101083608A (en) * | 2006-05-30 | 2007-12-05 | 华为技术有限公司 | Method for enquiring node information of equipment management tree and its terminal equipment |
| CN101286988A (en) * | 2008-04-18 | 2008-10-15 | 北京启明星辰信息技术股份有限公司 | Parallel multi-mode matching method and system therefor |
| CN101807184A (en) * | 2009-02-16 | 2010-08-18 | 阿尔卡特朗讯 | Method for searching character string with wildcard character and system thereof |
| CN103020052A (en) * | 2011-09-20 | 2013-04-03 | 北京百度网讯科技有限公司 | Method and device for recognizing search demand |
| CN102360385A (en) * | 2011-10-14 | 2012-02-22 | 盛乐信息技术(上海)有限公司 | File searching method and system |
| CN102646115A (en) * | 2012-02-17 | 2012-08-22 | 北京星网锐捷网络技术有限公司 | Method and device for constructing AC (aho-corasick) state machine |
| CN103778124A (en) * | 2012-10-17 | 2014-05-07 | 北大方正集团有限公司 | Tree structure query method and device |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111160200A (en) * | 2019-12-23 | 2020-05-15 | 浙江大华技术股份有限公司 | Method and device for establishing passerby library |
| CN111160200B (en) * | 2019-12-23 | 2023-06-16 | 浙江大华技术股份有限公司 | Method and device for establishing passerby library |
| CN114758727A (en) * | 2022-04-26 | 2022-07-15 | 安徽农业大学 | Retrieval method based on carried cache Trie tree accelerated biological gene |
| CN114861608A (en) * | 2022-05-27 | 2022-08-05 | 中国第一汽车股份有限公司 | A data processing method, apparatus, device and storage medium |
| CN115238132A (en) * | 2022-06-24 | 2022-10-25 | 新华三信息安全技术有限公司 | Matching method and device of pattern strings |
| CN117150095A (en) * | 2023-09-12 | 2023-12-01 | 北京云枢创新软件技术有限公司 | Hierarchical tree node search methods, electronic devices and media |
| CN117150095B (en) * | 2023-09-12 | 2024-03-22 | 北京云枢创新软件技术有限公司 | Hierarchical tree node searching method, electronic equipment and medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN105843882A (en) | 2016-08-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2017161749A1 (en) | Method and device for information matching | |
| CN112860866B (en) | Semantic retrieval method, device, equipment and storage medium | |
| CN104915340B (en) | Natural language question-answering method and device | |
| CN112163405B (en) | Question generation method and device | |
| US9424294B2 (en) | Method for facet searching and search suggestions | |
| CN111026886B (en) | Multi-round dialogue processing method for professional scene | |
| US10289717B2 (en) | Semantic search apparatus and method using mobile terminal | |
| WO2020082560A1 (en) | Method, apparatus and device for extracting text keyword, as well as computer readable storage medium | |
| CN105956053B (en) | A kind of search method and device based on network information | |
| KR20160007040A (en) | Method and system for searching by using natural language query | |
| JP2021111327A (en) | Method for generating api knowledge graph, system, and non-transitory computer-readable medium | |
| KR102729987B1 (en) | Apparatus, method and computer program for processing inquiry | |
| CN111859950A (en) | A method for automatic generation of lecture notes | |
| WO2017166626A1 (en) | Normalization method, device and electronic equipment | |
| CA2809021C (en) | Systems and methods for lexicon generation | |
| US11468078B2 (en) | Hierarchical data searching using tensor searching, fuzzy searching, and Bayesian networks | |
| Wen et al. | Rtsi: An index structure for multi-modal real-time search on live audio streaming services | |
| CN114595696A (en) | Entity disambiguation method, entity disambiguation apparatus, storage medium, and electronic device | |
| CN109472032A (en) | A kind of determination method, apparatus, server and the storage medium of entity relationship diagram | |
| CN117786047A (en) | Knowledge graph question-answering method based on sub-graph retrieval and query path ranking | |
| JP6115487B2 (en) | Information collecting method, dialogue system, and information collecting apparatus | |
| Jeon et al. | Random forest algorithm for linked data using a parallel processing environment | |
| CN116340457A (en) | Data search method, device, computer equipment, and computer-readable storage medium | |
| CN116991969B (en) | Configurable grammatical relationship retrieval method, system, electronic device and storage medium | |
| Hiregoudar et al. | Speech to SQL generator-a voice based approach |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16895099 Country of ref document: EP Kind code of ref document: A1 |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 16895099 Country of ref document: EP Kind code of ref document: A1 |