CN102915381B - Multi-dimensional semantics-based visual network retrieval presentation system and presentation control method
- Publication number: CN102915381B
- Application number: CN201210473410.9A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a multi-dimensional semantics-based visual network retrieval presentation system and presentation control method, and belongs to the technical field of network retrieval. The system comprises a query server, a semantic matching and reasoning module, an index database, a semantic index result set, a dimension-partitioning rule unit, and a multi-dimensional result presentation unit. In the method, the semantic matching and reasoning module performs semantic matching and reasoning on the keywords; the index database builds and stores a semantic ontology index from the resulting semantic matching and reasoning result set; and the multi-dimensional rule unit clusters the index result set into multi-dimensional, multi-level data results according to the semantic distance between keywords in the semantic index result set. This lets the user quickly locate the target result within the multi-dimensional presentation of candidate results, effectively distinguishes the different meanings of the same text, and improves retrieval efficiency.
Description
Technical field
The present invention relates to the technical field of network retrieval, in particular to network retrieval presentation, and specifically to a multi-dimensional semantics-based visual network retrieval presentation system and presentation control method.
Background art
With the rapid development of retrieval technology, mature search engines such as Google, Yahoo, and Baidu have emerged in China and abroad. These engines are mainly based on text information retrieval and provide users with comprehensive, highly relevant retrieval. Existing text search technology can find documents that contain the user's query text, but it presents them mainly by ranking on relevance and returns them to the user as a set of links ordered by degree of relevance. The biggest drawback of this approach is that the ambiguity of search keywords makes the semantic relationships within the result set vary widely. For example, when the user submits the keyword "apple" (苹果), the search engine cannot tell whether it means the fruit, the company Apple founded by Steve Jobs, or the French film "The Apple". Without any context, the engine cannot determine which candidate meaning is most relevant to the keyword, so the results often fail to meet the user's needs.
Summary of the invention
The object of the present invention is to overcome the above drawbacks of the prior art by providing a multi-dimensional semantics-based visual network retrieval presentation system and presentation control method that match the user's text query against the index information of documents and present the retrieval results to the user by level and by dimension according to their semantic logic. This lets the user quickly locate the target result within the multi-dimensional presentation of candidate results, effectively distinguishes the different meanings of the same text, and improves retrieval efficiency. The system is structurally simple and inexpensive, and the method is easy to apply and widely applicable.
To achieve the above object, the multi-dimensional semantics-based visual network retrieval presentation system of the present invention is constituted as follows:
The system comprises a query server, a semantic matching and reasoning module, an index database, a semantic index result set, a dimension-partitioning rule unit, and a multi-dimensional result presentation unit. The query server provides the interface through which the user enters search keywords. The semantic matching and reasoning module is connected to the query server and performs matching and reasoning on keyword semantics according to the knowledge set of the relevant domain. The index database is connected to both the query server and the semantic matching and reasoning module and supplies the search results corresponding to the search keywords. The semantic index result set is connected to the index database and stores the index result set corresponding to the search keywords. The dimension-partitioning rule unit is connected to both the semantic index result set and the semantic matching and reasoning module and, according to the semantic distance between keywords in the semantic index result set, clusters the index result set into multi-level data results along multiple dimensions. The multi-dimensional result presentation unit is connected to the dimension-partitioning rule unit and presents those multi-dimensional, multi-level data results to the user.
In this system, the semantic matching and reasoning module comprises a standard ontology knowledge base, a semantic matching unit, and a semantic reasoning unit. The standard ontology knowledge base stores the ontology knowledge set of the relevant domain. The semantic matching unit is connected to the standard ontology knowledge base, obtains semantic matching rules for the keywords from the ontology knowledge set, and performs semantic matching. The semantic reasoning unit is connected to the standard ontology knowledge base, obtains semantic reasoning rules for the keywords from the ontology knowledge set, and performs semantic reasoning.
The present invention also provides a method for controlling visual network retrieval presentation based on multi-dimensional semantics using the system described above, comprising the following steps:
(1) the query server receives the query keywords and determines whether they form a complex sentence; if so, the method proceeds to step (2); if not, to step (3);
(2) the query server performs word segmentation and filtering, outputs a separator-delimited keyword string to the index database, and the method proceeds to step (3);
(3) the semantic matching and reasoning module performs semantic matching and reasoning on the keywords and sends the semantic reasoning result set to the index database;
(4) the index database builds and stores a semantic ontology index from the received semantic matching and reasoning result set, and sends the corresponding index result set to the dimension-partitioning rule unit;
(5) the multi-dimensional rule unit clusters the index result set, according to the semantic distance between keywords in the semantic index result set, into a data form with multiple dimensions, with multiple levels of data results clustered along each dimension;
(6) the multi-dimensional result presentation unit presents the multi-dimensional, multi-level data results to the user.
In this method, the step in which the query server performs word segmentation and filtering and outputs a separator-delimited keyword string to the index database is specifically: the query server performs word segmentation and filtering separately for each language type present in the keywords and outputs a separator-delimited keyword string.
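The segmentation and filtering of step (2) could be sketched roughly as follows. This is only an illustration under stated assumptions: the patent names neither a tokenizer, a stop-word list, nor a separator character, so `TOKEN_RE`, `STOP_WORDS`, `segment_query`, and the `|` separator are all hypothetical. Real Chinese word segmentation needs a dictionary or a statistical model; here each run of Chinese characters is simply kept as one token.

```python
import re

# Hypothetical stop-word list; the patent does not say what is filtered out.
STOP_WORDS = {"the", "of", "a", "的", "了"}

# One token per run of CJK ideographs, ASCII letters, or digits.
# Simplification: a real system would split Chinese runs into words.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z]+|[0-9]+")


def segment_query(query: str, sep: str = "|") -> str:
    """Split a mixed-language query into tokens, drop stop words,
    and join the remaining tokens with a separator."""
    tokens = TOKEN_RE.findall(query)
    kept = [t for t in tokens if t.lower() not in STOP_WORDS]
    return sep.join(kept)


print(segment_query("苹果 iPhone 2012 the price"))
```

The regular expression handles the three token types named in the embodiment (Chinese words, English words, digit strings) in a single pass, which keeps the per-language handling in one place.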
In this method, the semantic matching and reasoning module comprises a standard ontology knowledge base, a semantic matching unit, and a semantic reasoning unit; the standard ontology knowledge base stores the ontology knowledge set of the relevant domain; the semantic matching unit and the semantic reasoning unit are both connected to the standard ontology knowledge base; and step (3) specifically comprises the following steps:
(31) after the semantic matching and reasoning module receives the query keywords, the semantic matching unit performs semantic matching on them against the standard ontology knowledge base and submits the semantic matching result set to the semantic reasoning unit;
(32) the semantic reasoning unit performs semantic reasoning on the semantic matching result set to obtain a semantic reasoning result set, and sends it to the index database.
In this method, the semantic matching is specifically: computing the semantic similarity between the domain-specific keyword set and the query keywords to achieve semantic matching.
In this method, the semantic reasoning is specifically: deriving the reasoning rules of a specific domain from that domain's ontology knowledge, and applying the rules to the semantic matching results to obtain the semantic reasoning result set.
In this method, step (5) specifically comprises the following steps:
(51) the multi-dimensional rule unit computes the semantic distance between keywords in the semantic index result set;
(52) the multi-dimensional rule unit clusters the index result set into multi-dimensional, multi-level data results according to the semantic distance.
In this method, step (51) specifically comprises the following steps:
(51-1) the multi-dimensional rule unit finds the nearest common ancestor node of the keywords in the semantic index result set;
(51-2) the multi-dimensional rule unit computes the distance between each keyword and the nearest common ancestor node;
(51-3) the multi-dimensional rule unit takes the sum of the distances between each keyword and the nearest common ancestor node as the semantic distance between the keywords in the semantic index result set.
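Steps (51-1) to (51-3) can be illustrated with a small sketch. It assumes the ontology is a rooted tree stored as a child-to-parent map with unit-length edges; the patent fixes neither the data structure nor the edge weights, and the toy `PARENT` ontology and the function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical toy ontology as child -> parent; None marks the root.
PARENT: Dict[str, Optional[str]] = {
    "thing": None,
    "food": "thing",
    "fruit": "food",
    "apple (fruit)": "fruit",
    "pear": "fruit",
    "organization": "thing",
    "company": "organization",
    "Apple (company)": "company",
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path: List[str] = []
    cur: Optional[str] = node
    while cur is not None:
        path.append(cur)
        cur = PARENT[cur]
    return path


def semantic_distance(a: str, b: str) -> int:
    """Sum of edge counts from a and from b to their nearest common ancestor."""
    # (51-1) index every ancestor of a by its distance from a
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    # Walking up from b, the first node that is also an ancestor of a
    # is the nearest common ancestor.
    for steps_from_b, n in enumerate(path_to_root(b)):
        if n in steps_from_a:
            # (51-2) and (51-3): add the two distances
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes share no common ancestor")


print(semantic_distance("apple (fruit)", "pear"))             # siblings under "fruit"
print(semantic_distance("apple (fruit)", "Apple (company)"))  # meet only at the root
```

In this toy tree the two senses of "apple" meet only at the root, so their distance is large, whereas "apple (fruit)" and "pear" are close. This is the kind of separation the clustering step relies on.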
In this method, step (52) specifically comprises the following steps:
(52-1) the multi-dimensional rule unit analyzes the relationship between the search keywords and the semantic distance, based on the semantic distance between the keywords;
(52-2) the multi-dimensional rule unit expands one dimension of the multi-dimensional data set and determines the dimension and level to which each retrieval result belongs;
(52-3) the retrieval results are assembled into data results with multiple dimensions and multiple levels.
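The patent does not spell out how a dimension and a level are derived from semantic distance, so the following is one possible reading rather than the claimed procedure. It assumes each retrieval result has already been tagged with an ontology concept, treats the top-level branch under the root as the "dimension", and uses the path distance to the root as the "level". The `PARENT` map, the result identifiers, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical toy ontology as child -> parent; None marks the root.
PARENT: Dict[str, Optional[str]] = {
    "thing": None,
    "food": "thing",
    "fruit": "food",
    "apple (fruit)": "fruit",
    "organization": "thing",
    "company": "organization",
    "Apple (company)": "company",
}


def dimension_and_level(concept: str) -> Tuple[str, int]:
    """Return (top-level branch under the root, distance to the root)."""
    path: List[str] = []
    cur: Optional[str] = concept
    while cur is not None:
        path.append(cur)
        cur = PARENT[cur]
    if len(path) < 2:          # the root itself has no branch above it
        return (concept, 0)
    # path[-1] is the root, path[-2] the branch directly beneath it
    return (path[-2], len(path) - 1)


def cluster_results(
    results: List[Tuple[str, str]]
) -> Dict[str, Dict[int, List[str]]]:
    """Group (result_id, concept) pairs into dimension -> level -> result ids."""
    grouped: Dict[str, Dict[int, List[str]]] = defaultdict(lambda: defaultdict(list))
    for result_id, concept in results:
        dim, level = dimension_and_level(concept)
        grouped[dim][level].append(result_id)
    # Convert nested defaultdicts to plain dicts for presentation.
    return {dim: dict(levels) for dim, levels in grouped.items()}


hits = [("doc-1", "apple (fruit)"), ("doc-2", "Apple (company)"), ("doc-3", "fruit")]
print(cluster_results(hits))
```

A presentation unit could then render each top-level key as a separate axis or tab and each level as a nesting depth within it.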
The multi-dimensional semantics-based visual network retrieval presentation system and presentation control method of this invention comprise a query server, a semantic matching and reasoning module, an index database, a semantic index result set, a dimension-partitioning rule unit, and a multi-dimensional result presentation unit. The semantic matching and reasoning module performs semantic matching and reasoning on the keywords; the index database builds and stores a semantic ontology index from the resulting semantic matching and reasoning result set; the multi-dimensional rule unit clusters the index result set into multi-dimensional, multi-level data results according to the semantic distance between keywords in the semantic index result set; and finally the multi-dimensional result presentation unit presents the results to the user. This lets the user quickly locate the target result within the multi-dimensional presentation of candidate results, effectively distinguishes the different meanings of the same text, and improves retrieval efficiency. The system is structurally simple and inexpensive, and the method is easy to apply and widely applicable.
Brief description of the drawings
FIG. 1 is a schematic structural diagram of the multi-dimensional semantics-based visual network retrieval presentation system of the present invention.
FIG. 2 is a flowchart of a specific embodiment of the method of the present invention for controlling visual network retrieval presentation based on multi-dimensional semantics.
FIG. 3 is a flowchart of the retrieval presentation module for the multi-dimensional semantic space in an embodiment of the present invention.
FIG. 4 is a sequence diagram of an embodiment of the multi-dimensional semantic space-based visual retrieval presentation system of the present invention.
Detailed description of the embodiments
For a clearer understanding of the technical content of the present invention, the following embodiments are described in detail.
Refer to FIG. 1, a schematic structural diagram of the multi-dimensional semantics-based visual network retrieval presentation system of the present invention.
In one embodiment, the system comprises a query server, a semantic matching and reasoning module, an index database, a semantic index result set, a dimension-partitioning rule unit, and a multi-dimensional result presentation unit. The query server provides the interface through which the user enters search keywords. The semantic matching and reasoning module is connected to the query server and performs matching and reasoning on keyword semantics according to the knowledge set of the relevant domain. The index database is connected to both the query server and the semantic matching and reasoning module and supplies the search results corresponding to the search keywords. The semantic index result set is connected to the index database and stores the index result set corresponding to the search keywords. The dimension-partitioning rule unit is connected to both the semantic index result set and the semantic matching and reasoning module and, according to the semantic distance between keywords in the semantic index result set, clusters the index result set into multi-level data results along multiple dimensions. The multi-dimensional result presentation unit is connected to the dimension-partitioning rule unit and presents those multi-dimensional, multi-level data results to the user.
The method for controlling visual network retrieval presentation based on multi-dimensional semantics using the system of this embodiment comprises the following steps:
(1) the query server receives the query keywords and determines whether they form a complex sentence; if so, the method proceeds to step (2); if not, to step (3);
(2) the query server performs word segmentation and filtering, outputs a separator-delimited keyword string to the index database, and the method proceeds to step (3);
(3) the semantic matching and reasoning module performs semantic matching and reasoning on the keywords and sends the semantic reasoning result set to the index database;
(4) the index database builds and stores a semantic ontology index from the received semantic matching and reasoning result set, and sends the corresponding index result set to the dimension-partitioning rule unit;
(5) the multi-dimensional rule unit clusters the index result set, according to the semantic distance between keywords in the semantic index result set, into a data form with multiple dimensions, with multiple levels of data results clustered along each dimension;
(6) the multi-dimensional result presentation unit presents the multi-dimensional, multi-level data results to the user.
In step (2), the query server performs word segmentation and filtering and outputs a separator-delimited keyword string to the index database; specifically, the query server performs word segmentation and filtering separately for each language type present in the keywords and outputs a separator-delimited keyword string.
In a preferred embodiment, the semantic matching and reasoning module comprises a standard ontology knowledge base, a semantic matching unit, and a semantic reasoning unit. The standard ontology knowledge base stores the ontology knowledge set of the relevant domain. The semantic matching unit is connected to the standard ontology knowledge base, obtains semantic matching rules for the keywords from the ontology knowledge set, and performs semantic matching. The semantic reasoning unit is connected to the standard ontology knowledge base, obtains semantic reasoning rules for the keywords from the ontology knowledge set, and performs semantic reasoning.
In the method using the system of this preferred embodiment, step (3) specifically comprises the following steps:
(31) after the semantic matching and reasoning module receives the query keywords, the semantic matching unit performs semantic matching on them against the standard ontology knowledge base and submits the semantic matching result set to the semantic reasoning unit; the semantic matching is specifically: computing the semantic similarity between the domain-specific keyword set and the query keywords to achieve semantic matching;
(32) the semantic reasoning unit performs semantic reasoning on the semantic matching result set to obtain a semantic reasoning result set, and sends it to the index database; the semantic reasoning is specifically: deriving the reasoning rules of a specific domain from that domain's ontology knowledge, and applying the rules to the semantic matching results to obtain the semantic reasoning result set.
In a further preferred embodiment, step (5) specifically comprises the following steps:
(51) the multi-dimensional rule unit computes the semantic distance between keywords in the semantic index result set;
(52) the multi-dimensional rule unit clusters the index result set into multi-dimensional, multi-level data results according to the semantic distance.
In a more preferred embodiment, step (51) specifically comprises the following steps:
(51-1) the multi-dimensional rule unit finds the nearest common ancestor node of the keywords in the semantic index result set;
(51-2) the multi-dimensional rule unit computes the distance between each keyword and the nearest common ancestor node;
(51-3) the multi-dimensional rule unit takes the sum of the distances between each keyword and the nearest common ancestor node as the semantic distance between the keywords in the semantic index result set.
Step (52) specifically comprises the following steps:
(52-1) the multi-dimensional rule unit analyzes the relationship between the search keywords and the semantic distance, based on the semantic distance between the keywords;
(52-2) the multi-dimensional rule unit expands one dimension of the multi-dimensional data set and determines the dimension and level to which each retrieval result belongs;
(52-3) the retrieval results are assembled into data results with multiple dimensions and multiple levels.
In practice, the multi-dimensional semantic space-based visual retrieval presentation system provided by the present invention extends the semantic similarity computation for search keywords. Two keywords can be regarded as two nodes, and the semantic distance between them is the sum of the path lengths from their nearest common ancestor node to each of the two nodes. Computing the minimum distance between two nodes therefore means finding the nearest common ancestor node, computing its distance to each of the two nodes, and adding the two distances.
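The definition above can be written compactly as follows. The symbols $a$, $b$, $\operatorname{lca}$, $\operatorname{dist}$, and $\operatorname{depth}$ are notation introduced here for illustration and do not appear in the patent; the second form assumes a rooted tree with unit-length edges, where $\operatorname{depth}$ is the number of edges from the root.

```latex
\[
  d(a,b) \;=\; \operatorname{dist}\bigl(a,\operatorname{lca}(a,b)\bigr)
         \;+\; \operatorname{dist}\bigl(b,\operatorname{lca}(a,b)\bigr)
         \;=\; \operatorname{depth}(a) + \operatorname{depth}(b)
               - 2\,\operatorname{depth}\bigl(\operatorname{lca}(a,b)\bigr)
\]
% Hypothetical example: depth(a) = 3, depth(b) = 2, depth(lca(a,b)) = 1
\[
  d(a,b) \;=\; (3-1) + (2-1) \;=\; 3 + 2 - 2\cdot 1 \;=\; 3
\]
```

The second form is convenient when node depths are precomputed, since only the nearest common ancestor then has to be located per pair.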
The semantic clustering algorithm uses a multi-dimensional array to compute the semantic distances between search keywords. After analyzing the semantic relationship between two search keywords, one dimension of the multi-dimensional data set can be expanded to determine on which levels of which dimensions the retrieval results lie.
FIG. 1 shows the schematic of the multi-dimensional semantic space-based visual retrieval presentation system implemented by the present invention, comprising a query server, a standard ontology knowledge base, a semantic matching unit, a semantic reasoning unit, an index database, a semantic index result set, a dimension-partitioning rule unit, and a multi-dimensional result presentation unit. The query server is the interface through which the user supplies search keywords. The standard ontology knowledge base stores the ontology knowledge set of the domain and supplies semantic matching and reasoning rules to the semantic matching unit and the semantic reasoning unit. The index database supplies the search results corresponding to the search keywords. The semantic index result set stores the index result set corresponding to the search keywords. The dimension-partitioning rule unit clusters the index result set, according to the semantic distance between keywords in the semantic index results, into a data form with multiple dimensions, with multiple levels of data results clustered along each dimension.
FIG. 2 is a flowchart of an embodiment of the method of the present invention, which mainly comprises the following steps.
Step 201: receive the query keywords and determine whether the input forms a complex sentence; if so, go to step 202; otherwise, continue with step 203 and send the keywords to the index database.
Step 202: perform word segmentation and filtering separately for each language type in the query keywords, and output a separator-delimited string of tokens such as Chinese words, English words, and digit strings.
Step 203: look up the contents of the index database to obtain the set of search results corresponding to the segmented query keywords.
Step 204: semantic reasoning: derive the reasoning rules of a specific domain from its ontology knowledge and apply them to the description results to obtain a reasoning result set; semantic matching: perform semantic similarity computation and semantic matching between the reasoning result set and the domain-specific keyword set.
Step 205: the dimension-partitioning rule unit clusters the index result set, according to the semantic distance between keywords in the semantic index results, into a data form with multiple dimensions, with multiple levels of data results clustered along each dimension.
Step 206: the result presentation module presents the search results in multi-dimensional data form.
FIG. 3 is a flowchart of the retrieval presentation module for the multi-dimensional semantic space in an embodiment of the present invention, which mainly comprises the following steps.
Step 301: obtain the set of search results corresponding to the segmented query keywords from the index content already built in the index database.
Step 302: the semantic matching module performs semantic similarity computation and semantic matching between the reasoning result set and the domain-specific keyword set.
Step 303: the semantic reasoning module sets the reasoning rules of the domain and applies them to the description results to obtain the reasoning result set.
Step 304: compute the semantic distance between two keywords. The two keywords can be represented as two nodes. Any common ancestor node of the two has the following property: the two target nodes must both be found among the ancestor node itself and its left and right subtrees. Starting from the head node, visit the node itself, its left subtree, and its right subtree in turn, and increment a counter by 1 each time one of the two target nodes is encountered. If the counter equals 2 when the visit finishes, the current node's subtree contains both target nodes, so the current node is the nearest common ancestor of the targets, and the semantic distance between the two keywords is the sum of the distances from each of the two nodes to that nearest common ancestor.
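The counting traversal of step 304 might look like the sketch below. It assumes a binary tree with unique node labels and two distinct target labels, as the description of left and right subtrees suggests; the `Node` class, the sample `TREE`, and the function names are hypothetical. A post-order walk is used so that the deepest node whose subtree contains both targets is the first one recorded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def nearest_common_ancestor(root: Optional[Node], p: str, q: str) -> Optional[Node]:
    """Return the deepest node whose subtree contains both p and q (p != q)."""
    found: Optional[Node] = None

    def count(node: Optional[Node]) -> int:
        nonlocal found
        if node is None:
            return 0
        hits = count(node.left) + count(node.right)
        if node.label in (p, q):
            hits += 1                  # the counter from step 304
        if hits == 2 and found is None:
            found = node               # first (deepest) node with both targets
        return hits

    count(root)
    return found


def depth_below(node: Optional[Node], label: str) -> Optional[int]:
    """Number of edges from node down to label, or None if absent."""
    if node is None:
        return None
    if node.label == label:
        return 0
    for child in (node.left, node.right):
        d = depth_below(child, label)
        if d is not None:
            return d + 1
    return None


def tree_semantic_distance(root: Node, p: str, q: str) -> int:
    anc = nearest_common_ancestor(root, p, q)
    if anc is None:
        raise ValueError("both labels must be present in the tree")
    return depth_below(anc, p) + depth_below(anc, q)


# Hypothetical sample tree:   A
#                            / \
#                           B   C
#                          / \   \
#                         D   E   F
TREE = Node("A", Node("B", Node("D"), Node("E")), Node("C", None, Node("F")))
print(tree_semantic_distance(TREE, "D", "E"), tree_semantic_distance(TREE, "D", "F"))
```

When one target is an ancestor of the other (for example `B` and `D`), the counter reaches 2 at the ancestor itself, so the distance reduces to the depth difference.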
Step 305: classify and cluster the search results. A multi-dimensional array is used to compute the semantic distances of the retrieval keywords. By analysing the relationship between the retrieval keywords and their semantic distances, a given dimension of the multi-dimensional data set can be expanded, which determines on which levels of which dimensions the retrieval results lie.
Step 306: present the retrieval results by dimension. According to the semantic distances between keywords in the semantic index results, the index result set is clustered into a data form with multiple dimensions, with multiple levels of data results clustered along each dimension.
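The nearest-common-ancestor search of step 304 can be sketched as follows. This is a minimal illustration rather than the implementation described in the patent: the `Node` class, the function name `semantic_distance`, the example keywords, and the assumptions that the ontology is a binary tree with unique keywords and unit-length edges are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """One node of a binary concept tree (hypothetical stand-in for the ontology)."""
    keyword: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def semantic_distance(root: Optional[Node], a: str, b: str) -> Optional[int]:
    """Number of edges between keywords a and b via their nearest common
    ancestor, or None if either keyword is absent. Assumes unique keywords."""
    answer: Optional[int] = None
    wanted = {a, b}

    def visit(node: Optional[Node], depth: int) -> Dict[str, int]:
        """Return the depth of every target keyword found in this subtree."""
        nonlocal answer
        if node is None:
            return {}
        found: Dict[str, int] = {}
        if node.keyword in wanted:          # the node itself
            found[node.keyword] = depth
        found.update(visit(node.left, depth + 1))   # left subtree
        found.update(visit(node.right, depth + 1))  # right subtree
        # len(found) plays the role of the counter in step 304. The first
        # node whose visit finishes with every target found is the nearest
        # common ancestor, because children finish before their parents.
        if answer is None and len(found) == len(wanted):
            answer = sum(d - depth for d in found.values())
        return found

    visit(root, 0)
    return answer


# Hypothetical example tree:
#         thing
#        /     \
#     fruit   vehicle
#     /   \        \
#  apple  pear     car
root = Node("thing",
            Node("fruit", Node("apple"), Node("pear")),
            Node("vehicle", None, Node("car")))

print(semantic_distance(root, "apple", "pear"))  # 2 (meet at "fruit")
print(semantic_distance(root, "apple", "car"))   # 4 (meet at "thing")
```

Returning the depths of the targets from each subtree, instead of only a count, lets the same traversal produce the distance without a second pass over the tree.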
Fig. 4 is a sequence diagram of an embodiment of the visual retrieval presentation system based on a multi-dimensional semantic space according to the present invention. It mainly comprises the following steps.
Step 401: the query server sends a query request to the semantic expansion module.
Steps 402 and 403: the semantic expansion module expands the search keywords according to the standard ontology knowledge base, obtains the expanded query keywords, and sends them to the index database module.
Step 404: the index database module looks up the index and obtains the set of search results corresponding to the key segmented terms of the query.
Step 405: the index database module sends the search result set to the dimension-division presentation module.
Step 406: the dimension-division presentation module computes the semantic distance. The semantic similarity of two retrieval keywords is computed by adding the lengths of the paths from the nearest common ancestor of the two nodes to each of those nodes. The key to computing the minimum distance between two nodes is therefore to find their nearest common ancestor, compute the distance from that ancestor to each of the two nodes, and add the two distances.
Step 407: the dimension-division presentation module classifies and clusters the search results. A multi-dimensional array is used to compute the semantic distances of the retrieval keywords. By analysing the relationship between the retrieval keywords and their semantic distances, a given dimension of the multi-dimensional data set can be expanded, which determines on which levels of which dimensions the retrieval results lie.
Step 408: the dimension-division presentation module organizes the search results into a semantic network of relationships and displays them in multi-dimensional data form.
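The grouping performed in steps 407 and 408 can be illustrated with a short sketch. The patent does not specify how dimensions and levels are derived, so the following assumptions are purely illustrative: each result keyword is assigned to a named dimension through a hypothetical `dimension_of` lookup, and the level of a result is taken to be the semantic distance of its keyword from the query keyword, supplied through a hypothetical `distance_to_query` lookup. The function name `group_results` and the sample data are likewise invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_results(
    results: Iterable[Tuple[str, str]],
    dimension_of: Dict[str, str],
    distance_to_query: Dict[str, int],
) -> Dict[str, Dict[int, List[str]]]:
    """Arrange (title, keyword) results as dimension -> level -> titles.

    Assumption: level == semantic distance of the keyword from the query.
    """
    grouped: Dict[str, Dict[int, List[str]]] = defaultdict(lambda: defaultdict(list))
    for title, keyword in results:
        dimension = dimension_of[keyword]
        level = distance_to_query[keyword]
        grouped[dimension][level].append(title)
    # Convert to plain dicts, with levels ordered from nearest to farthest.
    return {
        dimension: {level: levels[level] for level in sorted(levels)}
        for dimension, levels in grouped.items()
    }


# Hypothetical results for the ambiguous query "apple".
results = [
    ("Apple pie recipe", "fruit"),
    ("Apple Inc. earnings", "company"),
    ("iPhone review", "phone"),
]
dimension_of = {"fruit": "food", "company": "business", "phone": "business"}
distance_to_query = {"fruit": 1, "company": 1, "phone": 2}

grouped = group_results(results, dimension_of, distance_to_query)
for dimension, levels in grouped.items():
    for level, titles in levels.items():
        print(dimension, level, titles)
```

In this example the two senses of the same query text land in separate dimensions, which is the kind of separation the presentation step is meant to give the user.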
With the visual network retrieval presentation system based on multi-dimensional semantics and the presentation control method of this invention, the system comprises a query server, a semantic matching and reasoning module, an index database, a semantic index result set, a dimension-division rule unit, and a multi-dimensional result presentation unit. The semantic matching and reasoning module performs semantic matching and reasoning on the keywords, and the index database builds and stores a semantic ontology index from the resulting semantic matching and reasoning result sets. The multi-dimensional rule unit clusters the index result set into multi-dimensional, multi-level data results according to the semantic distances of the keywords in the semantic index result set. Finally, the multi-dimensional result presentation unit presents these results to the user, so that the user can quickly locate the target result among candidate results presented by dimension. This effectively distinguishes different meanings of the same text and improves retrieval efficiency. In addition, the system has a simple structure and low cost, and the method is easy to apply and widely applicable.
In this specification, the invention has been described with reference to specific embodiments thereof. It is evident, however, that various modifications and changes can be made without departing from the spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded as illustrative rather than restrictive.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210473410.9A CN102915381B (en) | 2012-11-20 | 2012-11-20 | Visual network retrieval based on multi-dimensional semantic presents system and presents control method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210473410.9A CN102915381B (en) | 2012-11-20 | 2012-11-20 | Visual network retrieval based on multi-dimensional semantic presents system and presents control method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102915381A (en) | 2013-02-06 |
CN102915381B (en) | 2015-08-12 |
Family
ID=47613747
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210473410.9A Active CN102915381B (en) | Visual network retrieval based on multi-dimensional semantic presents system and presents control method | 2012-11-20 | 2012-11-20 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102915381B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103136352B (en) * | 2013-02-27 | 2016-02-03 | 华中师范大学 | Text retrieval system based on double-deck semantic analysis |
CN109241432A (en) * | 2018-09-07 | 2019-01-18 | 云南东巴文信息技术有限公司 | Discrete data acquisition analysis system and method |
CN109582849A (en) * | 2018-12-03 | 2019-04-05 | 浪潮天元通信信息系统有限公司 | A kind of Internet resources intelligent search method of knowledge based map |
CN110532354B (en) * | 2019-08-27 | 2023-01-06 | 腾讯科技(深圳)有限公司 | Content retrieval method and device |
CN112463954B (en) * | 2020-11-11 | 2024-01-02 | 远光软件股份有限公司 | Visual multidimensional data display system and method based on semantic recognition |
CN112487260A (en) * | 2020-12-07 | 2021-03-12 | 上海市研发公共服务平台管理中心 | Instrument project declaration and review expert matching method, device, equipment and medium |
CN113256071A (en) * | 2021-04-26 | 2021-08-13 | 河南天眼查科技有限公司 | Enterprise multidimensional data processing method and device, storage medium and electronic equipment |
CN117764114B (en) * | 2023-12-27 | 2024-10-18 | 暗物质(北京)智能科技有限公司 | High-performance multi-mode large model reasoning system and method |
CN118586491B (en) * | 2024-08-02 | 2024-10-18 | 宁波夏天信息科技有限公司 | AI knowledge base model construction and analysis method based on multi-dimensional association |
CN119807447B (en) * | 2025-03-13 | 2025-05-30 | 北京益邦达科技发展有限公司 | File retrieval method, system, product and readable storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110055188A1 (en) * | 2009-08-31 | 2011-03-03 | Seaton Gras | Construction of boolean search strings for semantic search |
CN101582073A (en) * | 2008-12-31 | 2009-11-18 | 北京中机科海科技发展有限公司 | Intelligent retrieval system and method based on domain ontology |
CN102663122A (en) * | 2012-04-20 | 2012-09-12 | 北京邮电大学 | Semantic query expansion algorithm based on emergency ontology |
Also Published As
Publication number | Publication date |
---|---|
CN102915381A (en) | 2013-02-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102915381B (en) | Visual network retrieval based on multi-dimensional semantic presents system and presents control method | |
CN108763333B (en) | Social media-based event map construction method | |
CN103838833B (en) | Text retrieval system based on correlation word semantic analysis | |
Liu et al. | Mining quality phrases from massive text corpora | |
CN105653706B (en) | A kind of multilayer quotation based on literature content knowledge mapping recommends method | |
Wei et al. | A survey of faceted search | |
Wang et al. | A phrase mining framework for recursive construction of a topical hierarchy | |
CN107247745B (en) | A kind of information retrieval method and system based on pseudo-linear filter model | |
CN103617157B (en) | Based on semantic Text similarity computing method | |
CN105045875B (en) | Personalized search and device | |
CN108846029B (en) | Information correlation analysis method based on knowledge graph | |
CN102419778A (en) | Information searching method for mining and clustering sub-topics of query sentences | |
CN108763348B (en) | Classification improvement method for feature vectors of extended short text words | |
CN102253982A (en) | Query suggestion method based on query semantics and click-through data | |
Xie et al. | Fast and accurate near-duplicate image search with affinity propagation on the ImageWeb | |
CN116881436A (en) | Literature retrieval methods, systems, terminals and storage media based on knowledge graphs | |
CN107992608B (en) | An automatic generation method of SPARQL query statement based on keyword context | |
CN115563313A (en) | Semantic retrieval system for literature and books based on knowledge graph | |
CN107590128A (en) | A kind of paper based on high confidence features attribute Hierarchical clustering methods author's disambiguation method of the same name | |
CN112036178A (en) | A Semantic Search Method Related to Distribution Network Entity | |
CN103778206A (en) | Method for providing network service resources | |
CN115248839A (en) | A knowledge system-based long text retrieval method and device | |
Lin et al. | Automatic tagging web services using machine learning techniques | |
Moscato et al. | iwin: A summarizer system based on a semantic analysis of web documents | |
Zhao et al. | Expanding approach to information retrieval using semantic similarity analysis based on WordNet and Wikipedia |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |