[go: up one dir, main page]

WO2018214097A1 - Ksp algorithm-based resource description framework query method and system - Google Patents

Ksp algorithm-based resource description framework query method and system Download PDF

Info

Publication number
WO2018214097A1
WO2018214097A1 PCT/CN2017/085896 CN2017085896W WO2018214097A1 WO 2018214097 A1 WO2018214097 A1 WO 2018214097A1 CN 2017085896 W CN2017085896 W CN 2017085896W WO 2018214097 A1 WO2018214097 A1 WO 2018214097A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
query
keyword
value
vertex
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2017/085896
Other languages
French (fr)
Chinese (zh)
Inventor
吴定明
石杰明
孟每恩⋅尼克斯
袁帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to PCT/CN2017/085896 priority Critical patent/WO2018214097A1/en
Publication of WO2018214097A1 publication Critical patent/WO2018214097A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists

Definitions

  • ResourceDescriptionFramework is a general framework for expressing metadata of WEB resources. It uses Uniform Resource Descriptors (URIs) to represent things, and uses simple attributes and attribute values to describe things.
  • RDF represents data as ⁇ Subject, predicate, object>, which is used to identify the object as the subject, the part used to distinguish the different attributes of the subject object is the predicate, and the part of the statement used to distinguish the value of each attribute is called the object. Therefore, the RDF knowledge base can also be regarded as a directed graph, where vertices are resources, properties, words, descriptions, and edges are predicates used to describe the relationship between vertices.
  • the RDF knowledge base can be modeled as a directed graph where the vertices represent entities and the edges represent relationships between entities.
  • the vertices of the spatial coordinates we call the vertices of the spatial coordinates as the positions.
  • v we use v to represent any vertex in the RDF graph and p to represent the position vertex.
  • Each RDF triple corresponds to a directed edge from one entity (subject) to another (object).
  • each entity corresponds to a document, which is represented by , and the document is composed of keywords extracted from the corresponding resources, properties, characters, and descriptions of the entity.
  • the semantic place is a subtree of the RDF graph, which is rooted at the position vertex p and contains all the query keywords. A plurality of semantic locations can be constructed starting from a given position vertex p.
  • the present invention aims to solve the technical problem in the prior art that requires the user to understand the query language and the RDF semantics, otherwise the query cannot be performed, and provides a resource description framework query method and system based on the KSP algorithm.
  • An embodiment of the present invention provides a resource description framework query method based on a KSP algorithm for searching a semantic location of a query keyword on an RDF graph by using a KSP algorithm, and the query method includes the following steps:
  • the root node enters the queue in the preset spatial index, and obtains the location node queue Q;
  • the node e When the node e is the node N, it loops through each node under the node N, and calculates the looseness value of the radius word corresponding to each node e under N. And the sorting score of the corresponding radius word when And inserting the corresponding node e into the location node queue Q and returning the step of searching for the node e in the preset spatial index according to the input query keyword and the location node queue Q until
  • the query result is returned to the user according to the stored result function H k .
  • the present invention further provides an KSP algorithm-based resource description framework query system for searching a semantic location of a query keyword on an RDF graph by using a KSP algorithm, the query system comprising:
  • the first initialization module is configured to initialize the storage result function H k , wherein the storage result function H k is used to save the conditional semantic position QSP, the qualified semantic position QSP is a subtree containing all query keywords, and k is a condition The number of semantic locations QSP;
  • a building module configured to construct a dictionary structure according to the input query keyword and the inverted index table corresponding to the input query keyword, wherein the dictionary structure represents a node that includes the input query keyword;
  • the root node for the preset spatial index enters the queue, and obtains the location node queue Q;
  • the lookup loop module is configured to: according to the input query keyword and the location node queue Q, traverse the preset spatial index to obtain the node e and determine the node e;
  • the node processing module is configured to cycle through each node under the node N when the node e is the node N, and calculate the looseness value of the radius word corresponding to each node e under N And the sorting score of the corresponding radius word when Then, the corresponding node e is inserted into the location node queue Q and enters the lookup loop module until
  • the output result module is configured to return a query result to the user according to the stored result function H k .
  • the technical solution of the present invention has the beneficial effects that the KSP algorithm is user-friendly, the user does not need to master a special query language, and only needs to input the keyword of the query, and the algorithm will return near the query position.
  • ⁇ p,(v 1 ,v 2 ...)> is used to represent a semantic position, where p is the root node and (v 1 , v 2 7) represents all other vertices.
  • the eligible semantic locations QSP Given a query sequence q there may be multiple eligible semantic locations QSP, the eligible semantic locations QSP have the same root node p, but (v 1 , v 2 %) are different.
  • T p is also necessary to calculate the bulk of L (T p).
  • the query result is K TQSPs
  • the scores of the K TQSPs are the smallest of all TQSPs
  • the sorting score f is represented by the function f(L(T p ), S(q, p)), L(T p ) is the subtree T p looseness, and S(q, p) is the q. ⁇ query position and Euclidean distance between p.
  • the KSP algorithm-based resource description framework query method is used to search the semantic position of the query keyword on the RDF graph by using the KSP algorithm.
  • the constructing method includes the following steps:
  • Step S101 initializing the stored result function Hk , wherein the stored result function Hk is used to save the conditional semantic position QSP, the qualified semantic position QSP is a subtree containing all query keywords, and k is an eligible semantic position QSP quantity;
  • Step S102 Looping through each keyword in the query keyword input by the user according to the preset document inverted index table and the preset radius word domain table corresponding to all the query keywords, to obtain the input query keyword. Corresponding inverted index table, and obtaining a value corresponding to each keyword, and loading in a radius word field table corresponding to all preset query keywords;
  • Step S104 initializing the value ⁇ of the monotonic sorting function, wherein the monotonic sorting function indicates sorting the plurality of most compact matching conditions according to the input query keyword search, and the most compact and qualified semantic position is represented by p as the root node.
  • Step S112 determining whether If yes, go to step S113, if no, go to step S114;
  • Step S114 Go to step S115;
  • Step S103 specifically, establishes a dictionary structure Mq. ⁇ , and the structure is ⁇ node, (t 1 , t 2 , . . . ) ⁇ represents a node containing a query keyword.
  • the function GETNEXT indicates that the node e is searched by using an enhanced NN algorithm on the R-tree, and if the node e is a position vertex, The vertex including the spatial position information proceeds to step S107, and if the node e is not the position vertex, the process proceeds to step S111.
  • conditional semantic position QSP in the storage result function Hk is sorted according to the sorting score size of the most compact and qualified semantic position.
  • step S108 the execution function GETSEMANTICPLACE, as shown in FIG. 2, specifically includes:
  • Step S203 the queried keyword q. ⁇ is saved in the digital set B;
  • Step S204 starting from the vertex p, using the BFS (breadth-first-search) method to traverse the RDF graph and the number set B is not empty;
  • Step S205 the obtained embodiment BFS added to the subtree node v in T p;
  • Step S206 searching for a query keyword included in the node v in the dictionary structure
  • Step S209 if yes, obtain the looseness value L(T p ) of the subtree T p according to the number of intersection elements of the query key and the number set B included in the node v and the distance between the node v and the vertex p. ;
  • Step S210 deleting the intersection of the query key and the number set B included in the node v from the digital set B to obtain the current digital set B;
  • the step of creating a preset spatial index is specifically:
  • the query method further includes the steps of: performing an inverted index on a keyword in a document of each node to obtain a preset document inverted index table, where the format of the inverted index table is (keyword, a node), specifically, before step S101, an inverted index is created for a keyword in a document of each node to obtain a preset document inverted index table I, and the format of the inverted index table of the document is (keyword, node) ),As shown in Figure 3.
  • Step S401 traversing the RDF graph by using the BFS method from the vertex p;
  • Step S403 after obtaining the radius word fields of all the leaf nodes, calculating the value of WN(N) according to the order from the leaf node to the non-leaf node, and WN(N) indicating all the position vertices of the non-leaf node N ⁇ p j ⁇ corresponds to the union of WN(p j ), and the non-leaf node N contains a series of position vertices ⁇ p j ⁇ ;
  • Step S404 for the non-leaf node N, ⁇ e i ⁇ represents the node under N, WN(N) is initialized to be empty, and if (t, d g (e i , t)) has no corresponding value in WN(N) Add (t, d g (e i , t)) to the radius word field table corresponding to all the preset query keywords, and if there is a value, the value corresponding to the non-leaf node N in the radius word neighborhood table Updated to min(d g (N,t), d g (e i ,t)), where (t,d g (e i ,t)) is the keyword t corresponding to the radius word neighborhood table Value.
  • the vertex p may be a root node, and if it is a root node, location information is included.
  • the vertex p can be a node containing location information, so it is all possible nodes containing location information, including the root node.
  • the preset radius word domain table corresponding to all the query keywords is established according to a tree, wherein the leaf node p and the non-leaf node N in the tree first calculate the value of the leaf node, and then calculate the value.
  • the value of the leaf node, so it is a leaf node to a non-leaf node is a bottom-up order.
  • the table shown below is the radius word field table corresponding to all the preset query keywords.
  • the initialization is started, so that the radius word field table corresponding to all the preset query keywords is empty, and the step S401 - step S404 is to fill the table.
  • TQSP is constructed according to the KSP algorithm, the following two situations may be encountered: (i) T p containing the input query keyword is not found after traversing the graph, (ii) T p is found, but T p f [theta] is greater than the ranking score (current have been found in the K-th ranking score T p). Among them, TQSP is the case with the least loose QSP.
  • WN(p) is defined, and for the position vertex p, its WN(p) represents the set of the shortest distance from p to each query key t i ⁇ (t i , d g (p, t i )) ⁇ , where d g (p, t i ) ⁇ ⁇ , represents the shortest distance from the root node p to the vertex containing the query key t i .
  • WN(N) for node R of R-tree, contains a series of position vertices ⁇ p j ⁇ , WN(N) is a series of ⁇ (t i , d g (N, t i )) ⁇ , where WN(N) is a union of WN(p j ) corresponding to all position vertices ⁇ p j ⁇ under node N, where for each keyword t i , Obviously d g (N, t i ) ⁇ ⁇ .
  • Lemma 2 Because P is represented by a radius looseness TQSP word root node of T p, the radius for a given word ranking score T p of the query sequence corresponding to q ( ⁇ -bound on the ranking score ) may be expressed as And
  • Lemma 4 Because Represented at node N p is the root node in a T p of a radius corresponding to the word TQSP looseness, for a given query sequence q, at the node N p is the radius at the root of all the word loose T p Degree can be expressed as Where S(q,N) represents the smallest spatial distance between q and N, and
  • deletion rule 2 According to Lemma 2, we get the deletion rule 2, and according to Lemma 4, the deletion rule 3 is obtained to improve the efficiency of the algorithm.
  • the specific deletion rules are as follows:
  • Delete rule 2 For a given query sequence q, ⁇ represents the ranking score of the kth candidate TQSP. The ⁇ -bound on the ranking score of the TQSP representing T p with p as the root node. when When T p is not the result we want to check, p can be deleted.
  • Delete rule 3 For a given query sequence q, ⁇ represents the ranking score of the kth candidate TQSP, The ⁇ -bound on the ranking score of the TQSP representing the T p at the node N with p as the root node. Then, under the N node, no node satisfies the condition, and N can be deleted.
  • the present invention also provides a computer readable storage medium having stored thereon a computer program that, when executed by a processor, implements the steps of the above method.
  • a KSP algorithm query system based on a resource description framework for searching a semantic position of a query keyword on an RDF graph by using a KSP algorithm.
  • the query system includes:
  • the first initialization module 51 is configured to initialize the storage result function H k , wherein the storage result function H k is used to save the conditional semantic position QSP, and the qualified semantic position QSP is a subtree containing all the query keywords, where k is consistent The number of semantic locations of the conditional QSP;
  • the loop traversing module 52 is configured to cycle through each keyword in the query keyword input by the user according to the preset document inverted index table and the preset radius word domain table corresponding to all the query keywords, and obtain an input.
  • the building module 53 is configured to construct a dictionary structure according to the input query keyword and the inverted index table corresponding to the input query keyword, where the dictionary structure represents a node that includes the input query keyword;
  • the second initialization module 54 is configured to initialize the value ⁇ of the monotonic sorting function, wherein the monotonic sorting function indicates that the most compact and qualified semantic positions searched according to the input query keyword are sorted, and the most compact and qualified semantics The position is expressed as the semantic position with the least looseness of the root node with p;
  • a generating queue module 55 the root node for the preset spatial index enters the queue, and obtains the location node queue Q;
  • the lookup loop module 56 is configured to traverse the preset spatial index according to the input query keyword and the location node queue Q Obtaining node e and judging node e;
  • the vertex processing module 57 determines whether the node e is a non-conforming condition node when the node e is a vertex including the spatial location information, and terminates the current loop if the node e is not in compliance with the conditional node, and enters the next loop, when the node e
  • the score f is inserted into the result function H k and updates the value ⁇ of the monotonic sort function;
  • the node processing module 58 is configured to loop through each node under the node N when the node e is the node N, and calculate the looseness value of the radius word corresponding to each node e under N. And the sorting score of the corresponding radius word when Then, the corresponding node e is inserted into the location node queue Q and enters the lookup loop module until
  • the output result module 59 is configured to return a query result to the user according to the storage result function H k .
  • conditional semantic position QSP in the storage result function Hk is sorted according to the sorting score size of the most compact and qualified semantic position.
  • the vertex processing module 57 is further configured to:
  • T p represents a subtree containing p as a vertex and containing all query keywords
  • the keyword q. ⁇ of the query is saved to the number set B;
  • the formula for calculating the value of the looseness L(T p ) of the subtree T p according to the number of intersection elements of the query key and the number set B of the node v and the distance between the node v and the vertex p is obtained. as follows:
  • the data containing the coordinate information is extracted from the RDF data to obtain a preset spatial index. Because the RDF graph data is relatively large, in order to improve the speed of the query, the data containing the coordinate information is first extracted from the RDF data, and the preset spatial index R-tree is created, so that the query speed can be effectively improved.
  • the creation module is further used to:
  • the value of WN(N) is calculated in the order from the leaf node to the non-leaf node, and WN(N) represents all the position vertices ⁇ p j ⁇ under the non-leaf node N.
  • the union of WN(p j ), the non-leaf node N contains a series of position vertices ⁇ p j ⁇ ;
  • ⁇ e i ⁇ represents a node under N
  • WN(N) is initialized to be empty. If (t, d g (e i , t)) has no corresponding value in WN(N), then (t , d g (e i , t)) is added to the radius word field table corresponding to all the preset query keywords, and if there is a value, the value corresponding to the non-leaf node N in the radius word neighborhood table is updated to min (d g (N, t), d g (e i , t)), where (t, d g (e i , t)) is the value corresponding to the keyword t in the radius word neighborhood table.
  • the vertex p may be a root node, and if it is a root node, location information is included. Vertex p can be a containing bit The node that sets the information, so it is all possible nodes that contain location information, including the root node.
  • the preset radius word domain table corresponding to all the query keywords is established according to a tree, wherein the leaf node p and the non-leaf node N in the tree first calculate the value of the leaf node, and then calculate the value.
  • the value of the leaf node, so it is a leaf node to a non-leaf node is a bottom-up order.
  • the table shown below is the radius word field table corresponding to all the preset query keywords.
  • the initialization is started, so that the radius word field table corresponding to all the preset query keywords is empty, and the working process of creating the module is to fill the table.
  • abbey aneient, catholic, roman, history is the keyword in the radius word field table.
  • e i is p 1 , p 2 in the above table, that is, d g (e i , t) is the value corresponding to d g (p 1 , t i ) in the table.
  • Min(d g (N,t), d g (e i ,t)) is the value of d g (N, t i ) in the table to remove the minimum value of the column.
  • the column where catholic is located has a value of 1,0 so the value of min(d g (N,t), d g (e i ,t)) is 0, which is the value of d g (N,t i ).
  • TQSP is constructed according to the KSP algorithm, the following two situations may be encountered: (i) T p containing the input query keyword is not found after traversing the graph, (ii) T p is found, but T p f [theta] is greater than the ranking score (current have been found in the K-th ranking score T p). Among them, TQSP is the case with the least loose QSP.
  • WN(p) is defined, and for the position vertex p, its WN(p) represents the set of the shortest distance from p to each query key t i ⁇ (t i , d g (p, t i )) ⁇ , where d g (p, t i ) ⁇ ⁇ , represents the shortest distance from the root node p to the vertex containing the query key t i .
  • the radius word domain table I ⁇ corresponding to the node may be created, that is, the preset radius word domain table ( ⁇ -radius word neighborhood table) corresponding to all the query keywords is used to improve Algorithm efficiency.
  • Lemma 2 Because P is represented by a radius looseness TQSP word root node of T p, the radius for a given word ranking score T p of the query sequence corresponding to q ( ⁇ -bound on the ranking score ) may be expressed as And
  • Lemma 4 Because Represented at node N p is the root node in a T p of a radius corresponding to the word TQSP looseness, for a given query sequence q, at the node N p is the radius at the root of all the word loose T p Degree can be expressed as Where S(q,N) represents the smallest spatial distance between q and N, and
  • deletion rule 2 According to Lemma 2, we get the deletion rule 2, and according to Lemma 4, the deletion rule 3 is obtained to improve the efficiency of the algorithm.
  • the specific deletion rules are as follows:
  • Delete rule 2 For a given query sequence q, ⁇ represents the ranking score of the kth candidate TQSP. The ⁇ -bound on the ranking score of the TQSP representing T p with p as the root node. when When T p is not the result we want to check, p can be deleted.
  • Delete rule 3 For a given query sequence q, ⁇ represents the ranking score of the kth candidate TQSP, The ⁇ -bound on the ranking score of the TQSP representing the T p at the node N with p as the root node. Then, under the N node, no node satisfies the condition, and N can be deleted.
  • the resource description framework query system based on KSP algorithm mainly implements the search of keywords on the graph and the search of keywords on RDF data: due to the friendliness of keyword retrieval, not only users retrieve network data, but also retrieve XML documents. , relational databases, and graphs.
  • the search algorithm converts queries into searches on feature spaces, such as paths, frequent patterns, and sequences. This search algorithm pays more attention to the structure of the graph than to the semantic content of the graph.
  • the query of the keywords on the graph determines a set of densely linked nodes in the graph by utilizing both the content and the link structure. Due to the re-implementation of these two kinds of information, the overall quality of the results can be improved.
  • the search for keywords on RDF data can also provide query efficiency because RDF data is a special type of graph data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A KSP algorithm-based resource description framework query method, configured to employ the KSP algorithm to search for a semantic position of a query keyword in an RDF graph. The query method is user-friendly because a user does not need to master a specialized query language and simply needs to input a query keyword. The query method returns a subtree containing all inputted query keywords and near a query position.

Description

一种基于KSP算法的资源描述框架查询方法和系统Resource description framework query method and system based on KSP algorithm 技术领域Technical field

本发明涉及语义网数据检索技术,尤其涉及一种基于KSP算法的资源描述框架查询方法和系统。The invention relates to a semantic network data retrieval technology, in particular to a resource description framework query method and system based on a KSP algorithm.

背景技术Background technique

资源描述框架(ResourceDescriptionFramework,RDF)是用于表达WEB资源的元数据的通用框架,它使用统一资源描述符(URI)来表示事物,用简单的属性和属性值来描述事物,RDF把数据表示为<主体,谓词,客体>,其中用于标识事物为主体,用于区分主语对象各个不同属性的那部分为谓词,陈述中用于区分各个属性的值的部分叫做客体。因此RDF知识库也可以看做是一个有向图,其中顶点是资源、性质、文字、描述,边是谓词用来描述顶点之间的关系。ResourceDescriptionFramework (RDF) is a general framework for expressing metadata of WEB resources. It uses Uniform Resource Descriptors (URIs) to represent things, and uses simple attributes and attribute values to describe things. RDF represents data as <Subject, predicate, object>, which is used to identify the object as the subject, the part used to distinguish the different attributes of the subject object is the predicate, and the part of the statement used to distinguish the value of each attribute is called the object. Therefore, the RDF knowledge base can also be regarded as a directed graph, where vertices are resources, properties, words, descriptions, and edges are predicates used to describe the relationship between vertices.

RDF知识库可以建模成一个有向图,其中顶点表示实体,边表示实体之间关系。在RDF图中我们称存在空间坐标的顶点为位置顶点(places)。我们用v表示RDF图中的任意顶点,用p表示位置顶点。每一个RDF三元组对应一条从一个实体(主体)到另一个实体(客体)的有向边。为了实现关键字的搜索,每一个实体都对应一个文档,用ψ表示,该文档是从该实体对应的资源、性质、文字、描述中提取的关键字组成。语义位置(semantic place)是RDF图的一颗子树,该子树以位置顶点p为根,且包含所有的查询关键字。从一个给定的位置顶点p出发可以构造多个语义位置。The RDF knowledge base can be modeled as a directed graph where the vertices represent entities and the edges represent relationships between entities. In the RDF graph, we call the vertices of the spatial coordinates as the positions. We use v to represent any vertex in the RDF graph and p to represent the position vertex. Each RDF triple corresponds to a directed edge from one entity (subject) to another (object). In order to implement keyword search, each entity corresponds to a document, which is represented by ,, and the document is composed of keywords extracted from the corresponding resources, properties, characters, and descriptions of the entity. The semantic place is a subtree of the RDF graph, which is rooted at the position vertex p and contains all the query keywords. A plurality of semantic locations can be constructed starting from a given position vertex p.

现有的RDF数据是使用结构化查询语言(Structured Query Language)进行访问,如SPARQL(Simple Protocol and RDF Query Language)。但是标准的SPARQL查询需要用户完全了解语言本身,并且了解数据域。因此SPARQL限制数据访问主要是数据域专家,因为它对普通用户是不友好的,也就是说对RDF数据进行查询时,需要用户懂得查询语言和RDF语义,否则无法进行查询。Existing RDF data is accessed using a Structured Query Language, such as SPARQL (Simple Protocol and RDF Query Language). But standard SPARQL queries require the user to fully understand the language itself and understand the data domain. Therefore, SPARQL restricts data access mainly as a data domain expert because it is unfriendly to ordinary users. That is to say, when querying RDF data, the user needs to understand the query language and RDF semantics, otherwise the query cannot be performed.

发明内容Summary of the invention

本发明旨在解决现有技术中需要用户懂得查询语言和RDF语义否则无法进行查询的技术问题,提供一种基于KSP算法的资源描述框架查询方法和系统。The present invention aims to solve the technical problem in the prior art that requires the user to understand the query language and the RDF semantics, otherwise the query cannot be performed, and provides a resource description framework query method and system based on the KSP algorithm.

本发明的实施例提供一种基于KSP算法的资源描述框架查询方法,用于利用KSP算法在RDF图上搜索查询关键字的语义位置,所述查询方法包括以下步骤:An embodiment of the present invention provides a resource description framework query method based on a KSP algorithm for searching a semantic location of a query keyword on an RDF graph by using a KSP algorithm, and the query method includes the following steps:

初始化存放结果函数Hk,其中存放结果函数Hk用于保存符合条件的语义位置QSP,符合条件的语义位置QSP为包含所有查询关键字的子树,k为符合条件的语义位置QSP的数量;Initializing the result function H k , wherein the result function H k is used to save the conditional semantic position QSP, the qualified semantic position QSP is a subtree containing all the query keywords, and k is the number of eligible semantic positions QSP;

根据预设的文档倒排索引表和预设的所有查询关键字对应的半径字领域表,对用户输入的查询关键字中的每个关键字进行循环遍历,得到输入的查询关键字对应的倒排索引表,以及得到每个关键字对应的值,并加载在预设的所有查询关键字对应的半径字领域表中;According to the preset document inverted index table and the radius word field table corresponding to all the preset query keywords, each keyword in the query keyword input by the user is looped through, and the input query keyword is inverted. Arranging the index table, and obtaining the value corresponding to each keyword, and loading it in the radius word field table corresponding to all the preset query keywords;

根据所述输入的查询关键字和输入的查询关键字对应的倒排索引表,构建字典结构,其中所述字典结构表示含有所述输入的查询关键字的节点;Constructing a dictionary structure according to the input query keyword and the inverted index table corresponding to the input query keyword, wherein the dictionary structure represents a node containing the input query keyword;

初始化单调排序函数的值θ,其中单调排序函数表示对根据输入的查询关键字查找的多个最紧凑的符合条件的语义位置进行排序,最紧凑的符合条件的语义位置表示为以p为根节 点的松散度最小的符合条件的语义位置;Initializing the value θ of the monotonic sort function, wherein the monotonic sort function represents sorting the plurality of most compact and eligible semantic positions searched according to the input query key, the most compact and eligible semantic position being represented as p root node The semantic location of the point with the least degree of looseness;

预设的空间索引中根节点进入队列,得到位置节点队列Q;The root node enters the queue in the preset spatial index, and obtains the location node queue Q;

根据输入的查询关键字和位置节点队列Q,遍历预设的空间索引得到节点e,并对节点e进行判断;According to the input query keyword and the location node queue Q, traversing the preset spatial index to obtain the node e, and determining the node e;

当节点e为包含空间位置信息的顶点,判断e是否为不符合条件节点,若e为不符合条件节点则结束本次循环,进入下次循环,当节点e为包含空间位置信息的顶点,且判断节点e不是不符合条件节点时,则执行函数GETSEMANTICPLACE,得到符合条件的语义位置的子树Tp和子树Tp的松散度值L(Tp),并判断是否为L(Tp)==+∞,如果是,则结束本次循环,如果否,计算松散度值L(Tp)的排序分数f,并将松散度值L(Tp)和对应排序分数f插入存放结果函数Hk且更新单调排序函数的值θ;When the node e is a vertex containing the spatial location information, it is determined whether e is a non-conforming conditional node. If e is a non-conforming conditional node, the current loop is terminated, and the next loop is entered, when the node e is a vertex containing the spatial location information, and When it is judged that the node e is not a non-conforming condition node, the function GETSEMANTICPLACE is executed to obtain the looseness value L(T p ) of the subtree T p and the subtree T p of the semantic position that meet the condition, and determine whether it is L(T p )= = + ∞, if so, the end of this cycle, and if not, calculating bulk value L (T p) a ranking score f, and the bulk value L (T p) and inserted into the corresponding ranking score for the result function H f k and update the value θ of the monotonic sorting function;

当节点e为节点N时,循环遍历节点N下的每一个节点,计算N下每个节点e对应的半径字的松散度值

Figure PCTCN2017085896-appb-000001
和对应半径字的排序分数
Figure PCTCN2017085896-appb-000002
Figure PCTCN2017085896-appb-000003
时,则把对应节点e插入位置节点队列Q并返回所述根据输入的查询关键字和位置节点队列Q,在预设的空间索引中查找节点e的步骤,直到
Figure PCTCN2017085896-appb-000004
When the node e is the node N, it loops through each node under the node N, and calculates the looseness value of the radius word corresponding to each node e under N.
Figure PCTCN2017085896-appb-000001
And the sorting score of the corresponding radius word
Figure PCTCN2017085896-appb-000002
when
Figure PCTCN2017085896-appb-000003
And inserting the corresponding node e into the location node queue Q and returning the step of searching for the node e in the preset spatial index according to the input query keyword and the location node queue Q until
Figure PCTCN2017085896-appb-000004

根据存放结果函数Hk向用户返回查询结果。The query result is returned to the user according to the stored result function H k .

本发明还提供一种实施例的基于KSP算法的资源描述框架查询系统,用于利用KSP算法在RDF图上搜索查询关键字的语义位置,所述查询系统包括:The present invention further provides an KSP algorithm-based resource description framework query system for searching a semantic location of a query keyword on an RDF graph by using a KSP algorithm, the query system comprising:

第一初始化模块,用于初始化存放结果函数Hk,其中存放结果函数Hk用于保存符合条件的语义位置QSP,符合条件的语义位置QSP为包含所有查询关键字的子树,k为符合条件的语义位置QSP的数量;The first initialization module is configured to initialize the storage result function H k , wherein the storage result function H k is used to save the conditional semantic position QSP, the qualified semantic position QSP is a subtree containing all query keywords, and k is a condition The number of semantic locations QSP;

循环遍历模块,用于根据预设的文档倒排索引表和预设的所有查询关键字对应的半径字领域表,对用户输入的查询关键字中的每个关键字进行循环遍历,得到输入的查询关键字对应的倒排索引表,以及得到每个关键字对应的值,并加载在预设的所有查询关键字对应的半径字领域表中;The loop traversal module is configured to cycle through each keyword in the query keyword input by the user according to the preset document inverted index table and the preset radius word domain table corresponding to all the query keywords, and obtain the input Querying the inverted index table corresponding to the keyword, and obtaining the value corresponding to each keyword, and loading the radius word field table corresponding to all the preset query keywords;

构建模块,用于根据所述输入的查询关键字和输入的查询关键字对应的倒排索引表,构建字典结构,其中所述字典结构表示含有所述输入的查询关键字的节点;a building module, configured to construct a dictionary structure according to the input query keyword and the inverted index table corresponding to the input query keyword, wherein the dictionary structure represents a node that includes the input query keyword;

第二初始化模块,用于初始化单调排序函数的值θ,其中单调排序函数表示对根据输入的查询关键字查找的多个最紧凑的符合条件的语义位置进行排序,最紧凑的符合条件的语义位置表示为以p为根节点的松散度最小的符合条件的语义位置;a second initialization module for initializing the value θ of the monotonic sort function, wherein the monotonic sort function represents sorting the plurality of most compact and eligible semantic positions searched according to the input query keyword, the most compact and eligible semantic position Expressed as the semantic location with the least looseness of the root node with p as the root;

生成队列模块,用于预设的空间索引的根节点进入队列,得到位置节点队列Q;Generating a queue module, the root node for the preset spatial index enters the queue, and obtains the location node queue Q;

查找循环模块,用于根据输入的查询关键字和位置节点队列Q,遍历预设的空间索引得到节点e并对节点e进行判断;The lookup loop module is configured to: according to the input query keyword and the location node queue Q, traverse the preset spatial index to obtain the node e and determine the node e;

顶点处理模块,用于当节点e为包含空间位置信息的顶点时,判断节点e是否为不符合条件节点,若节点e为不符合条件节点则结束本次循环,进入下次循环,当节点e为包含空间位置信息的顶点,且节点e不是不符合条件节点,则执行函数GETSEMANTICPLACE,得到符 合条件的语义位置的子树Tp和子树Tp的松散度值L(Tp),并判断是否为L(Tp)==+∞,如果是,则结束本次循环,如果否,计算松散度值L(Tp)的排序分数f,并将松散度值L(Tp)和对应排序分数f插入存放结果函数Hk且更新单调排序函数的值θ;The vertex processing module is configured to determine whether the node e is a non-conforming condition node when the node e is a vertex including the spatial location information, and if the node e is not in the conditional node, the current loop is ended, and the next loop is entered, when the node e For the vertex containing the spatial position information, and the node e is not the non-conforming conditional node, the function GETSEMANTICPLACE is executed to obtain the looseness value L(T p ) of the subtree T p and the subtree T p of the semantic position that meet the condition, and determine whether of L (T p) == + ∞ , if so, the end of this cycle, and if not, calculating bulk value L (T p) sorted fraction F, and the bulk value L (T p) and the corresponding ordering The score f is inserted into the result function H k and updates the value θ of the monotonic sort function;

节点处理模块,用于节点e为节点N时,循环遍历节点N下的每一个节点,计算N下每个节点e对应的半径字的松散度值

Figure PCTCN2017085896-appb-000005
和对应半径字的排序分数
Figure PCTCN2017085896-appb-000006
Figure PCTCN2017085896-appb-000007
时,则把对应节点e插入位置节点队列Q并进入查找循环模块,直到
Figure PCTCN2017085896-appb-000008
The node processing module is configured to cycle through each node under the node N when the node e is the node N, and calculate the looseness value of the radius word corresponding to each node e under N
Figure PCTCN2017085896-appb-000005
And the sorting score of the corresponding radius word
Figure PCTCN2017085896-appb-000006
when
Figure PCTCN2017085896-appb-000007
Then, the corresponding node e is inserted into the location node queue Q and enters the lookup loop module until
Figure PCTCN2017085896-appb-000008

输出结果模块,用于根据存放结果函数Hk向用户返回查询结果。The output result module is configured to return a query result to the user according to the stored result function H k .

本发明还提供一种实施例的计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现上述方法的步骤。The present invention also provides a computer readable storage medium of an embodiment having stored thereon a computer program that, when executed by a processor, implements the steps of the above method.

本发明的技术方案与现有技术相比,有益效果在于:该KSP算法对用户是友好的,用户不需要掌握专门的查询语言,只需要输入查询的关键字,算法将返回在查询位置附近,包含所有关键字的子树。Compared with the prior art, the technical solution of the present invention has the beneficial effects that the KSP algorithm is user-friendly, the user does not need to master a special query language, and only needs to input the keyword of the query, and the algorithm will return near the query position. A subtree containing all the keywords.

附图说明DRAWINGS

图1是本发明基于KSP算法的资源描述框架查询方法一种实施例的流程图。1 is a flow chart of an embodiment of a resource description framework query method based on the KSP algorithm of the present invention.

图2是本发明基于KSP算法的资源描述框架查询方法另一种实施例的流程图。2 is a flow chart of another embodiment of a resource description framework query method based on the KSP algorithm of the present invention.

图3是本发明文档倒排索引表一种实施例的示意图。3 is a schematic diagram of an embodiment of a document inverted index table of the present invention.

图4是本发明所有查询关键字对应的半径字领域表的创建方法一种实施例的流程图。4 is a flow chart of an embodiment of a method for creating a radius word domain table corresponding to all query keywords of the present invention.

图5是本发明基于KSP算法的资源描述框架查询系统一种实施例的结构示意图。FIG. 5 is a schematic structural diagram of an embodiment of a resource description framework query system based on a KSP algorithm according to the present invention.

具体实施方式detailed description

下面详细描述本发明的实施例,所述实施例的示例在附图中示出,其中自始至终相同或类似的标号表示相同或类似的元件或具有相同或类似功能的元件。下面通过参考附图描述的实施例是示例性的,旨在用于解释本发明,而不能理解为对本发明的限制。The embodiments of the present invention are described in detail below, and the examples of the embodiments are illustrated in the drawings, wherein the same or similar reference numerals are used to refer to the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are intended to be illustrative of the invention and are not to be construed as limiting.

具体的,KSP算法中的查询序列q由三部分组成,包括查询位置q.λ,查询关键字q.ψ,和语义位置数量k。Specifically, the query sequence q in the KSP algorithm is composed of three parts, including a query position q.λ, a query key q.ψ, and a semantic position number k.

具体的,对于一个给定kSP查询序列q,和一个RDF图G=<V,E>,其中V表示RDF图的顶点,E表示RDF图的边,符合条件的语义位置QSP是一棵树Tp=<V′,E′>,其中Tp的根节点为p,并且满足

Figure PCTCN2017085896-appb-000009
Specifically, for a given kSP query sequence q, and an RDF graph G=<V, E>, where V represents the vertex of the RDF graph, E represents the edge of the RDF graph, and the eligible semantic location QSP is a tree T p = <V', E'>, where the root node of T p is p and satisfies
Figure PCTCN2017085896-appb-000009

为了方便介绍,用<p,(v1,v2...)>表示一个语义位置,其中p是根节点,(v1,v2...)表示其他所有的顶点。给定一个查询序列q可能存在多个符合条件的语义位置QSP,符合条件的语义位置QSP有相同的根节点p,但(v1,v2...)不同。因此还需要计算Tp的松散度L(Tp)。 For convenience of introduction, <p,(v 1 ,v 2 ...)> is used to represent a semantic position, where p is the root node and (v 1 , v 2 ...) represents all other vertices. Given a query sequence q there may be multiple eligible semantic locations QSP, the eligible semantic locations QSP have the same root node p, but (v 1 , v 2 ...) are different. Thus T p is also necessary to calculate the bulk of L (T p).

具体的,对于一个给定QSP Tp=<V′,E′>,令

Figure PCTCN2017085896-appb-000010
表示从根节点p到包含关键字ti的节点v的最短距离,其中ti∈q.ψ,d(p,v)为p到v的最短距离,所以Tp的松散度值L(Tp)为:
Figure PCTCN2017085896-appb-000011
如果松散度越小,根节点和其他节点覆盖了所有输入的查询关键字的相关性越高。因此对于一个给定的位置顶点p,以位置顶点p为根节点,我们要找的是最紧凑的符合条件的语义位置TQSP,表示以p为根节点的松散度值最小的QSP。Specifically, for a given QSP T p = <V', E'>,
Figure PCTCN2017085896-appb-000010
Represents the shortest distance from the root node p to the node v containing the key t i , where t i ∈q.ψ, d(p, v) is the shortest distance from p to v, so the looseness value of T p is L(T p ) is:
Figure PCTCN2017085896-appb-000011
If the looseness is smaller, the correlation between the root node and other nodes covering all of the input query keywords is higher. So for a given position vertex p, with the position vertex p as the root node, we are looking for the most compact and qualified semantic position TQSP, which represents the QSP with the smallest looseness value with p as the root node.

另外,在RDF图上,对于一个给定的kSP的查询序列q,查询结果为K个TQSP,且这K个TQSP的分数(Ranking Score)是所有TQSP中最小的,松散度L(Tp)的排序分数f,用函数f(L(Tp),S(q,p))表示,L(Tp)为子树Tp松散度,S(q,p)为q.λ查询位置和p之间的欧式距离。函数f(L(Tp),S(q,p))可以为任意的单调排序函数,f(L(Tp),S(q,p))=L(Tp)×S(q,p)。In addition, on the RDF graph, for a given kSP query sequence q, the query result is K TQSPs, and the scores of the K TQSPs are the smallest of all TQSPs, and the looseness L(T p ) The sorting score f is represented by the function f(L(T p ), S(q, p)), L(T p ) is the subtree T p looseness, and S(q, p) is the q.λ query position and Euclidean distance between p. The function f(L(T p ), S(q, p)) can be any monotonic ordering function, f(L(T p ), S(q,p))=L(T p )×S(q, p).

本发明一个实施例的基于KSP算法的资源描述框架查询方法,用于利用KSP算法在RDF图上搜索查询关键字的语义位置,如图1所示,所述构造方法包括以下步骤:The KSP algorithm-based resource description framework query method is used to search the semantic position of the query keyword on the RDF graph by using the KSP algorithm. As shown in FIG. 1 , the constructing method includes the following steps:

步骤S101,初始化存放结果函数Hk,其中存放结果函数Hk用于保存符合条件的语义位置QSP,符合条件的语义位置QSP为包含所有查询关键字的子树,k为符合条件的语义位置QSP的数量;Step S101, initializing the stored result function Hk , wherein the stored result function Hk is used to save the conditional semantic position QSP, the qualified semantic position QSP is a subtree containing all query keywords, and k is an eligible semantic position QSP quantity;

步骤S102,根据预设的文档倒排索引表和预设的所有查询关键字对应的半径字领域表,对用户输入的查询关键字中的每个关键字进行循环遍历,得到输入的查询关键字对应的倒排索引表,以及得到每个关键字对应的值,并加载在预设的所有查询关键字对应的半径字领域表中;Step S102: Looping through each keyword in the query keyword input by the user according to the preset document inverted index table and the preset radius word domain table corresponding to all the query keywords, to obtain the input query keyword. Corresponding inverted index table, and obtaining a value corresponding to each keyword, and loading in a radius word field table corresponding to all preset query keywords;

步骤S103,根据所述输入的查询关键字和输入的查询关键字对应的倒排索引表,构建字典结构,其中所述字典结构表示含有所述输入的查询关键字的节点;Step S103, constructing a dictionary structure according to the input query keyword and the inverted index table corresponding to the input query keyword, wherein the dictionary structure represents a node that includes the input query keyword;

步骤S104,初始化单调排序函数的值θ,其中单调排序函数表示对根据输入的查询关键字查找的多个最紧凑的符合条件进行排序,最紧凑的符合条件的语义位置表示为以p为根节点的松散度最小的符合条件的语义位置;Step S104, initializing the value θ of the monotonic sorting function, wherein the monotonic sorting function indicates sorting the plurality of most compact matching conditions according to the input query keyword search, and the most compact and qualified semantic position is represented by p as the root node. The semantic position of the smallest looseness;

步骤S105,预设的空间索引的根节点进入队列,得到位置节点队列Q;Step S105, the root node of the preset spatial index enters the queue, and obtains the location node queue Q;

步骤S106,根据输入的查询关键字和位置节点队列Q,遍历预设的空间索引得到节点e,并对节点e进行判断,当节点e为包含空间位置信息的顶点,进入步骤S107,当节点e为节点N,进入步骤S111;Step S106, according to the input query keyword and the location node queue Q, traverse the preset spatial index to obtain the node e, and judge the node e. When the node e is a vertex containing the spatial location information, the process proceeds to step S107, where the node e Step N1, proceed to step S111;

步骤S107,判断节点e是否为不符合条件节点,如果是,则结束本次循环,进入下次循环即返回步骤S106,如果否,进入步骤S108;Step S107, it is determined whether the node e is a non-conforming conditional node, if yes, the current loop is terminated, the next loop is returned to step S106, and if not, the process proceeds to step S108;

步骤S108,执行函数GETSEMANTICPLACE,得到符合条件的语义位置的子树Tp和子树Tp的松散度值L(Tp); Step S108, executing a function GETSEMANTICPLACE, obtaining a looseness value L(T p ) of the subtree T p and the subtree T p of the semantic position that meet the condition;

步骤S109,判断是否为L(Tp)==+∞,如果是,结束本次循环即返回步骤S106,如果否,进入步骤S110;Step S109, it is determined whether L (T p ) == + ∞, if yes, the end of the current loop returns to step S106, and if not, proceeds to step S110;

步骤S110;计算松散度值L(Tp)的排序分数f,并将松散度值L(Tp)和对应排序分数f插入存放结果函数Hk且更新单调排序函数的值θ,进入步骤S115;Step S110; loose calculated value L (T p) a ranking score f, and the bulkiness value θ L (T p) and inserted into the corresponding ranking score f for the result and updates monotonic function H k sort function, the process proceeds to step S115 ;

步骤S111,循环遍历每一个节点N,计算节点e对应的半径字的松散度值

Figure PCTCN2017085896-appb-000012
和节点e对应半径字的排序分数
Figure PCTCN2017085896-appb-000013
Step S111, looping through each node N to calculate the looseness value of the radius word corresponding to the node e
Figure PCTCN2017085896-appb-000012
And node e correspond to the ranking score of the radius word
Figure PCTCN2017085896-appb-000013

步骤S112,判断是否

Figure PCTCN2017085896-appb-000014
如果是,进入步骤S113,如果否,进入步骤S114;Step S112, determining whether
Figure PCTCN2017085896-appb-000014
If yes, go to step S113, if no, go to step S114;

步骤S113,当

Figure PCTCN2017085896-appb-000015
时,节点e符合条件,将对应的节点e插入位置节点队列Q,并返回S106;Step S113, when
Figure PCTCN2017085896-appb-000015
When the node e meets the condition, insert the corresponding node e into the location node queue Q, and return to S106;

步骤S114,

Figure PCTCN2017085896-appb-000016
进入步骤S115;Step S114,
Figure PCTCN2017085896-appb-000016
Go to step S115;

步骤S115,根据存放结果函数Hk向用户返回查询结果,也就是说,向用户返回包含所有输入的查询关键字的子树,并且该子树的根节点靠近查询位置。Step S115, returning the query result to the user according to the deposit result function Hk , that is, returning a subtree containing all the input query keywords to the user, and the root node of the subtree is close to the query position.

步骤S101,具体为,初始化

Figure PCTCN2017085896-appb-000017
Hk中的元素按照f(L(Tp),S(q,p))排序;Hk中存放的是QSP,即存放最终的结果,其中QSP为包含所有查询关键字的子树。Step S101, specifically, initializing
Figure PCTCN2017085896-appb-000017
H k elements in accordance with f (L (T p), S (q, p)) sorting; k stored in the QSP is H, i.e., the final result is stored, wherein QSP subtree containing all of the query keywords.

步骤S102,具体为:循环遍历查询关键字q.ψ中的每个关键字ti,做一下处理:首先从预设的文档倒排索引表I中,查找关键字ti对一个的值,并保存,接着从预设的所有查询关键字对应的半径字领域表即α-radius word neighborhood表Iα中,加载关键字ti所对应的值,并保存。Step S102 is specifically: looping through each keyword t i in the query keyword q.ψ, and performing a process: first searching for the value of the keyword t i from the preset document inverted index table I, And saving, and then loading the value corresponding to the keyword t i from the preset radius word field table corresponding to all the query keywords, that is, the α-radius word neighborhood table I α , and saving.

步骤S103,具体为,建立一个字典结构Mq.ψ,结构为{节点,(t1,t2,...)}表示含有查询关键字的节点。Step S103, specifically, establishes a dictionary structure Mq.ψ , and the structure is {node, (t 1 , t 2 , . . . )} represents a node containing a query keyword.

步骤S104,具体为,初始化θ=+∞;θ为Ranking Score f的值。In step S104, specifically, θ=+∞ is initialized; θ is the value of Ranking Score f.

在步骤S105中,对于预设的空间索引是在查询之前建立好的,所以用于输入不同的查询关键字对应的空间索引是相同的。另外,位置节点队列Q中保存多个位置节点,所述多个位置节点是符合查询要求的,其中位置节点包含位置信息,而普通节点不包含位置信息。In step S105, the preset spatial index is established before the query, so the spatial index corresponding to input different query keywords is the same. In addition, the location node queue Q stores a plurality of location nodes, where the plurality of location nodes meet the query requirements, wherein the location node includes location information, and the ordinary node does not include location information.

步骤S106,具体为:查找循环条件为e=GETNEXT(Q,R,q),函数GETNEXT表示在R-tree上使用增强NN算法(Incremental NN Algorithm)查找节点e,若节点e是位置顶点,即包含空间位置信息的顶点,则进入步骤S107,若节点e不是位置顶点则进入步骤S111。Step S106, specifically: the search loop condition is e=GETNEXT(Q, R, q), and the function GETNEXT indicates that the node e is searched by using an enhanced NN algorithm on the R-tree, and if the node e is a position vertex, The vertex including the spatial position information proceeds to step S107, and if the node e is not the position vertex, the process proceeds to step S111.

当节点e是位置顶点,做以下处理:首先若节点e不符合删除规则1,则跳出本次循环,进入下次循环;再次回到步骤S106,开始新的循环;接着,执行函数GETSEMANTICPLACE,得到Tp 的值;若L(Tp)==+∞,说明没有找到,则跳出本次循环,进入下次循环;接着,计算L(Tp)的Ranking Score,并将L(Tp)和对应的Ranking Score f插入到Hk中,即Hk.add(Tp,f);最后,更新θ的值,进入步骤S115。When the node e is a position vertex, the following processing is performed: first, if the node e does not meet the deletion rule 1, the current loop is skipped and the next loop is entered; again, returning to step S106, a new loop is started; then, the function GETSEMANTICPLACE is executed to obtain value T p; if L (T p) == + ∞ , description is not found, out of this cycle, enters the next cycle; Next, L is calculated (T p) of the Ranking Score, and L (T p) And the corresponding Ranking Score f is inserted into H k , that is, H k .add(T p , f); finally, the value of θ is updated, and the flow proceeds to step S115.

当节点e是节点N时,循环遍历N下的每一个节点e,做以下操作;针对每个节点:首先,计算节点e的α-bound on the looseness

Figure PCTCN2017085896-appb-000018
删除规则2,接着计算节点e的α-bound on the ranking score
Figure PCTCN2017085896-appb-000019
删除规则3,最后根据删除规则2,3可以判定,当
Figure PCTCN2017085896-appb-000020
是,对应的节点e符合条件,可以插入队列Q,即
Figure PCTCN2017085896-appb-000021
to Q,直到
Figure PCTCN2017085896-appb-000022
进入步骤S115。When the node e is the node N, loop through each node e under N to do the following operations; for each node: First, calculate the α-bound on the looseness of the node e
Figure PCTCN2017085896-appb-000018
Delete rule 2, and then calculate the α-bound on the ranking score of node e
Figure PCTCN2017085896-appb-000019
Delete rule 3, and finally judge according to the deletion rule 2, 3, when
Figure PCTCN2017085896-appb-000020
Yes, the corresponding node e meets the condition and can be inserted into the queue Q, that is,
Figure PCTCN2017085896-appb-000021
To Q, until
Figure PCTCN2017085896-appb-000022
Go to step S115.

具体的,所述存放结果函数Hk中保存符合条件的语义位置QSP按照最紧凑的符合条件的语义位置的排序分数大小进行排序。Specifically, the conditional semantic position QSP in the storage result function Hk is sorted according to the sorting score size of the most compact and qualified semantic position.

在具体实施中,步骤S108,所述执行函数GETSEMANTICPLACE,如图2所示,具体包括:In a specific implementation, in step S108, the execution function GETSEMANTICPLACE, as shown in FIG. 2, specifically includes:

步骤S201,初始化子树Tp,其中,Tp表示以p为顶点,包含所有查询关键字的一颗子树;Step S201, the initialization sub-tree T p, where, T p p expressed as a vertex, a sub-tree containing all the query keywords;

步骤S202,初始化子树Tp的松散度值L(Tp)以使L(Tp)=1;Step S202, initializing the looseness value L(T p ) of the subtree T p such that L(T p )=1;

步骤S203,查询的关键字q.ψ保存到数字集B中;Step S203, the queried keyword q.ψ is saved in the digital set B;

步骤S204,从顶点p开始使用BFS(breadth-first-search)方式遍历RDF图且数字集B不为空;Step S204, starting from the vertex p, using the BFS (breadth-first-search) method to traverse the RDF graph and the number set B is not empty;

步骤S205,把BFS方式得到的节点v添加到子树Tp中;Step S205, the obtained embodiment BFS added to the subtree node v in T p;

步骤S206,在所述字典结构中查找节点v包含的查询关键字;Step S206, searching for a query keyword included in the node v in the dictionary structure;

步骤S207,判断节点v包含的查询关键字和数字集B的交集是否为空,如果否,进入步骤S208,如果是,进入步骤S209;Step S207, it is determined whether the intersection of the query keyword and the number set B included in the node v is empty, if not, proceeds to step S208, and if yes, proceeds to step S209;

步骤S208,如果否,输出L(Tp)=+∞和Tp=NULL;Step S208, if no, output L(T p )=+∞ and T p =NULL;

步骤S209,如果是,根据节点v包含的查询关键字和数字集B的交集中元素的个数以及节点v和顶点p之间的距离的得到子树Tp的松散度值L(Tp);Step S209, if yes, obtain the looseness value L(T p ) of the subtree T p according to the number of intersection elements of the query key and the number set B included in the node v and the distance between the node v and the vertex p. ;

步骤S210,从数字集B中删除节点v包含的查询关键字和数字集B的交集得到当前的数字集B;Step S210, deleting the intersection of the query key and the number set B included in the node v from the digital set B to obtain the current digital set B;

步骤S211,当前的数字集B是否为空,如果否,返回步骤S204,如果是,进入步骤S208, 也就是说,直到数字集B为空,输出L(Tp)=+∞和Tp=NULL。Step S211, whether the current number set B is empty, if not, returns to step S204, and if yes, proceeds to step S208, that is, until the number set B is empty, and outputs L(T p )=+∞ and T p = NULL.

具体的,步骤S209的计算公式如下:Specifically, the calculation formula of step S209 is as follows:

L(Tp)+=|B∩v.ψq|×d(p,v);L(T p )+=|B∩v.ψ q |×d(p,v);

其中,|B∩v.ψq|表示B和v.ψq的交集中元素的个数,d(p,v)为位置顶点p到节点v的最短距离。Where |B∩v.ψ q | represents the number of elements in the intersection of B and v.ψ q , and d(p, v) is the shortest distance from the position vertex p to the node v.

也就是说,KSP算法通过执行SP和GETSEMANTICPLACE两个函数以输出输入的查询关键字的返回结果,SP函数内部调用GETSEMANTICPLACE函数,函数SP(q,R,G,I,Ia)的内容具体为,步骤S101至步骤S115的过程。函数GETSEMANTICPLACE(q.ψ,p,G,Mq.ψ)的内容具体为,步骤S201至步骤S211的过程,该函数的作用是输出Tp和L(Tp),其中q表示查询序列,R表示预设的空间索引,G表示RDF图,I表示预设的文档倒排索引表,Iα表示预设的所有查询关键字对应的半径字领域表。That is to say, the KSP algorithm executes the SP and GETSEMANTICPLACE functions to output the returned result of the input query key. The SP function internally calls the GETSEMANTICPLACE function, and the content of the function SP(q, R, G, I, I a ) is specifically The process from step S101 to step S115. The content of the function GETSEMANTICPLACE(q.ψ, p, G, M q.ψ ) is specifically the process of step S201 to step S211, and the function of the function is to output T p and L(T p ), where q represents a query sequence. R denotes a preset spatial index, G denotes an RDF graph, I denotes a preset document inverted index table, and I α denotes a preset radius word domain table corresponding to all the query keywords.

在具体实施中,所述查询方法还包括以下步骤:创建预设的空间索引。具体的,在步骤S101之前,创建预设的空间索引。In a specific implementation, the query method further includes the following steps: creating a preset spatial index. Specifically, before step S101, a preset spatial index is created.

所述创建预设的空间索引的步骤,具体为:The step of creating a preset spatial index is specifically:

从RDF数据中提取含有坐标信息的数据,得到预设的空间索引。因为RDF图数据比较大,为了提高查询的速度,首先从RDF数据中提取含有坐标信息的数据,创建预设的空间索引R-tree使得查询的速度可以得到有效的提高。The data containing the coordinate information is extracted from the RDF data to obtain a preset spatial index. Because the RDF graph data is relatively large, in order to improve the speed of the query, the data containing the coordinate information is first extracted from the RDF data, and the preset spatial index R-tree is created, so that the query speed can be effectively improved.

在具体实施中,所述查询方法还包括以下步骤:对每个节点的文档中的关键字建立倒排索引以得到预设的文档倒排索引表,倒排索引表的格式为(关键字,节点),具体的,在步骤S101之前,对每个节点的文档中的关键字建立倒排索引以得到预设的文档倒排索引表I,文档倒排索引表的格式为(关键字,节点),如图3所示。In a specific implementation, the query method further includes the steps of: performing an inverted index on a keyword in a document of each node to obtain a preset document inverted index table, where the format of the inverted index table is (keyword, a node), specifically, before step S101, an inverted index is created for a keyword in a document of each node to obtain a preset document inverted index table I, and the format of the inverted index table of the document is (keyword, node) ),As shown in Figure 3.

在具体实施中,如图4所示,所述查询方法还包括以下步骤:In a specific implementation, as shown in FIG. 4, the query method further includes the following steps:

步骤S401,从顶点p开始使用BFS方式遍历RDF图;Step S401, traversing the RDF graph by using the BFS method from the vertex p;

步骤S402,当遍历到节点v时,遍历节点v的文档中的关键字t,若(t,d(p,v))在WN(p)中没有出现过,则将(t,d(p,v))添加到预设的所有查询关键字对应的半径字领域表中,其中WN(p)表示从p到每一个查询关键字ti的最短距离的集合{(ti,dg(p,ti))},dg(p,ti)≤α表示从根节点p到包含查询关键字ti的顶点的最短的距离,(t,d(p,v))表示顶点v对应的文档信息,包含查询关键字t;Step S402, when traversing to the node v, traversing the keyword t in the document of the node v, if (t, d(p, v)) does not appear in WN(p), then (t, d(p) , v)) added to the radius word field table corresponding to all the preset query keywords, where WN(p) represents the set of the shortest distance from p to each query key t i {(t i ,d g ( p, t i ))}, d g (p, t i ) ≤ α represents the shortest distance from the root node p to the vertex containing the query key t i , and (t, d(p, v)) represents the vertex v Corresponding document information, including a query keyword t;

步骤S403,当得到所有的叶子节点的半径字领域后,按照从叶子节点到非叶子节点的顺序,计算WN(N)的值,WN(N)表示非叶子节点N下所有的位置顶点{pj}对应的WN(pj)的联合,非叶子节点N下面包含一系列的位置顶点{pj}; Step S403, after obtaining the radius word fields of all the leaf nodes, calculating the value of WN(N) according to the order from the leaf node to the non-leaf node, and WN(N) indicating all the position vertices of the non-leaf node N {p j } corresponds to the union of WN(p j ), and the non-leaf node N contains a series of position vertices {p j };

步骤S404,对于非叶子节点N,{ei}表示N下的节点,WN(N)初始化为空,若(t,dg(ei,t))在WN(N)没有相应的值则将(t,dg(ei,t))添加到预设的所有查询关键字对应的半径字领域表中,若有值则将所述半径字邻域表中非叶子节点N对应的值更新为min(dg(N,t),dg(ei,t)),其中,(t,dg(ei,t))为关键字t在所述半径字邻域表中对应的值。Step S404, for the non-leaf node N, {e i } represents the node under N, WN(N) is initialized to be empty, and if (t, d g (e i , t)) has no corresponding value in WN(N) Add (t, d g (e i , t)) to the radius word field table corresponding to all the preset query keywords, and if there is a value, the value corresponding to the non-leaf node N in the radius word neighborhood table Updated to min(d g (N,t), d g (e i ,t)), where (t,d g (e i ,t)) is the keyword t corresponding to the radius word neighborhood table Value.

具体的,顶点p可以是根节点,如果是根节点,则包含位置信息。顶点p可以是包含位置信息的节点,所以它是包含位置信息的所有可能节点,包括根节点。另外,所述预设的所有查询关键字对应的半径字领域表:是根据树建立起来的,其中树中的有叶子节点p,和非叶子节点N,首先算叶子节点的值,然后算非叶子节点的值,所以是叶子节点到非叶子节点,是一种自下到上的顺序。比如,以下所示表格为所述预设的所有查询关键字对应的半径字领域表。Specifically, the vertex p may be a root node, and if it is a root node, location information is included. The vertex p can be a node containing location information, so it is all possible nodes containing location information, including the root node. In addition, the preset radius word domain table corresponding to all the query keywords is established according to a tree, wherein the leaf node p and the non-leaf node N in the tree first calculate the value of the leaf node, and then calculate the value. The value of the leaf node, so it is a leaf node to a non-leaf node, is a bottom-up order. For example, the table shown below is the radius word field table corresponding to all the preset query keywords.

q.ψQ.ψ abbeyAbbey ...... ancientAncient catholicCatholic romanRoman historyHistory ...... dg(p1,ti)d g (p 1 , t i ) 00 ...... 11 11 11 -- ...... dg(p2,ti)d g (p 2 , t i ) -- ...... -- 00 00 11 ...... dg(N,ti)d g (N, t i ) 00 ...... 11 00 00 11 ......

具体的,开始时初始化,使得所述预设的所有查询关键字对应的半径字领域表为空,步骤S401-步骤S404就是填充这张表。Specifically, the initialization is started, so that the radius word field table corresponding to all the preset query keywords is empty, and the step S401 - step S404 is to fill the table.

其中:abbey,ancient,catholic,roman,history为所述半径字领域表中关键字。(t,d(p,v)):表示顶点v的文档信息中,有查询关键字t,则对应的值就会插入到半径字邻域表中。(t,dg(ei,t)):表示关键字t,在半径字邻域表中对应的值。此处的ei就是上表中的p1,p2,即dg(ei,t)就是表中dg(p1,ti)对应的值。min(dg(N,t),dg(ei,t)):就是表中dg(N,ti)的值去掉所在列的最小值。比如,catholic所在的列,有值1,0所以min(dg(N,t),dg(ei,t))的值为0,也就是dg(N,ti)的值。Where: abbey, ancient, catholic, roman, history is the keyword in the radius word field table. (t, d(p, v)): In the document information indicating the vertex v, if there is a query keyword t, the corresponding value is inserted into the radius word neighborhood table. (t, d g (e i , t)): represents the value of the keyword t in the radius word neighborhood table. Here, e i is p 1 , p 2 in the above table, that is, d g (e i , t) is the value corresponding to d g (p 1 , t i ) in the table. Min(d g (N,t), d g (e i ,t)): is the value of d g (N, t i ) in the table to remove the minimum value of the column. For example, the column where catholic is located has a value of 1,0 so the value of min(d g (N,t), d g (e i ,t)) is 0, which is the value of d g (N,t i ).

由于在根据KSP算法构建TQSP的时候可能会遇到以下两种情况:(i)遍历完这个图后仍未找到包含输入查询关键字的Tp,(ii)找到了Tp,但是Tp的排序分数f大于θ(当前已经找到的第K个Tp的排序分数)。其中,TQSP为松散度最小的QSP针对情况。Since TQSP is constructed according to the KSP algorithm, the following two situations may be encountered: (i) T p containing the input query keyword is not found after traversing the graph, (ii) T p is found, but T p f [theta] is greater than the ranking score (current have been found in the K-th ranking score T p). Among them, TQSP is the case with the least loose QSP.

对于(i)的情况,在RDF图中,让

Figure PCTCN2017085896-appb-000023
表示以p为根节点子树不能包含所有的查询关键字。对于给定的查询关键字序列q.ψ,若
Figure PCTCN2017085896-appb-000024
此时的p节点为不符合条件节点。 For the case of (i), in the RDF diagram, let
Figure PCTCN2017085896-appb-000023
Indicates that p is the root node. The subtree cannot contain all the query keywords. For a given query keyword sequence q.ψ, if
Figure PCTCN2017085896-appb-000024
The p node at this time is a non-compliant node.

针对(ii)情况,为了提高算法的效率,定义WN(p),对位置顶点p,它的WN(p)表示从p到每一个查询关键字ti的最短距离的集合{(ti,dg(p,ti))},其中dg(p,ti)≤α,表示从根节点p到包含查询关键字ti的顶点的最短的距离。For the case of (ii), in order to improve the efficiency of the algorithm, WN(p) is defined, and for the position vertex p, its WN(p) represents the set of the shortest distance from p to each query key t i {(t i , d g (p, t i ))}, where d g (p, t i ) ≤ α, represents the shortest distance from the root node p to the vertex containing the query key t i .

根据上述中对一个点的字邻域描述,我们可以得出对节点N的字邻域的定义,如对R-tree中的顶点N。定义WN(N),对于R-tree的节点N下面包含一系列的位置顶点{pj},WN(N)为一系列的{(ti,dg(N,ti))},其中WN(N)是节点N下所有的位置顶点{pj}对应的WN(pj)的联合,其中对于每个关键字ti

Figure PCTCN2017085896-appb-000025
显然dg(N,ti)≤α。Based on the word neighborhood description of a point in the above, we can derive the definition of the word neighborhood of node N, such as the vertex N in the R-tree. Defining WN(N), for node R of R-tree, contains a series of position vertices {p j }, WN(N) is a series of {(t i , d g (N, t i ))}, where WN(N) is a union of WN(p j ) corresponding to all position vertices {p j } under node N, where for each keyword t i ,
Figure PCTCN2017085896-appb-000025
Obviously d g (N, t i ) ≤ α.

根据定义WN(p)和定义WN(N),可以创建节点对应的半径字领域表Iα即预设的所有查询关键字对应的半径字领域表(α-radius word neighborhood表),用于提高算法效率。According to the definition WN(p) and the definition WN(N), the radius word domain table I α corresponding to the node may be created, that is, the preset radius word domain table (α-radius word neighborhood table) corresponding to all the query keywords is used to improve Algorithm efficiency.

进一步,为了提高算法的效率,引理1:由于WN(p)表示位置定点p的半径字领域(α-radius word neighborhood)。对于给定的查询关键字q.ψ={t1,...,tj,...tm},为了不失一般性,假设第j个关键字在WN(p)已经有值,则以p为根节点的Tp的TQSP的半径字松散度(α-bound on the looseness)可表示为

Figure PCTCN2017085896-appb-000026
并且
Figure PCTCN2017085896-appb-000027
Further, in order to improve the efficiency of the algorithm, Lemma 1: Since WN(p) represents the alpha-radius word neighborhood of the position fix p. For a given query key q.ψ={t 1 ,...,t j ,...t m }, for the sake of generality, assume that the jth keyword already has a value in WN(p), Then the radius-word looseness (α-bound on the looseness) of T TSP of T p with p as the root node can be expressed as
Figure PCTCN2017085896-appb-000026
and
Figure PCTCN2017085896-appb-000027

引理2:由于

Figure PCTCN2017085896-appb-000028
表示以p为根节点的Tp的TQSP的半径字松散度,则对于给定的查询序列q对应的Tp的半径字排序分数(α-bound on the ranking score)可表示为
Figure PCTCN2017085896-appb-000029
Figure PCTCN2017085896-appb-000030
Lemma 2: Because
Figure PCTCN2017085896-appb-000028
P is represented by a radius looseness TQSP word root node of T p, the radius for a given word ranking score T p of the query sequence corresponding to q (α-bound on the ranking score ) may be expressed as
Figure PCTCN2017085896-appb-000029
And
Figure PCTCN2017085896-appb-000030

引理3:由于WN(N)表示顶点N的半径字领域(α-radius word neighborhood),查询关键字q.ψ={t1...,tj...,tm},为了不失一般性,假设第j个关键字在WN(N)中已经有相应的值,则在N节点下以p为根节点的Tp对应的所有的TQSP可表示为

Figure PCTCN2017085896-appb-000031
并且
Figure PCTCN2017085896-appb-000032
Lemma 3: Since WN(N) represents the alpha-radius word neighborhood of the vertex N, the query key q.ψ={t 1 ...,t j ...,t m }, in order not to loss of generality, assume that the j-th keyword in the WN (N) has a corresponding value, the node N p is the root node corresponding to T p can be expressed as all TQSP
Figure PCTCN2017085896-appb-000031
and
Figure PCTCN2017085896-appb-000032

引理4:由于

Figure PCTCN2017085896-appb-000033
表示以在节点N下以p为根节点的Tp的TQSP对应的半径字松散度,对于给定的查询序列q,则在节点N下以p为根节点的所有的Tp的半径字松散度可表示为
Figure PCTCN2017085896-appb-000034
其中S(q,N)表示q和N之间最小的空间距离,且
Figure PCTCN2017085896-appb-000035
Lemma 4: Because
Figure PCTCN2017085896-appb-000033
Represented at node N p is the root node in a T p of a radius corresponding to the word TQSP looseness, for a given query sequence q, at the node N p is the radius at the root of all the word loose T p Degree can be expressed as
Figure PCTCN2017085896-appb-000034
Where S(q,N) represents the smallest spatial distance between q and N, and
Figure PCTCN2017085896-appb-000035

根据引理2我们得出删除规则2,根据引理4得出删除规则3,来提高算法的效率,具体删除规则如下:According to Lemma 2, we get the deletion rule 2, and according to Lemma 4, the deletion rule 3 is obtained to improve the efficiency of the algorithm. The specific deletion rules are as follows:

删除规则2:对于给定的查询序列q,θ表示第k个候选TQSP的ranking score。

Figure PCTCN2017085896-appb-000036
表示以p为根节点的Tp的TQSP的α-bound on the ranking score。当
Figure PCTCN2017085896-appb-000037
时Tp不是我们要查的结果,p可以被删除。Delete rule 2: For a given query sequence q, θ represents the ranking score of the kth candidate TQSP.
Figure PCTCN2017085896-appb-000036
The α-bound on the ranking score of the TQSP representing T p with p as the root node. when
Figure PCTCN2017085896-appb-000037
When T p is not the result we want to check, p can be deleted.

删除规则3:对于给定的查询序列q,θ表示第k个候选TQSP的ranking score,

Figure PCTCN2017085896-appb-000038
表示节点N下以p为根节点的Tp的TQSP的α-bound on the ranking score,若
Figure PCTCN2017085896-appb-000039
则在N节点下任何节点都不满足条件,N可以被删除。Delete rule 3: For a given query sequence q, θ represents the ranking score of the kth candidate TQSP,
Figure PCTCN2017085896-appb-000038
The α-bound on the ranking score of the TQSP representing the T p at the node N with p as the root node.
Figure PCTCN2017085896-appb-000039
Then, under the N node, no node satisfies the condition, and N can be deleted.

基于KSP算法的资源描述框架查询方法,主要是实现图上关键字的搜索和RDF数据上关键字的搜索:由于关键字检索对用户的友好性,不仅用户检索网络数据,而且用于检索XML文档,关系型数据库,和图。传统上图的搜索算法将查询转化为在特征空间上的搜索,例如路径,频繁模式,和序列。这种搜索算法更多的关注图的结构而不是图的语义内容。图上关键字的查询通过利用内容和链接结构两者来确定图中一组密集链接的节点。由于这两种信息的重新实施,可提高结果的整体质量。而RDF数据上关键字的搜索,由于RDF数据是一种特殊类型的图数据也可以提供查询的效率。The resource description framework query method based on KSP algorithm mainly implements the search of keywords on the graph and the search of keywords on RDF data: due to the friendliness of keyword retrieval, not only users retrieve network data, but also retrieve XML documents. , relational databases, and graphs. Traditionally, the search algorithm converts queries into searches on feature spaces, such as paths, frequent patterns, and sequences. This search algorithm pays more attention to the structure of the graph than to the semantic content of the graph. The query of the keywords on the graph determines a set of densely linked nodes in the graph by utilizing both the content and the link structure. Due to the re-implementation of these two kinds of information, the overall quality of the results can be improved. The search for keywords on RDF data can also provide query efficiency because RDF data is a special type of graph data.

本发明还提供一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现上述方法的步骤。The present invention also provides a computer readable storage medium having stored thereon a computer program that, when executed by a processor, implements the steps of the above method.

本发明的一实施例中提供一种基于资源描述框架的KSP算法查询系统,用于利用KSP算法在RDF图上搜索查询关键字的语义位置,如图5所示,所述查询系统包括:In an embodiment of the present invention, a KSP algorithm query system based on a resource description framework is provided for searching a semantic position of a query keyword on an RDF graph by using a KSP algorithm. As shown in FIG. 5, the query system includes:

第一初始化模块51,用于初始化存放结果函数Hk,其中存放结果函数Hk用于保存符合条件的语义位置QSP,符合条件的语义位置QSP为包含所有查询关键字的子树,k为符合条件的语义位置QSP的数量;The first initialization module 51 is configured to initialize the storage result function H k , wherein the storage result function H k is used to save the conditional semantic position QSP, and the qualified semantic position QSP is a subtree containing all the query keywords, where k is consistent The number of semantic locations of the conditional QSP;

循环遍历模块52,用于根据预设的文档倒排索引表和预设的所有查询关键字对应的半径字领域表,对用户输入的查询关键字中的每个关键字进行循环遍历,得到输入的查询关键字对应的倒排索引表,以及得到每个关键字对应的值,并加载在预设的所有查询关键字对应的半径字领域表中;The loop traversing module 52 is configured to cycle through each keyword in the query keyword input by the user according to the preset document inverted index table and the preset radius word domain table corresponding to all the query keywords, and obtain an input. The inverted index table corresponding to the query keyword, and the value corresponding to each keyword, and loaded in the radius word field table corresponding to all the preset query keywords;

构建模块53,用于根据所述输入的查询关键字和输入的查询关键字对应的倒排索引表,构建字典结构,其中所述字典结构表示含有所述输入的查询关键字的节点;The building module 53 is configured to construct a dictionary structure according to the input query keyword and the inverted index table corresponding to the input query keyword, where the dictionary structure represents a node that includes the input query keyword;

第二初始化模块54,用于初始化单调排序函数的值θ,其中单调排序函数表示对根据输入的查询关键字查找的多个最紧凑的符合条件的语义位置进行排序,最紧凑的符合条件的语义位置表示为以p为根节点的松散度最小的符合条件的语义位置;The second initialization module 54 is configured to initialize the value θ of the monotonic sorting function, wherein the monotonic sorting function indicates that the most compact and qualified semantic positions searched according to the input query keyword are sorted, and the most compact and qualified semantics The position is expressed as the semantic position with the least looseness of the root node with p;

生成队列模块55,用于预设的空间索引的根节点进入队列,得到位置节点队列Q;a generating queue module 55, the root node for the preset spatial index enters the queue, and obtains the location node queue Q;

查找循环模块56,用于根据输入的查询关键字和位置节点队列Q,遍历预设的空间索引 得到节点e,并对节点e进行判断;The lookup loop module 56 is configured to traverse the preset spatial index according to the input query keyword and the location node queue Q Obtaining node e and judging node e;

顶点处理模块57,于当节点e为包含空间位置信息的顶点时,判断节点e是否为不符合条件节点,若节点e为不符合条件节点则结束本次循环,进入下次循环,当节点e为包含空间位置信息的顶点,且节点e不是不符合条件节点,则执行函数GETSEMANTICPLACE,得到符合条件的语义位置的子树Tp和子树Tp的松散度值L(Tp),并判断是否为L(Tp)==+∞,如果是,则结束本次循环,如果否,计算松散度值L(Tp)的排序分数f,并将松散度值L(Tp)和对应排序分数f插入存放结果函数Hk且更新单调排序函数的值θ;The vertex processing module 57 determines whether the node e is a non-conforming condition node when the node e is a vertex including the spatial location information, and terminates the current loop if the node e is not in compliance with the conditional node, and enters the next loop, when the node e For the vertex containing the spatial position information, and the node e is not the non-conforming conditional node, the function GETSEMANTICPLACE is executed to obtain the looseness value L(T p ) of the subtree T p and the subtree T p of the semantic position that meet the condition, and determine whether of L (T p) == + ∞ , if so, the end of this cycle, and if not, calculating bulk value L (T p) sorted fraction F, and the bulk value L (T p) and the corresponding ordering The score f is inserted into the result function H k and updates the value θ of the monotonic sort function;

节点处理模块58,用于节点e为节点N时,循环遍历节点N下的每一个节点,计算N下每个节点e对应的半径字的松散度值

Figure PCTCN2017085896-appb-000040
和对应半径字的排序分数
Figure PCTCN2017085896-appb-000041
Figure PCTCN2017085896-appb-000042
时,则把对应节点e插入位置节点队列Q并进入查找循环模块,直到
Figure PCTCN2017085896-appb-000043
The node processing module 58 is configured to loop through each node under the node N when the node e is the node N, and calculate the looseness value of the radius word corresponding to each node e under N.
Figure PCTCN2017085896-appb-000040
And the sorting score of the corresponding radius word
Figure PCTCN2017085896-appb-000041
when
Figure PCTCN2017085896-appb-000042
Then, the corresponding node e is inserted into the location node queue Q and enters the lookup loop module until
Figure PCTCN2017085896-appb-000043

输出结果模块59,用于根据存放结果函数Hk向用户返回查询结果。The output result module 59 is configured to return a query result to the user according to the storage result function H k .

具体的,所述存放结果函数Hk中保存符合条件的语义位置QSP按照最紧凑的符合条件的语义位置的排序分数大小进行排序。Specifically, the conditional semantic position QSP in the storage result function Hk is sorted according to the sorting score size of the most compact and qualified semantic position.

在具体实施中,顶点处理模块57还用于:In a specific implementation, the vertex processing module 57 is further configured to:

初始化子树Tp,其中,Tp表示以p为顶点,包含所有查询关键字的一颗子树;Initializing a subtree T p , where T p represents a subtree containing p as a vertex and containing all query keywords;

初始化子树Tp的松散度L(Tp)以使L(Tp)=1;Initializing the looseness L(T p ) of the subtree T p such that L(T p )=1;

查询的关键字q.ψ保存到数字集B中;The keyword q.ψ of the query is saved to the number set B;

从顶点p开始使用BFS(breadth-first-search)方式遍历RDF图且数字集B不为空;Beginning with the vertex p using the BFS (breadth-first-search) method to traverse the RDF graph and the number set B is not empty;

把BFS方式得到的节点v添加到子树Tp中;BFS way to get added to the subtree of node v in T p;

在所述字典结构中查找节点v包含的查询关键字;Finding a query keyword included in the node v in the dictionary structure;

判断节点v包含的查询关键字和数字集B的交集是否为空;Determining whether the intersection of the query key and the number set B included in the node v is empty;

如果否,输出L(Tp)=+∞和Tp=NULL;If no, the output L(T p )=+∞ and T p =NULL;

如果是,根据节点v包含的查询关键字和数字集B的交集中元素的个数以及节点v和顶点p之间的距离的,得到子树Tp的松散度值L(Tp);If yes, according to the number of intersection elements of the query keyword and the number set B of the node v and the distance between the node v and the vertex p, the looseness value L(T p ) of the subtree T p is obtained;

从数字集B中删除节点v包含的查询关键字和数字集B的交集得到当前的数字集B;Deleting the intersection of the query key and the number set B contained in the node v from the digital set B to obtain the current digital set B;

当前的数字集B是否为空,如果否,执行从顶点p开始使用BFS(breadth-first-search)方式遍历RDF图且数字集B不为空的内容; Whether the current number set B is empty, and if not, performing content that traverses the RDF graph from the vertex p using the BFS (breadth-first-search) method and the number set B is not empty;

如果是,输出L(Tp)=+∞和Tp=NULL,也就是说,直到数字集B为空,输出L(Tp)=+∞和Tp=NULL。If so, the outputs L(T p )=+∞ and T p =NULL, that is, until the set of numbers B is empty, the outputs L(T p )=+∞ and T p =NULL.

具体的,根据节点v包含的查询关键字和数字集B的交集中元素的个数以及节点v和顶点p之间的距离的得到子树Tp的松散度L(Tp)的值计算公式如下:Specifically, the formula for calculating the value of the looseness L(T p ) of the subtree T p according to the number of intersection elements of the query key and the number set B of the node v and the distance between the node v and the vertex p is obtained. as follows:

L(Tp)+=|B∩v.ψq|×d(p,v);L(T p )+=|B∩v.ψ q |×d(p,v);

其中,|B∩v.ψq|表示B和v.ψq的交集中元素的个数,d(p,v)为顶点p到节点v的最短距离。Where |B∩v.ψ q | represents the number of elements in the intersection of B and v.ψ q , and d(p, v) is the shortest distance from vertex p to node v.

在具体实施中,所述查询系统还包括创建模块,用于创建预设的空间索引。具体的,创建模块还用于:In a specific implementation, the query system further includes a creating module for creating a preset spatial index. Specifically, the creation module is also used to:

从RDF数据中提取含有坐标信息的数据,得到预设的空间索引。因为RDF图数据比较大,为了提高查询的速度,首先从RDF数据中提取含有坐标信息的数据,创建预设的空间索引R-tree使得查询的速度可以得到有效的提高。The data containing the coordinate information is extracted from the RDF data to obtain a preset spatial index. Because the RDF graph data is relatively large, in order to improve the speed of the query, the data containing the coordinate information is first extracted from the RDF data, and the preset spatial index R-tree is created, so that the query speed can be effectively improved.

在具体实施中,创建模块还用于:对每个节点的文档中的关键字建立倒排索引以得到预设的文档倒排索引表,倒排索引表的格式为(关键字,节点),如图3所示。In a specific implementation, the creating module is further configured to: perform an inverted index on a keyword in a document of each node to obtain a preset reverse index table of the document, where the format of the inverted index table is (keyword, node), As shown in Figure 3.

在具体实施中,如图4所示,创建模块还用于:In a specific implementation, as shown in FIG. 4, the creation module is further used to:

从位置顶点p开始使用BFS方式遍历RDF图;Traversing the RDF graph using the BFS method starting from the position vertex p;

当遍历到节点v时,遍历节点v的文档中的关键字t,若(t,d(p,v))在WN(p)中没有出现过,则将(t,d(p,v))添加到预设的所有查询关键字对应的半径字领域表中,其中WN(p)表示从p到每一个查询关键字ti的最短距离的集合{(ti,dg(p,ti))},dg(p,ti)≤α表示从根节点p到包含查询关键字ti的顶点的最短的距离,(t,d(p,v))表示顶点v对应的文档信息,包含查询关键字t;When traversing to node v, traverse the keyword t in the document of node v, if (t,d(p,v)) does not appear in WN(p), then (t,d(p,v) ) is added to the radius word field table corresponding to all the preset query keywords, where WN(p) represents the set of the shortest distance from p to each query key t i {(t i ,d g (p,t i ))}, d g (p, t i ) ≤ α represents the shortest distance from the root node p to the vertex containing the query key t i , and (t, d(p, v)) represents the document corresponding to the vertex v Information, including the query keyword t;

得到所有的叶子节点的半径字领域后,按照从叶子节点到非叶子节点的顺序,计算WN(N)的值,WN(N)表示非叶子节点N下所有的位置顶点{pj}对应的WN(pj)的联合,非叶子节点N下面包含一系列的位置顶点{pj};After obtaining the radius word fields of all the leaf nodes, the value of WN(N) is calculated in the order from the leaf node to the non-leaf node, and WN(N) represents all the position vertices {p j } under the non-leaf node N. The union of WN(p j ), the non-leaf node N contains a series of position vertices {p j };

对于非叶子节点N,{ei}表示N下的节点,WN(N)初始化为空,若(t,dg(ei,t))在WN(N)没有相应的值则将(t,dg(ei,t))添加到预设的所有查询关键字对应的半径字领域表中,若有值则将所述半径字邻域表中非叶子节点N对应的值更新为min(dg(N,t),dg(ei,t)),其中,(t,dg(ei,t))为关键字t在所述半径字邻域表中对应的值。For a non-leaf node N, {e i } represents a node under N, and WN(N) is initialized to be empty. If (t, d g (e i , t)) has no corresponding value in WN(N), then (t , d g (e i , t)) is added to the radius word field table corresponding to all the preset query keywords, and if there is a value, the value corresponding to the non-leaf node N in the radius word neighborhood table is updated to min (d g (N, t), d g (e i , t)), where (t, d g (e i , t)) is the value corresponding to the keyword t in the radius word neighborhood table.

具体的,顶点p可以是根节点,如果是根节点,则包含位置信息。顶点p可以是包含位 置信息的节点,所以它是包含位置信息的所有可能节点,包括根节点。另外,所述预设的所有查询关键字对应的半径字领域表:是根据树建立起来的,其中树中的有叶子节点p,和非叶子节点N,首先算叶子节点的值,然后算非叶子节点的值,所以是叶子节点到非叶子节点,是一种自下到上的顺序。比如,以下所示表格为所述预设的所有查询关键字对应的半径字领域表。Specifically, the vertex p may be a root node, and if it is a root node, location information is included. Vertex p can be a containing bit The node that sets the information, so it is all possible nodes that contain location information, including the root node. In addition, the preset radius word domain table corresponding to all the query keywords is established according to a tree, wherein the leaf node p and the non-leaf node N in the tree first calculate the value of the leaf node, and then calculate the value. The value of the leaf node, so it is a leaf node to a non-leaf node, is a bottom-up order. For example, the table shown below is the radius word field table corresponding to all the preset query keywords.

q.ψQ.ψ abbeyAbbey ...... ancientAncient catholicCatholic romanRoman historyHistory ...... dg(p1,ti)d g (p 1 , t i ) 00 ...... 11 11 11 -- ...... dg(p2,ti)d g (p 2 , t i ) -- ...... -- 00 00 11 ...... dg(N,ti)d g (N, t i ) 00 ...... 11 00 00 11 ......

具体的,开始时初始化,使得所述预设的所有查询关键字对应的半径字领域表为空,创建模块的工作过程就是填充这张表。Specifically, the initialization is started, so that the radius word field table corresponding to all the preset query keywords is empty, and the working process of creating the module is to fill the table.

其中:abbey,aneient,catholic,roman,history为所述半径字领域表中关键字。(t,d(p,v)):表示顶点v的文档信息中,有查询关键字t,则对应的值就会插入到半径字邻域表中。(t,dg(ei,t)):表示关键字t,在半径字邻域表中对应的值。此处的ei就是上表中的p1,p2,即dg(ei,t)就是表中dg(p1,ti)对应的值。min(dg(N,t),dg(ei,t)):就是表中dg(N,ti)的值去掉所在列的最小值。比如,catholic所在的列,有值1,0所以min(dg(N,t),dg(ei,t))的值为0,也就是dg(N,ti)的值。Among them: abbey, aneient, catholic, roman, history is the keyword in the radius word field table. (t, d(p, v)): In the document information indicating the vertex v, if there is a query keyword t, the corresponding value is inserted into the radius word neighborhood table. (t, d g (e i , t)): represents the value of the keyword t in the radius word neighborhood table. Here, e i is p 1 , p 2 in the above table, that is, d g (e i , t) is the value corresponding to d g (p 1 , t i ) in the table. Min(d g (N,t), d g (e i ,t)): is the value of d g (N, t i ) in the table to remove the minimum value of the column. For example, the column where catholic is located has a value of 1,0 so the value of min(d g (N,t), d g (e i ,t)) is 0, which is the value of d g (N,t i ).

由于在根据KSP算法构建TQSP的时候可能会遇到以下两种情况:(i)遍历完这个图后仍未找到包含输入查询关键字的Tp,(ii)找到了Tp,但是Tp的排序分数f大于θ(当前已经找到的第K个Tp的排序分数)。其中,TQSP为松散度最小的QSP针对情况。Since TQSP is constructed according to the KSP algorithm, the following two situations may be encountered: (i) T p containing the input query keyword is not found after traversing the graph, (ii) T p is found, but T p f [theta] is greater than the ranking score (current have been found in the K-th ranking score T p). Among them, TQSP is the case with the least loose QSP.

对于(i)的情况,在RDF图中,让

Figure PCTCN2017085896-appb-000044
表示以p为根节点子树不能包含所有的查询关键字。对于给定的查询关键字序列q.ψ,若
Figure PCTCN2017085896-appb-000045
此时的p节点为不符合条件节点。For the case of (i), in the RDF diagram, let
Figure PCTCN2017085896-appb-000044
Indicates that p is the root node. The subtree cannot contain all the query keywords. For a given query keyword sequence q.ψ, if
Figure PCTCN2017085896-appb-000045
The p node at this time is a non-compliant node.

针对(ii)情况,为了提高算法的效率,定义WN(p),对位置顶点p,它的WN(p)表示从p到每一个查询关键字ti的最短距离的集合{(ti,dg(p,ti))},其中dg(p,ti)≤α,表示从根节点p到包含查询关键字ti的顶点的最短的距离。For the case of (ii), in order to improve the efficiency of the algorithm, WN(p) is defined, and for the position vertex p, its WN(p) represents the set of the shortest distance from p to each query key t i {(t i , d g (p, t i ))}, where d g (p, t i ) ≤ α, represents the shortest distance from the root node p to the vertex containing the query key t i .

根据上述中对一个点的字邻域描述,我们可以得出对节点N的字邻域的定义,如对R-tree中的顶点N。定义WN(N),对于R-tree的节点N下面包含一系列的位置顶点{pj},WN(N)为一 系列的{(ti,dg(N,ti))},其中WN(N)是节点N下所有的位置顶点{pj}对应的WN(pj)的联合,其中对于每个关键字ti

Figure PCTCN2017085896-appb-000046
显然dg(N,ti)≤α。Based on the word neighborhood description of a point in the above, we can derive the definition of the word neighborhood of node N, such as the vertex N in the R-tree. Defining WN(N), for node R of R-tree, contains a series of position vertices {p j }, WN(N) is a series of {(t i , d g (N, t i ))}, where WN(N) is a union of WN(p j ) corresponding to all position vertices {p j } under node N, where for each keyword t i ,
Figure PCTCN2017085896-appb-000046
Obviously d g (N, t i ) ≤ α.

根据定义WN(p)和定义WN(N),可以创建节点对应的半径字领域表Iα即预设的所有查询关键字对应的半径字领域表(α-radius word neighborhood表),用于提高算法效率。According to the definition WN(p) and the definition WN(N), the radius word domain table I α corresponding to the node may be created, that is, the preset radius word domain table (α-radius word neighborhood table) corresponding to all the query keywords is used to improve Algorithm efficiency.

进一步,为了提高算法的效率,引理1:由于WN(p)表示顶点p的半径字领域(α-radius word neighborhood)。对于给定的查询关键字q.ψ={t1,...,tj,...tm},为了不失一般性,假设第j个关键字在WN(p)已经有值,则以p为根节点的Tp的TQSP的半径字松散度(α-bound on the looseness)可表示为

Figure PCTCN2017085896-appb-000047
并且
Figure PCTCN2017085896-appb-000048
Further, in order to improve the efficiency of the algorithm, Lemma 1: Since WN(p) represents the alpha-radius word neighborhood of the vertex p. For a given query key q.ψ={t 1 ,...,t j ,...t m }, for the sake of generality, assume that the jth keyword already has a value in WN(p), Then the radius-word looseness (α-bound on the looseness) of T TSP of T p with p as the root node can be expressed as
Figure PCTCN2017085896-appb-000047
and
Figure PCTCN2017085896-appb-000048

引理2:由于

Figure PCTCN2017085896-appb-000049
表示以p为根节点的Tp的TQSP的半径字松散度,则对于给定的查询序列q对应的Tp的半径字排序分数(α-bound on the ranking score)可表示为
Figure PCTCN2017085896-appb-000050
Figure PCTCN2017085896-appb-000051
Lemma 2: Because
Figure PCTCN2017085896-appb-000049
P is represented by a radius looseness TQSP word root node of T p, the radius for a given word ranking score T p of the query sequence corresponding to q (α-bound on the ranking score ) may be expressed as
Figure PCTCN2017085896-appb-000050
And
Figure PCTCN2017085896-appb-000051

引理3:由于WN(N)表示顶点N的半径字领域(α-radius word neighborhood),查询关键字q.ψ={t1...,tj...,tm},为了不失一般性,假设第j个关键字在WN(N)中已经有相应的值,则在N节点下以p为根节点的Tp对应的所有的TQSP可表示为

Figure PCTCN2017085896-appb-000052
并且
Figure PCTCN2017085896-appb-000053
Lemma 3: Since WN(N) represents the alpha-radius word neighborhood of the vertex N, the query key q.ψ={t 1 ...,t j ...,t m }, in order not to loss of generality, assume that the j-th keyword in the WN (N) has a corresponding value, the node N p is the root node corresponding to T p can be expressed as all TQSP
Figure PCTCN2017085896-appb-000052
and
Figure PCTCN2017085896-appb-000053

引理4:由于

Figure PCTCN2017085896-appb-000054
表示以在节点N下以p为根节点的Tp的TQSP对应的半径字松散度,对于给定的查询序列q,则在节点N下以p为根节点的所有的Tp的半径字松散度可表示为
Figure PCTCN2017085896-appb-000055
其中S(q,N)表示q和N之间最小的空间距离,且
Figure PCTCN2017085896-appb-000056
Lemma 4: Because
Figure PCTCN2017085896-appb-000054
Represented at node N p is the root node in a T p of a radius corresponding to the word TQSP looseness, for a given query sequence q, at the node N p is the radius at the root of all the word loose T p Degree can be expressed as
Figure PCTCN2017085896-appb-000055
Where S(q,N) represents the smallest spatial distance between q and N, and
Figure PCTCN2017085896-appb-000056

根据引理2我们得出删除规则2,根据引理4得出删除规则3,来提高算法的效率,具体删除规则如下:According to Lemma 2, we get the deletion rule 2, and according to Lemma 4, the deletion rule 3 is obtained to improve the efficiency of the algorithm. The specific deletion rules are as follows:

删除规则2:对于给定的查询序列q,θ表示第k个候选TQSP的ranking score。

Figure PCTCN2017085896-appb-000057
表示以p为根节点的Tp的TQSP的α-bound on the ranking score。当
Figure PCTCN2017085896-appb-000058
时Tp不是我们要查的结果,p可以被删除。 Delete rule 2: For a given query sequence q, θ represents the ranking score of the kth candidate TQSP.
Figure PCTCN2017085896-appb-000057
The α-bound on the ranking score of the TQSP representing T p with p as the root node. when
Figure PCTCN2017085896-appb-000058
When T p is not the result we want to check, p can be deleted.

删除规则3:对于给定的查询序列q,θ表示第k个候选TQSP的ranking score,

Figure PCTCN2017085896-appb-000059
表示节点N下以p为根节点的Tp的TQSP的α-bound on the ranking score,若
Figure PCTCN2017085896-appb-000060
则在N节点下任何节点都不满足条件,N可以被删除。Delete rule 3: For a given query sequence q, θ represents the ranking score of the kth candidate TQSP,
Figure PCTCN2017085896-appb-000059
The α-bound on the ranking score of the TQSP representing the T p at the node N with p as the root node.
Figure PCTCN2017085896-appb-000060
Then, under the N node, no node satisfies the condition, and N can be deleted.

基于KSP算法的资源描述框架查询系统,主要是实现图上关键字的搜索和RDF数据上关键字的搜索:由于关键字检索对用户的友好性,不仅用户检索网络数据,而且用于检索XML文档,关系型数据库,和图。传统上图的搜索算法将查询转化为在特征空间上的搜索,例如路径,频繁模式,和序列。这种搜索算法更多的关注图的结构而不是图的语义内容。图上关键字的查询通过利用内容和链接结构两者来确定图中一组密集链接的节点。由于这两种信息的重新实施,可提高结果的整体质量。而RDF数据上关键字的搜索,由于RDF数据是一种特殊类型的图数据也可以提供查询的效率。The resource description framework query system based on KSP algorithm mainly implements the search of keywords on the graph and the search of keywords on RDF data: due to the friendliness of keyword retrieval, not only users retrieve network data, but also retrieve XML documents. , relational databases, and graphs. Traditionally, the search algorithm converts queries into searches on feature spaces, such as paths, frequent patterns, and sequences. This search algorithm pays more attention to the structure of the graph than to the semantic content of the graph. The query of the keywords on the graph determines a set of densely linked nodes in the graph by utilizing both the content and the link structure. Due to the re-implementation of these two kinds of information, the overall quality of the results can be improved. The search for keywords on RDF data can also provide query efficiency because RDF data is a special type of graph data.

在本说明书的描述中,参考术语“一个实施例”、“一些实施例”、“示例”、“具体示例”、或“一些示例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或者特点包含于本发明的至少一个实施例或示例中。在本说明书中,对上述术语的示意性表述不必须针对的是相同的实施例或示例。而且,描述的具体特征、结构、材料或者特点可以在任一个或多个实施例或示例中以合适的方式结合。此外,在不相互矛盾的情况下,本领域的技术人员可以将本说明书中描述的不同实施例或示例以及不同实施例或示例的特征进行结合和组合。In the description of the present specification, the description with reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" and the like means a specific feature described in connection with the embodiment or example. A structure, material or feature is included in at least one embodiment or example of the invention. In the present specification, the schematic representation of the above terms is not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, various embodiments or examples described in the specification, as well as features of various embodiments or examples, may be combined and combined.

尽管上面已经示出和描述了本发明的实施例,可以理解的是,上述实施例是示例性的,不能理解为对本发明的限制,本领域的普通技术人员在本发明的范围内可以对上述实施例进行变化、修改、替换和变型。 Although the embodiments of the present invention have been shown and described, it is understood that the above-described embodiments are illustrative and are not to be construed as limiting the scope of the invention. The embodiments are subject to variations, modifications, substitutions and variations.

Claims (10)

一种基于KSP算法的资源描述框架查询方法,其特征在于:用于利用KSP算法在RDF图上搜索查询关键字的语义位置,所述查询方法包括以下步骤:A resource description framework query method based on KSP algorithm is characterized in that: for searching a semantic position of a query keyword on an RDF graph by using a KSP algorithm, the query method comprises the following steps: 初始化存放结果函数Hk,其中存放结果函数Hk用于保存符合条件的语义位置QSP,符合条件的语义位置QSP为包含所有查询关键字的子树,k为符合条件的语义位置QSP的数量;Initializing the result function H k , wherein the result function H k is used to save the conditional semantic position QSP, the qualified semantic position QSP is a subtree containing all the query keywords, and k is the number of eligible semantic positions QSP; 根据预设的文档倒排索引表和预设的所有查询关键字对应的半径字领域表,对用户输入的查询关键字中的每个关键字进行循环遍历,得到输入的查询关键字对应的倒排索引表,以及得到每个关键字对应的值,并加载在预设的所有查询关键字对应的半径字领域表中;According to the preset document inverted index table and the radius word field table corresponding to all the preset query keywords, each keyword in the query keyword input by the user is looped through, and the input query keyword is inverted. Arranging the index table, and obtaining the value corresponding to each keyword, and loading it in the radius word field table corresponding to all the preset query keywords; 根据所述输入的查询关键字和输入的查询关键字对应的倒排索引表,构建字典结构,其中所述字典结构表示含有所述输入的查询关键字的节点;Constructing a dictionary structure according to the input query keyword and the inverted index table corresponding to the input query keyword, wherein the dictionary structure represents a node containing the input query keyword; 初始化单调排序函数的值θ,其中单调排序函数表示对根据输入的查询关键字查找的多个最紧凑的符合条件的语义位置进行排序,最紧凑的符合条件的语义位置表示为以p为根节点的松散度最小的符合条件的语义位置;Initializing the value θ of the monotonic sort function, wherein the monotonic sort function represents sorting the plurality of most compact and eligible semantic positions searched according to the input query key, the most compact and eligible semantic position being represented by p as the root node The semantic position of the smallest looseness; 预设的空间索引中根节点进入队列,得到位置节点队列Q;The root node enters the queue in the preset spatial index, and obtains the location node queue Q; 根据输入的查询关键字和位置节点队列Q,遍历预设的空间索引得到节点e,并对节点e进行判断;According to the input query keyword and the location node queue Q, traversing the preset spatial index to obtain the node e, and determining the node e; 当节点e为包含空间位置信息的顶点,判断e是否为不符合条件节点,若e为不符合条件节点则结束本次循环,进入下次循环,当节点e为包含空间位置信息的顶点,且判断节点e不是不符合条件节点时,则执行函数GETSEMANTICPLACE,得到符合条件的语义位置的子树Tp和子树Tp的松散度值L(Tp),并判断是否为L(Tp)==+∞,如果是,则结束本次循环,如果否,计算松散度值L(Tp)的排序分数f,并将松散度值L(Tp)和对应排序分数f插入存放结果函数Hk且更新单调排序函数的值θ;When the node e is a vertex containing the spatial location information, it is determined whether e is a non-conforming conditional node. If e is a non-conforming conditional node, the current loop is terminated, and the next loop is entered, when the node e is a vertex containing the spatial location information, and When it is judged that the node e is not a non-conforming condition node, the function GETSEMANTICPLACE is executed to obtain the looseness value L(T p ) of the subtree T p and the subtree T p of the semantic position that meet the condition, and determine whether it is L(T p )= = + ∞, if so, the end of this cycle, and if not, calculating bulk value L (T p) a ranking score f, and the bulk value L (T p) and inserted into the corresponding ranking score for the result function H f k and update the value θ of the monotonic sorting function; 当节点e为节点N时,循环遍历节点N下的每一个节点,计算N下每个节点e对应的半径字的松散度值
Figure PCTCN2017085896-appb-100001
和对应半径字的排序分数
Figure PCTCN2017085896-appb-100002
Figure PCTCN2017085896-appb-100003
时,则把对应节点e插入位置节点队列Q并返回所述根据输入的查询关键字和位置节点队列Q,在预设的空间索引中查找节点e的步骤,直到
Figure PCTCN2017085896-appb-100004
When the node e is the node N, it loops through each node under the node N, and calculates the looseness value of the radius word corresponding to each node e under N.
Figure PCTCN2017085896-appb-100001
And the sorting score of the corresponding radius word
Figure PCTCN2017085896-appb-100002
when
Figure PCTCN2017085896-appb-100003
And inserting the corresponding node e into the location node queue Q and returning the step of searching for the node e in the preset spatial index according to the input query keyword and the location node queue Q until
Figure PCTCN2017085896-appb-100004
根据存放结果函数Hk向用户返回查询结果。The query result is returned to the user according to the stored result function H k .
如权利要求1所述的查询方法,其特征在于:所述执行函数GETSEMANTICPLACE,具体包括:The query method according to claim 1, wherein the execution function GETSEMANTICPLACE includes: 初始化子树Tp,其中,Tp表示以p为位置顶点,包含所有查询关键字的一颗子树;Initializing the subtree T p , where T p represents a subtree containing all the query keywords with p as the position vertex; 初始化子树Tp的松散度值L(Tp)以使L(Tp)=1;Initializing the looseness value L(T p ) of the subtree T p such that L(T p )=1; 查询的关键字q.ψ保存到数字集B中;The keyword q.ψ of the query is saved to the number set B; 从位置顶点p开始使用BFS方式遍历RDF图且数字集B不为空; The RDF graph is traversed from the position vertex p using the BFS method and the number set B is not empty; 把BFS方式得到的节点v添加到子树Tp中以得到子树TpBFS way to get added to the subtree of node v T p to obtain a sub-tree T p; 在所述字典结构中查找节点v包含的查询关键字;Finding a query keyword included in the node v in the dictionary structure; 判断节点v包含的查询关键字和数字集B的交集是否为空;Determining whether the intersection of the query key and the number set B included in the node v is empty; 如果否,输出L(Tp)=+∞和Tp=NULL;If no, the output L(T p )=+∞ and T p =NULL; 如果是,根据节点v包含的查询关键字和数字集B的交集中元素的个数以及节点v和位置顶点p之间的距离,得到子树Tp的松散度值L(Tp),以及从数字集B中删除节点v包含的查询关键字和数字集B的交集,返回到从位置顶点p开始使用BFS方式遍历RDF图且数字集B不为空的步骤,直到数字集B为空,输出L(Tp)=+∞和Tp=NULL。If yes, according to the number of intersection elements of the query keyword and the number set B included in the node v and the distance between the node v and the position vertex p, the looseness value L(T p ) of the subtree T p is obtained, and The intersection of the query key and the number set B contained in the node v is deleted from the number set B, and returns to the step of traversing the RDF graph using the BFS method from the position vertex p and the number set B is not empty until the number set B is empty. The output L(T p )=+∞ and T p =NULL. 如权利要求2所述的查询方法,其特征在于:根据节点v包含的查询关键字和数字集B的交集中元素的个数以及节点v和位置顶点p之间的距离的得到子树Tp的松散度值L(Tp)的计算公式如下:Query method according to claim 2, wherein: the sub-tree T p obtained according to the distance between the node and the number and position of the vertex v p v node contains the intersection query key element of the set B and digital The formula for calculating the looseness value L(T p ) is as follows: L(Tp)+=|B∩v.ψq|×d(p,v);L(T p )+=|B∩v.ψ q |×d(p,v); 其中,|B∩v.ψq|表示B和v.ψq的交集中元素的个数,d(p,v)为p到v的最短距离。Where |B∩v.ψ q | represents the number of elements in the intersection of B and v.ψ q , and d(p, v) is the shortest distance from p to v. 如权利要求1所述的查询方法,其特征在于:所述存放结果函数Hk中保存符合条件的语义位置QSP按照最紧凑的符合条件的语义位置的排序分数大小进行排序。The query method according to claim 1, wherein the conditional semantic position QSP in the deposit result function Hk is sorted according to the sorting score size of the most compact and eligible semantic position. 如权利要求1所述的查询方法,其特征在于:所述查询方法还包括以下步骤:创建预设的空间索引。The query method according to claim 1, wherein the query method further comprises the step of: creating a preset spatial index. 如权利要求5所述的查询方法,其特征在于:所述创建预设的空间索引的步骤,具体为:The method of claim 5, wherein the step of creating a preset spatial index is specifically: 从RDF数据中提取含有坐标信息的数据,得到预设的空间索引。The data containing the coordinate information is extracted from the RDF data to obtain a preset spatial index. 如权利要求1所述的查询方法,其特征在于:所述查询方法还包括以下步骤:对每个节点的文档中的关键字建立倒排索引以得到预设的文档倒排索引表,倒排索引表的格式为(关键字,节点)。The query method according to claim 1, wherein the query method further comprises the steps of: creating an inverted index for keywords in a document of each node to obtain a preset inverted index table of documents, and inverting The format of the index table is (keyword, node). 如权利要求1所述的查询方法,其特征在于:所述查询方法还包括以下步骤:The query method according to claim 1, wherein the query method further comprises the following steps: 从位置顶点p开始使用BFS方式遍历RDF图;Traversing the RDF graph using the BFS method starting from the position vertex p; 当遍历到节点v时,遍历节点v的文档中的关键字,若(t,d(p,v))在WN(p)中没有出现过,则将(t,d(p,v))添加到预设的所有查询关键字对应的半径字领域表中,其中WN(p)表示从位置顶点p到每一个查询关键字ti的最短距离的集合{(ti,dg(p,ti))},dg(p,ti)≤α表示从位置顶点p到包含查询关键字ti的顶点的最短的距离,(t,d(p,v))表示顶点v对应的文档信息,包 含查询关键字t;When traversing to node v, traverse the keywords in the document of node v, if (t,d(p,v)) does not appear in WN(p), then (t,d(p,v)) Add to the radius word field table corresponding to all the preset query keywords, where WN(p) represents the set of the shortest distance from the position vertex p to each query key t i {(t i ,d g (p, t i ))}, d g (p, t i ) ≤ α represents the shortest distance from the position vertex p to the vertex containing the query key t i , and (t, d(p, v)) represents the vertex v corresponding Document information, including the query keyword t; 当得到所有的叶子节点的半径字领域后,按照从叶子节点到非叶子节点的顺序,计算WN(N)的值,WN(N)表示非叶子节点N下所有的位置顶点{pj}对应的WN(pj)的联合,非叶子节点N下面包含一系列的位置顶点{pj};After obtaining the radius word fields of all the leaf nodes, the value of WN(N) is calculated in the order from the leaf node to the non-leaf node, and WN(N) represents all the position vertices {p j } corresponding to the non-leaf node N. The union of WN(p j ), the non-leaf node N contains a series of position vertices {p j }; 对于非叶子节点N,{ei}表示N下的节点,WN(N)初始化为空,若(t,dg(ei,t))在WN(N)没有相应的值则将(t,dg(ei,t))添加到预设的所有查询关键字对应的半径字领域表中,若有值则将所述半径字邻域表中非叶子节点N对应的值更新为min(dg(N,t),dg(ei,t)),其中,(t,dg(ei,t))为关键字t在所述半径字邻域表中对应的值。For a non-leaf node N, {e i } represents a node under N, and WN(N) is initialized to be empty. If (t, d g (e i , t)) has no corresponding value in WN(N), then (t , d g (e i , t)) is added to the radius word field table corresponding to all the preset query keywords, and if there is a value, the value corresponding to the non-leaf node N in the radius word neighborhood table is updated to min (d g (N, t), d g (e i , t)), where (t, d g (e i , t)) is the value corresponding to the keyword t in the radius word neighborhood table. 一种基于KSP算法的资源描述框架查询系统,其特征在于:用于利用KSP算法在RDF图上搜索查询关键字的语义位置,所述查询系统包括:A resource description framework query system based on KSP algorithm is characterized in that: for searching a semantic position of a query keyword on an RDF graph by using a KSP algorithm, the query system comprises: 第一初始化模块,用于初始化存放结果函数Hk,其中存放结果函数Hk用于保存符合条件的语义位置QSP,符合条件的语义位置QSP为包含所有查询关键字的子树,k为符合条件的语义位置QSP的数量;The first initialization module is configured to initialize the storage result function H k , wherein the storage result function H k is used to save the conditional semantic position QSP, the qualified semantic position QSP is a subtree containing all query keywords, and k is a condition The number of semantic locations QSP; 循环遍历模块,用于根据预设的文档倒排索引表和预设的所有查询关键字对应的半径字领域表,对用户输入的查询关键字中的每个关键字进行循环遍历,得到输入的查询关键字对应的倒排索引表,以及得到每个关键字对应的值,并加载在预设的所有查询关键字对应的半径字领域表中;The loop traversal module is configured to cycle through each keyword in the query keyword input by the user according to the preset document inverted index table and the preset radius word domain table corresponding to all the query keywords, and obtain the input Querying the inverted index table corresponding to the keyword, and obtaining the value corresponding to each keyword, and loading the radius word field table corresponding to all the preset query keywords; 构建模块,用于根据所述输入的查询关键字和输入的查询关键字对应的倒排索引表,构建字典结构,其中所述字典结构表示含有所述输入的查询关键字的节点;a building module, configured to construct a dictionary structure according to the input query keyword and the inverted index table corresponding to the input query keyword, wherein the dictionary structure represents a node that includes the input query keyword; 第二初始化模块,用于初始化单调排序函数的值θ,其中单调排序函数表示对根据输入的查询关键字查找的多个最紧凑的符合条件的语义位置进行排序,最紧凑的符合条件的语义位置表示为以p为根节点的松散度最小的符合条件的语义位置;a second initialization module for initializing the value θ of the monotonic sort function, wherein the monotonic sort function represents sorting the plurality of most compact and eligible semantic positions searched according to the input query keyword, the most compact and eligible semantic position Expressed as the semantic location with the least looseness of the root node with p as the root; 生成队列模块,用于预设的空间索引的根节点进入队列,得到位置节点队列Q;Generating a queue module, the root node for the preset spatial index enters the queue, and obtains the location node queue Q; 查找循环模块,用于根据输入的查询关键字和位置节点队列Q,遍历预设的空间索引得到节点e并对节点e进行判断;The lookup loop module is configured to: according to the input query keyword and the location node queue Q, traverse the preset spatial index to obtain the node e and determine the node e; 顶点处理模块,用于当节点e为包含空间位置信息的顶点时,判断节点e是否为不符合条件节点,若节点e为不符合条件节点则结束本次循环,进入下次循环,当节点e为包含空间位置信息的顶点,且节点e不是不符合条件节点,则执行函数GETSEMANTICPLACE,得到符合条件的语义位置的子树Tp和子树Tp的松散度值L(Tp),并判断是否为L(Tp)==+∞,如果是,则结束本次循环,如果否,计算松散度值L(Tp)的排序分数f,并将松散度值L(Tp)和对应排序分数f插入存放结果函数Hk且更新单调排序函数的值θ;The vertex processing module is configured to determine whether the node e is a non-conforming condition node when the node e is a vertex including the spatial location information, and if the node e is not in the conditional node, the current loop is ended, and the next loop is entered, when the node e For the vertex containing the spatial position information, and the node e is not the non-conforming conditional node, the function GETSEMANTICPLACE is executed to obtain the looseness value L(T p ) of the subtree T p and the subtree T p of the semantic position that meet the condition, and determine whether of L (T p) == + ∞ , if so, the end of this cycle, and if not, calculating bulk value L (T p) sorted fraction F, and the bulk value L (T p) and the corresponding ordering The score f is inserted into the result function H k and updates the value θ of the monotonic sort function; 节点处理模块,用于节点e为节点N时,循环遍历节点N下的每一个节点,计算N下每个节点e对应的半径字的松散度值
Figure PCTCN2017085896-appb-100005
和对应半径字的排序分数
Figure PCTCN2017085896-appb-100006
Figure PCTCN2017085896-appb-100007
时, 则把对应节点e插入位置节点队列Q并进入查找循环模块,直到
Figure PCTCN2017085896-appb-100008
The node processing module is configured to cycle through each node under the node N when the node e is the node N, and calculate the looseness value of the radius word corresponding to each node e under N
Figure PCTCN2017085896-appb-100005
And the sorting score of the corresponding radius word
Figure PCTCN2017085896-appb-100006
when
Figure PCTCN2017085896-appb-100007
Then, the corresponding node e is inserted into the location node queue Q and enters the lookup loop module until
Figure PCTCN2017085896-appb-100008
输出结果模块,用于根据存放结果函数Hk向用户返回查询结果。The output result module is configured to return a query result to the user according to the stored result function H k .
一种计算机可读存储介质,其上存储有计算机程序,其特征在于,该程序被处理器执行时实现如权利要求1-8任意一项所述方法的步骤。 A computer readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the steps of the method of any of claims 1-8.
PCT/CN2017/085896 2017-05-25 2017-05-25 Ksp algorithm-based resource description framework query method and system Ceased WO2018214097A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/085896 WO2018214097A1 (en) 2017-05-25 2017-05-25 Ksp algorithm-based resource description framework query method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/085896 WO2018214097A1 (en) 2017-05-25 2017-05-25 Ksp algorithm-based resource description framework query method and system

Publications (1)

Publication Number Publication Date
WO2018214097A1 true WO2018214097A1 (en) 2018-11-29

Family

ID=64395136

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/085896 Ceased WO2018214097A1 (en) 2017-05-25 2017-05-25 Ksp algorithm-based resource description framework query method and system

Country Status (1)

Country Link
WO (1) WO2018214097A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070073734A1 (en) * 2003-11-28 2007-03-29 Canon Kabushiki Kaisha Method of constructing preferred views of hierarchical data
CN101241502A (en) * 2008-03-13 2008-08-13 复旦大学 Keyword Search Clustering Method for XML Documents Based on Semantic Distance Model
CN101571866A (en) * 2009-05-27 2009-11-04 复旦大学 Keyword retrieval method and keyword retrieval device aiming at extensible marked language database
CN102163218A (en) * 2011-03-28 2011-08-24 武汉大学 Graph-index-based graph database keyword vicinity searching method
CN104361108A (en) * 2014-11-27 2015-02-18 广西师范学院 Correlation query method of data space

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070073734A1 (en) * 2003-11-28 2007-03-29 Canon Kabushiki Kaisha Method of constructing preferred views of hierarchical data
CN101241502A (en) * 2008-03-13 2008-08-13 复旦大学 Keyword Search Clustering Method for XML Documents Based on Semantic Distance Model
CN101571866A (en) * 2009-05-27 2009-11-04 复旦大学 Keyword retrieval method and keyword retrieval device aiming at extensible marked language database
CN102163218A (en) * 2011-03-28 2011-08-24 武汉大学 Graph-index-based graph database keyword vicinity searching method
CN104361108A (en) * 2014-11-27 2015-02-18 广西师范学院 Correlation query method of data space

Similar Documents

Publication Publication Date Title
US7246108B2 (en) Reusing optimized query blocks in query processing
CN104866593B (en) A kind of database search method of knowledge based collection of illustrative plates
CN101685444B (en) System and method for realizing metadata search
CN105706078A (en) Automatic definition of entity collections
CN105210058A (en) Graph query processing using plurality of engines
CN104239513A (en) Semantic retrieval method oriented to field data
CN113946600A (en) Data query method, data query device, computer equipment and medium
CN105630881A (en) Data storage method and query method for RDF (Resource Description Framework)
CN106156271A (en) Related information directory system based on distributed storage and foundation thereof and using method
CN105138588B (en) A kind of database overlap scheme abstraction generating method propagated based on multi-tag
CN107193882A (en) Why not query answer methods based on figure matching on RDF data
CN116108245B (en) Graph data query method and query engine
CN108694208A (en) Method and apparatus for constructs database
CN105868366A (en) Concept space navigation method based on concept association
CN116108194A (en) Search engine method, system, storage medium and electronic device based on knowledge map
Kim et al. Supporting set-valued joins in NoSQL using MapReduce
JP5470082B2 (en) Information storage search method and information storage search program
CN102270201B (en) Multi-dimensional indexing method and device for network files
CN107229704A (en) A kind of resource description framework querying method and system based on KSP algorithms
US11144552B2 (en) Supporting joins in directed acyclic graph knowledge bases
CN114911826A (en) A method and system for retrieving linked data
CN106933844B (en) Construction method of reachability query index facing large-scale RDF data
CN115982390B (en) A method for industrial chain construction and iterative expansion development
Medina et al. Evaluation of indexing strategies for possibilistic queries based on indexing techniques available in traditional RDBMS
WO2018214097A1 (en) Ksp algorithm-based resource description framework query method and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17910532

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05/03//2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17910532

Country of ref document: EP

Kind code of ref document: A1