DE102021203300A1

DE102021203300A1 - Computer-implemented method for keyword searches in a knowledge graph

Info

Publication number: DE102021203300A1
Application number: DE102021203300.8A
Authority: DE
Inventors: Evgeny Kharlamov; Yuxuan Shi; Trung Kien TRAN
Original assignee: Robert Bosch GmbH
Current assignee: Robert Bosch GmbH
Priority date: 2021-03-31
Filing date: 2021-03-31
Publication date: 2022-10-06
Also published as: CN115146022A

Abstract

Computerimplementiertes Verfahren (300), Einrichtung (200) und ein Computerprogramm für Schlüsselwortsuche in einer Datenmenge, wobei Daten der Datenmenge durch einen Wissensgraphen (100) dargestellt werden, wobei der Wissensgraph (100) Knoten, die Entitäten der Datenmenge darstellen, und Kanten, die Beziehungen zwischen den Entitäten darstellen, umfasst, wobei das Verfahren (300) die folgenden Schritte umfasst:Empfangen (302) einer Schlüsselwortabfrage, umfassend eine Menge von Schlüsselwörtern;Inrangfolgebringen (304) von Knoten des Wissensgraphen basierend auf einem kürzesten Pfad zu Schlüsselwortknoten, wobei ein Schlüsselwortknoten ein Knoten ist, der mit zumindest einem Schlüsselwort der Schlüsselwortabfrage übereinstimmt;Auswählen (306) der kleinsten Menge von Schlüsselwortknoten unter allen Mengen von Schlüsselwortknoten, umfassend zumindest einen Schlüsselwortknoten für jedes Schlüsselwort der Menge von Schlüsselwörtern.Bestimmen (308), basierend auf der kleinsten Menge von Schlüsselwortknoten und basierend auf einer Anzahl von in Rangfolge gebrachten Knoten, einer optimalen Pfadmenge, wobei die optimale Pfadmenge Pfade im Wissensgraphen (100) umfasst, die Knoten verbinden, die mit den Schlüsselwörtern der Schlüsselwortabfrage übereinstimmen und wobei die optimale Pfadmenge optimal bezüglich minimaler Kosten der Pfadmenge ist, wobei die Kosten auf einer Linearkombination des Gesamtgewichts von Knoten der optimalen Pfadmenge und paarweisen semantischen Abständen zwischen den Knoten der optimalen Pfadmenge basieren, undExtrahieren (310) einer Antwort auf die Schlüsselwortabfrage aus der optimalen Pfadmenge.Computer-implemented method (300), device (200) and a computer program for keyword searches in a dataset, data of the dataset being represented by a knowledge graph (100), the knowledge graph (100) having nodes representing entities of the dataset and edges representing Representing relationships between the entities, the method (300) comprising the steps of:receiving (302) a keyword query comprising a set of keywords;ranking (304) nodes of the knowledge graph based on a shortest path to keyword nodes, wherein a keyword node is a node that matches at least one keyword of the keyword query;selecting (306) the smallest set of keyword nodes among all sets of keyword nodes comprising at least one keyword node for each keyword of the set of keywords.determining (308) based on the smallest quantity of sc keyword nodes and based on a number of ranked nodes, an optimal path set, the optimal path set including paths in the knowledge graph (100) connecting nodes that match the keywords of the keyword query, and the optimal path set optimal in terms of minimum cost of the path set where the cost is based on a linear combination of the total weight of nodes of the optimal path set and pairwise semantic distances between the nodes of the optimal path set, and extracting (310) an answer to the keyword query from the optimal path set.

Description

Hintergrundbackground

Die Erfindung bezieht sich auf eine Einrichtung und ein Verfahren für Schlüsselwortsuche in einer Datenmenge, wobei Daten der Datenmenge durch einen Wissensgraphen dargestellt werden. Ein Ergebnis der Suche kann automatisch bestimmt werden durch Finden eines Untergraphen, der eine Kostenfunktion optimiert.The invention relates to a device and a method for keyword searches in a data set, data of the data set being represented by a knowledge graph. A result of the search can be automatically determined by finding a subgraph that optimizes a cost function.

Offenbarungepiphany

Eine Ausführungsform bezieht sich auf ein computerimplementiertes Verfahren für Schlüsselwortsuche in einer Datenmenge, wobei Daten der Datenmenge durch einen Wissensgraphen dargestellt werden, wobei der Wissensgraph Knoten, die Entitäten der Datenmenge darstellen, und Kanten, die Beziehungen zwischen den Entitäten darstellen, umfasst, wobei das Verfahren die folgenden Schritte umfasst:

Empfangen einer Schlüsselwortabfrage, umfassend eine Menge von Schlüsselwörtern;
Inrangfolgebringen von Knoten des Wissensgraphen basierend auf einem kürzesten Pfad zu Schlüsselwortknoten, wobei ein Schlüsselwortknoten ein Knoten ist, der mit zumindest einem Schlüsselwort der Schlüsselwortabfrage übereinstimmt;
Auswählen der kleinsten Menge von Schlüsselwortknoten unter allen Mengen von Schlüsselwortknoten, umfassend zumindest einen Schlüsselwortknoten für jedes Schlüsselwort der Menge von Schlüsselwörtern;
Bestimmen, basierend auf der kleinsten Menge von Schlüsselwortknoten und basierend auf einer Anzahl von in Rangfolge gebrachten Knoten, einer optimalen Pfadmenge, wobei die optimale Pfadmenge Pfade im Wissensgraphen umfasst, die Knoten verbinden, die mit den Schlüsselwörtern der Schlüsselwortabfrage übereinstimmen und wobei die optimale Pfadmenge optimal bezüglich minimaler Kosten der Pfadmenge ist, wobei die Kosten auf einer Linearkombination des Gesamtgewichts von Knoten der optimalen Pfadmenge und paarweisen semantischen Abständen zwischen den Knoten der optimalen Pfadmenge basieren, und
Extrahieren einer Antwort auf die Schlüsselwortabfrage aus der optimalen Pfadmenge.

One embodiment relates to a computer-implemented method for keyword searches in a dataset, data of the dataset being represented by a knowledge graph, the knowledge graph comprising nodes representing entities of the dataset and edges representing relationships between the entities, the method includes the following steps:

receiving a keyword query comprising a set of keywords;
ranking nodes of the knowledge graph based on a shortest path to keyword nodes, a keyword node being a node that matches at least one keyword of the keyword query;
selecting the smallest set of keyword nodes among all sets of keyword nodes, comprising at least one keyword node for each keyword of the set of keywords;
Determine, based on the smallest set of keyword nodes and based on a number of ranked nodes, an optimal path set, where the optimal path set includes paths in the knowledge graph connecting nodes that match the keywords of the keyword query and where the optimal path set is optimal with respect to the minimum cost of the path set, where the cost is based on a linear combination of the total weight of nodes of the optimal path set and pairwise semantic distances between the nodes of the optimal path set, and
Extracting an answer to the keyword query from the optimal path set.

Die kleinste Menge von Schlüsselwortknoten ist die kleinste Menge unter allen Mengen von Schlüsselwortknoten, die zumindest einen Schlüsselwortknoten für jedes Schlüsselwort der Menge von Schlüsselwörtern umfasst.The smallest set of keyword nodes is the smallest set among all sets of keyword nodes that includes at least one keyword node for each keyword of the set of keywords.

Basierend auf der kleinsten Menge von Schlüsselwortknoten und basierend auf einer Anzahl von in Rangfolge gebrachten Knoten wird eine optimale Pfadmenge bestimmt. Die Anzahl von in Rangfolge gebrachten Knoten kann angegeben sein. Die Anzahl kann beispielsweise in Abhängigkeit von der Größe des Wissensgraphen variieren.An optimal path set is determined based on the smallest set of keyword nodes and based on a number of ranked nodes. The number of nodes ranked may be specified. The number can vary depending on the size of the knowledge graph, for example.

Eine Pfadmenge wird als eine Menge von Pfaden betrachtet, die einen gemeinsamen Wurzelknoten mit Knoten verbinden, die mit den Schlüsselwörtern der Schlüsselwortabfrage übereinstimmen.A path set is considered to be a set of paths connecting a common root node to nodes matching the keywords of the keyword query.

Die optimale Pfadmenge wird als die Menge von Pfaden mit minimalen Kosten betrachtet.The optimal path set is considered to be the set of paths with minimum cost.

Die Antwort auf die Schlüsselwortabfrage basiert auf der optimalen Pfadmenge. Daher ist die Antwort selbst optimal hinsichtlich der minimalen Kosten.The answer to the keyword query is based on the optimal path set. Therefore, the answer itself is optimal in terms of minimum cost.

Die Kosten der optimalen Pfadmenge basieren auf einer Linearkombination des Gesamtgewichts von Knoten der optimalen Pfadmenge und paarweisen semantischen Abständen zwischen den Knoten der optimalen Pfadmenge.The cost of the optimal path set is based on a linear combination of the total weight of nodes of the optimal path set and pairwise semantic distances between the nodes of the optimal path set.

Durch Bestimmen der optimalen Pfadmenge basierend auf der kleinsten Menge von Schlüsselwortknoten und basierend auf einer Anzahl von in Rangfolge gebrachten Knoten verarbeitet das Verfahren nicht jeden Knoten im Wissensgraphen, sondern nur vielversprechende Knoten. Daher kann die Berechnung der Antwort effizient beschleunigt werden, während der Berechnungsaufwand verringert wird.By determining the optimal path set based on the smallest set of keyword nodes and based on a number of ranked nodes, the method does not process every node in the knowledge graph, only promising nodes. Therefore, the calculation of the response can be speeded up efficiently while reducing the amount of calculation.

Gemäß einer Ausführungsform stellt das Gesamtgewicht eines Knotens Salienz dar, und/oder der paarweise semantische Abstand zwischen zwei Knoten stellt eine semantische Zusammengehörigkeit dar. Gewichte werden in dem Wissensgraphen zu seinen Knoten und/oder Kanten zugeordnet. Semantische Abstände werden in dem Wissensgraphen zu Paaren seiner Knoten zugeordnet. Die Gewichte und semantischen Abstände werden vorzugsweise vorab berechnet und vorab gespeichert, beispielsweise in einem Hauptspeicher oder einer Datenbank, beispielsweise zusammen mit dem Wissensgraphen.According to one embodiment, the total weight of a node represents salience and/or the pairwise semantic distance between two nodes represents semantic belonging. Weights are assigned in the knowledge graph to its nodes and/or edges. Semantic distances are associated in the knowledge graph with pairs of its nodes. The weights and semantic distances are preferably pre-computed and pre-stored, e.g. in a main memory or a database, e.g. together with the knowledge graph.

Auffälligere (salientere) Graphenelemente umfassen kleinere Gewichte. Weniger auffällige Graphenelemente umfassen größere Gewichte. Semantischer Abstand sollte nicht mit Graphenabstand verwechselt werden. Tatsächlich können zwei Entitäten, die in der Graphenstruktur nahe beieinander sind, semantisch weit voneinander entfernt sein, z. B. zu nicht verwandten Themen im Wissensgraphen gehören. Stärker zusammengehörende Graphenelemente umfassen einen kleinen semantischen Abstand. Weniger zusammengehörende Graphenelemente umfassen einen größeren semantischen Abstand. Durch Einbeziehen des semantischen Abstands in das Verfahren kann die semantische Zusammengehörigkeit einer Antwort verbessert werden.More conspicuous (salient) graph elements include smaller weights. Less conspicuous graph elements include larger weights. Semantic spacing should not be confused with graph spacing. In fact, two entities that are close to each other in the graph structure can be semantically far apart, e.g. B. belong to unrelated topics in the knowledge graph. Graph elements that belong more closely together include a small semantic distance. Fewer related graph elements encompass a larger semantic distance. By including semantic distance in the method, the semantic coherence of a response can be improved.

Das Verfahren löst das Problem des Berechnens von semantisch zusammengehörenden Antworten auf Schlüsselwortabfragen in einer effizienten Weise.The method solves the problem of computing semantically related answers to keyword queries in an efficient manner.

Gemäß einer Ausführungsform umfasst der Schritt des Bestimmens einer optimalen Pfadmenge einen iterativen Prozess basierend auf dem Bestimmen einer lokal optimalen Pfadmenge, wobei die lokal optimale Pfadmenge Pfade im Wissensgraphen umfasst, die den Wurzelknoten mit Knoten verbinden, die mit den Schlüsselwörtern der Schlüsselwortabfrage übereinstimmen, und wobei die lokal optimale Pfadmenge optimal hinsichtlich minimaler Kosten der Pfadmenge für den entsprechenden Wurzelknoten ist, wobei der iterative Prozess die folgenden Schritte umfasst:

Bestimmen einer Gesamtmindestlänge eines Pfades der lokal optimalen Pfadmenge durch Bestimmen, für jeden Schlüsselwortknoten, des kürzesten Pfades vom entsprechenden Wurzelknoten;
Bestimmen einer unteren Grenze der Kosten der lokal optimalen Pfadmenge; Bestimmen der Kosten jedes Pfades, der größer als die Gesamtmindestpfadlänge ist;
Vergleichen der Kosten jedes Pfades, der größer als die Gesamtmindestpfadlänge ist, mit der unteren Grenze;
Erhalten der Pfadmenge mit den minimalen Kosten als die lokal optimale Pfadmenge.

According to one embodiment, the step of determining an optimal path set comprises an iterative process based on determining a locally optimal path set, where the locally optimal path set includes paths in the knowledge graph that connect the root node to nodes that match the keywords of the keyword query, and where the locally optimal path set is optimal in terms of minimum cost of the path set for the corresponding root node, where the iterative process includes the following steps:

determining an overall minimum length of a path of the locally optimal path set by determining, for each keyword node, the shortest path from the corresponding root node;
determining a lower bound on the cost of the locally optimal path set; determining the cost of each path that is greater than the total minimum path length;
comparing the cost of each path greater than the total minimum path length to the lower bound;
Obtain the minimum cost path set as the locally optimal path set.

Gemäß einer Ausführungsform umfasst der Schritt des Bestimmens einer optimalen Pfadmenge Folgendes:

Erhalten der lokal optimalen Pfadmenge mit den minimalen Kosten als die optimale Pfadmenge.

According to one embodiment, the step of determining an optimal path set includes:

Obtain the locally optimal path set with the minimum cost as the optimal path set.

Gemäß einer Ausführungsform wird der Schritt des Bestimmens, für einen Wurzelknoten des Wissensgraphen, einer lokal optimalen Pfadmenge mit zumindest einer Kürzungsstrategie verbessert.According to one embodiment, the step of determining, for a root node of the knowledge graph, a locally optimal set of paths is improved with at least one pruning strategy.

Gemäß einer Ausführungsform umfasst der Schritt des Extrahierens einer Antwort aus der optimalen Pfadmenge Folgendes:

Zusammenführen der Pfade der optimalen Pfadmenge in einen Untergraphen des Wissensgraphen.

According to one embodiment, the step of extracting an answer from the optimal path set includes:

Merging the paths of the optimal path set into a subgraph of the knowledge graph.

Gemäß einer Ausführungsform umfasst der Schritt des Extrahierens einer Antwort aus der optimalen Pfadmenge ferner Folgendes:

Entfernen unnötiger Knoten und Kanten aus dem Untergraphen.

According to one embodiment, the step of extracting an answer from the optimal path set further comprises:

Remove unnecessary nodes and edges from the subgraph.

Weitere Ausführungsformen beziehen sich auf eine Einrichtung zur Schlüsselwortsuche in einer Datenmenge, wobei Daten der Datenmenge durch einen Wissensgraphen dargestellt werden, wobei der Wissensgraph Knoten, die Entitäten der Datenmenge darstellen, und Kanten, die Beziehungen zwischen den Entitäten darstellen, umfasst, wobei die Einrichtung einen Eingang umfasst, der dazu ausgelegt ist, eine Schlüsselwortabfrage zu empfangen, die eine Menge von Schlüsselwörtern umfasst, und dazu ausgelegt ist, Schlüsselwörter auf Knoten des Wissensgraphen abzubilden,
wobei die Einrichtung ferner einen Prozessor umfasst, wobei der Prozessor ausgelegt ist zum Inrangfolgebringen der Knoten des Wissensgraphen basierend auf einem kürzesten Pfad zu Schlüsselwortknoten, wobei ein Schlüsselwortknoten ein Knoten ist, der mit zumindest einem Schlüsselwort der Schlüsselwortabfrage übereinstimmt; Auswählen der kleinsten Menge von Schlüsselwortknoten unter allen Mengen von Schlüsselwortknoten, umfassend zumindest einen Schlüsselwortknoten für jedes Schlüsselwort der Menge von Schlüsselwörtern; Bestimmen, basierend auf der kleinsten Menge von Schlüsselwortknoten, einer optimalen Pfadmenge, wobei die optimale Pfadmenge Pfade im Wissensgraphen umfasst, die Knoten verbinden, die mit den Schlüsselwörtern der Schlüsselwortabfrage übereinstimmen, und wobei die optimale Pfadmenge optimal hinsichtlich minimaler Kosten der Pfadmenge ist, wobei die Kosten auf einer Linearkombination des Gesamtgewichts von Knoten der optimalen Pfadmenge und paarweisen semantischen Abständen zwischen den Knoten der optimalen Pfadmenge basieren, und Extrahieren einer Antwort auf die Schlüsselwortabfrage aus der optimalen Pfadmenge, und wobei die Einrichtung einen Ausgang umfasst, der dazu ausgelegt ist, die Antwort auf ein Ergebnis der Schlüsselwortabfrage abzubilden, und dazu ausgelegt ist, das Ergebnis auszugeben.Further embodiments relate to a device for keyword searches in a data set, data in the data set being represented by a knowledge graph, the knowledge graph comprising nodes representing entities in the data set and edges representing relationships between the entities, the device comprising a comprises an input configured to receive a keyword query comprising a set of keywords and configured to map keywords to nodes of the knowledge graph,
the device further comprising a processor, the processor being configured to rank the nodes of the knowledge graph based on a shortest path to keyword nodes, a keyword node being a node associated with at least one keyword of the keywordab question matches; selecting the smallest set of keyword nodes among all sets of keyword nodes, comprising at least one keyword node for each keyword of the set of keywords; Determine, based on the smallest set of keyword nodes, an optimal path set, where the optimal path set includes paths in the knowledge graph that connect nodes that match the keywords of the keyword query, and where the optimal path set is optimal in terms of minimum cost of the path set, where the cost is based on a linear combination of the total weight of nodes of the optimal path set and pairwise semantic distances between the nodes of the optimal path set, and extracting an answer to the keyword query from the optimal path set, and wherein the facility comprises an output adapted to the answer to a result of the keyword query and is adapted to output the result.

Gemäß einer Ausführungsform ist die Einrichtung dazu ausgelegt, Schritte des Verfahrens entsprechend den beschriebenen Ausführungsformen auszuführen.According to one embodiment, the device is designed to carry out steps of the method according to the described embodiments.

Weitere Ausführungsformen beziehen sich auf ein Computerprogramm für Schlüsselwortsuche in einer Datenmenge, umfassend computerlesbare Anweisungen, die, wenn durch einen Computer ausgeführt, den Computer veranlassen, Schritte des beschriebenen Verfahrens auszuführen.Further embodiments relate to a computer program for keyword searching in a data set, comprising computer-readable instructions which, when executed by a computer, cause the computer to carry out steps of the described method.

Weitere Ausführungsformen sind aus der folgenden Beschreibung und der Zeichnung ableitbar. In der Zeichnung:

1 stellt einen beispielhaften Wissensgraphen dar;
2 stellt Aspekte einer Einrichtung für Schlüsselwortsuche dar, und
3 stellt Aspekte eines Verfahrens für Schlüsselwortsuche dar.

Further embodiments can be derived from the following description and the drawing. In the drawing:

1 depicts an example knowledge graph;
2 illustrates aspects of a keyword search facility, and
3 illustrates aspects of a method for keyword searching.

1 stellt einen beispielhaften Wissensgraphen, WG, 100 dar. 1 represents an example knowledge graph, WG, 100.

Der WG 100 umfasst einen ersten Knoten 102, einen zweiten Knoten 104, einen dritten Knoten 106, einen vierten Knoten 108, einen fünften Knoten 110, einen sechsten Knoten 112, einen siebten Knoten 114, einen achten Knoten 116, einen neunten Knoten 118, einen zehnten Knoten 120, einen elften Knoten 122 und einen zwölften Knoten 124. Eine Kante 126 des WG 100 beginnt beim Knoten 104 und endet beim Knoten 102. Eine Kante 128 des WG 100 beginnt beim Knoten 106 und endet beim Knoten 104. Eine Kante 130 des WG 100 beginnt beim Knoten 106 und endet beim Knoten 108. Eine Kante 132 des WG 100 beginnt beim Knoten 108 und endet beim Knoten 110. Weitere Kanten des WG 100 sind Kante 134 zwischen dem Knoten 104 und dem Knoten 112, Kante 136 zwischen dem Knoten 108 und dem Knoten 114, Kante 138 zwischen dem Knoten 108 und dem Knoten 116, Kante 140 zwischen dem Knoten 118 und dem Knoten 112, Kante 142 zwischen dem Knoten 112 und dem Knoten 120, Kante 144 zwischen dem Knoten 114 und dem Knoten 122, die Kante 146 zwischen dem Knoten 114 und dem Knoten 124, Kante 148 zwischen dem Knoten 114 und dem Knoten 116 und Kante 150 zwischen dem Knoten 120 und dem Knoten 122.The WG 100 includes a first node 102, a second node 104, a third node 106, a fourth node 108, a fifth node 110, a sixth node 112, a seventh node 114, an eighth node 116, a ninth node 118, a tenth node 120, an eleventh node 122 and a twelfth node 124. An edge 126 of the WG 100 begins at node 104 and ends at node 102. An edge 128 of the WG 100 begins at node 106 and ends at node 104. An edge 130 of the WG 100 begins at node 106 and ends at node 108. An edge 132 of WG 100 begins at node 108 and ends at node 110. Other edges of WG 100 are edge 134 between node 104 and node 112, edge 136 between the node 108 and node 114, edge 138 between node 108 and node 116, edge 140 between node 118 and node 112, edge 142 between node 112 and node 120, edge 144 between node 114 and node 122, the Edge 146 between node 114 and node 124, edge 148 between node 114 and node 116, and edge 150 between node 120 and node 122.

Der WG 100 kann mehr oder weniger Knoten und/oder mehr oder weniger Kanten umfassen. Im Beispiel stellt der WG 100 beispielhaftes Wissen dar. Zur Analyse von Daten aus anderen Gebieten, insbesondere technischen Gebieten, können entsprechende WGen verwendet werden.The WG 100 may include more or fewer nodes and/or more or fewer edges. In the example, WG 100 represents exemplary knowledge. Corresponding WGs can be used to analyze data from other areas, in particular technical areas.

In dem Beispiel werden die Informationen auf die Knoten und die Kanten entsprechend der folgenden Abbildung von Knotenreferenznummern auf Schlüsselwort und der folgenden Abbildung von Kantenreferenzzeichen auf Schlüsselwort abgebildet.

102: Republican Party (Republikanische Partei)
104: George H. W. Bush
106: Anne Hutchinson
108: Franklin D. Roosevelt
110: John Aspinwall Roosevelt
112: Barbara Bush
114: James Roosevelt
116: Democratic Party (Demokratische Partei)
118: The After Party: The last Party 3
120: Mercedes
122: Germany (Deutschland)
124: World War II (Zweiter Weltkrieg)
126: party (Partei)
128: descendant (Nachfahre)
130: descendant (Nachfahre)
132: child (Kind)
134: grandchild (Enkelkind)
136: child (Kind)
138: party (Partei)
140: starring (Hauptrolle)
142: drives (fährt)
144: visited (besuchte)
146: battle (kämpfen)
148: party (Partei)
150: made in (hergestellt in)

In the example, the information is mapped onto the nodes and the edges according to the following mapping from node reference numbers to keyword and the following mapping from edge reference characters to keyword.

102: Republican Party
104: George HW Bush
106: Anne Hutchinson
108: Franklin D Roosevelt
110: John Aspinwall Roosevelt
112: Barbara Bush
114: James Roosevelt
116: Democratic Party
118: The After Party: The Last Party 3
120: Mercedes
122: Germany (Germany)
124: World War II
126: party
128: descendant
130: descendant
132: child
134: grandchild
136: child
138: party
140: starring (main role)
142: drives
144: visited
146: battle
148: party
150: made in

Der WG 100 wird in dem Beispiel als eine Datenmenge für Schlüsselwortsuche in der Datenmenge verwendet. Die Erfindung ist nicht auf Schlüsselwörter beschränkt, die für einen Menschen lesbar oder verstehbar sind. Allgemeiner bezieht sich der Begriff Schlüsselwort in diesem Kontext auf ein beliebiges Symbol oder Muster in den Daten, das mit einem entsprechenden WG analysiert werden kann.The WG 100 is used as a data set for keyword searches in the data set in the example. The invention is not limited to keywords that are human readable or understandable. More generally, the term keyword in this context refers to any symbol or pattern in the data that can be analyzed with a corresponding WG.

Ein erster Untergraph 152, der ein beispielhaftes Ergebnis der Schlüsselwortsuche darstellt, umfasst den zweiten Knoten 104, den dritten Knoten 106, den vierten Knoten 108, den sechsten Knoten 112, den siebten Knoten 114 und die Kanten zwischen diesen Knoten. Der erste Untergraph 152 in diesem Beispiel stellt eine Antwort auf eine Abfrage dar, die durch ein erstes Schlüsselwort „Barbara Bush“ und ein zweites Schlüsselwort „James Roosevelt“ dargestellt wird.A first subgraph 152, representing an example keyword search result, comprises the second node 104, the third node 106, the fourth node 108, the sixth node 112, the seventh node 114 and the edges between these nodes. The first subgraph 152 in this example represents a response to a query represented by a first keyword "Barbara Bush" and a second keyword "James Roosevelt".

Ein zweiter Untergraph 154, der ein weiteres beispielhaftes Ergebnis der Schlüsselwortsuche darstellt, umfasst den sechsten Knoten 112, den siebten Knoten 114, den zehnten Knoten 120 und den elften Knoten 122 und die Kanten zwischen diesen Knoten. Der zweite Untergraph 154 in diesem Beispiel stellt eine weitere Antwort auf die Abfrage dar, die durch die Schlüsselwörter „Barbara Bush“ und „James Roosevelt“ dargestellt wird.A second subgraph 154, representing another example keyword search result, includes the sixth node 112, the seventh node 114, the tenth node 120, and the eleventh node 122 and the edges between those nodes. The second subgraph 154 in this example represents another response to the query represented by the keywords "Barbara Bush" and "James Roosevelt".

Aspekte einer Einrichtung 200 für Schlüsselwortsuche in der Datenmenge sind in 2 dargestellt.Aspects of a facility 200 for keyword searches in the dataset are in 2 shown.

Die Einrichtung 200 umfasst einen Eingang 202, einen Prozessor 204 und einen Ausgang 206. Der Eingang 202 im Beispiel stellt eine Schnittstelle für Schlüsselwörter der Daten bereit, nach denen gesucht werden soll. Der Prozessor 204 ist ausgelegt zum Bestimmen des ersten Untergraphen 152 und/oder des zweiten Untergraphen 154. Der Ausgang 206 ist dazu ausgelegt, die Antwort auf ein Ergebnis der Schlüsselwortabfrage abzubilden und dazu ausgelegt, das Ergebnis der Schlüsselwortsuche auszugeben.The device 200 includes an input 202, a processor 204 and an output 206. The input 202 in the example provides an interface for keywords of the data to be searched for. The processor 204 is arranged to determine the first subgraph 152 and/or the second subgraph 154. The output 206 is arranged to map the answer to a result of the keyword query and arranged to output the result of the keyword search.

Der WG 100 kann auf Speicher 208 in der Einrichtung 200 gespeichert sein. Der WG 100 kann auf Speicher gespeichert sein, der außerhalb der Einrichtung 200 ist. Datenverbindungen verbinden den Eingang 200 und den Prozessor 204, den Ausgang 206 und den Prozessor 204 und den Speicher 208 und den Prozessor 204. Computerlesbare Anweisungen können im Speicher 208 oder auf einem unterschiedlichen Speicher gespeichert sein. Der Prozessor 204 ist in dem Beispiel ausgelegt zum Ausführen der computerlesbaren Anweisungen zum Durchführen der Schlüsselwortsuche gemäß dem Verfahren, das nachfolgend Bezug nehmend auf 3 beschrieben wird.WG 100 may be stored on memory 208 in device 200 . The WG 100 may be stored on storage external to the device 200 . Data links connect input 200 and processor 204, output 206 and processor 204, and memory 208 and processor 204. Computer-readable instructions may be stored in memory 208 or on a different memory. The processor 204 is configured in the example to execute the computer-readable instructions for performing the keyword search according to the method described below with reference to FIG 3 is described.

Das Verfahren für Schlüsselwortsuche wird für einen Wissensgraphen G = 〈V, E〉 beschrieben, wobei V eine Menge von n numerischen Darstellungen von Knoten v₁ ..., v_n ist und E ⊆ V × V eine Menge von m numerischen Darstellungen von gerichteten Kanten ist, die Beziehungen zwischen Entitäten darstellen, die durch die Knoten dargestellt werden. Entitäten und Beziehungen können mit Text, z. B. ihren Namen, bezeichnet werden. Die Kanten im Graphen können in unterschiedliche Richtungen ausgerichtet sein. Im beispielhaften WG 100 ist n = 12 und m = 13.The method for keyword searching is described for a knowledge graph G = 〈V, E〉, where V is a set of n numeric representations of vertices v ₁ ..., v _n and E ⊆ V × V is a set of m numeric representations of directed ones Edges are the relationships between entities represented by the nodes. Entities and relationships can be tagged with text, e.g. B. their names are referred to. The edges in the graph can be oriented in different directions. In the exemplary WG 100, n = 12 and m = 13.

Die Schlüsselwortsuche basiert auf einer Schlüsselwortabfrage Q = (k₁,...,k_g}, die die g Schlüsselwörter k₁, ..., k_g umfasst.The keyword search is based on a keyword query Q=(k ₁ ,...,kg }, which includes the _g keywords k ₁ ,..., _kg .

In einem Schritt 302 wird eine Schlüsselwortabfrage Q, die eine Menge von Schlüsselwörtern k₁, ..., k_g umfasst, empfangen. Die Schlüsselwörter werden auf numerische Darstellungen von Knoten abgebildet. In dem Beispiel werden die g Schlüsselwörter k₁, ...,k_g auf g numerische Darstellungen von Knoten v₁...,v_g abgebildet. Ein Schlüsselwort kann auf mehrere Knoten abgebildet werden. In einem Aspekt wird zumindest eines der g Schlüsselwörter k₁,...,k_g auf zumindest eine numerische Darstellung von Knoten v₁,...,v_g abgebildet.In a step 302, a keyword query Q comprising a set of keywords k ₁ , ..., k _g is received. The keywords are mapped to numeric representations of nodes. In the example, the g keywords k ₁ ,...,k _g are mapped to g numeric representations of nodes v ₁ ...,v _g . A keyword can be mapped to multiple nodes. In one aspect, at least one of the g keywords k ₁ ,...,k _g is mapped to at least one numeric representation of nodes v ₁ ,...,v _g .

Um ein Ergebnis für jedes der g Schlüsselwörter k₁,...,k_g zu erhalten, wird jedes der g Schlüsselwörter k₁,...,k_g auf zumindest eine numerische Darstellung von Knoten v₁...,v_g abgebildet. Wenn ein Schlüsselwort nicht auf irgendeinen Knoten im Wissensgraphen abgebildet werden kann, kann das Ergebnis leer sein.In order to obtain a result for each of the g keywords k ₁ ,...,k _g , each of the g keywords k ₁ ,...,k _g is mapped to at least one numeric representation of nodes v ₁ ...,v _g . If a keyword cannot be mapped to any node in the knowledge graph, the result can be empty.

Eine Schlüsselwortabgleichsfunktion kann verwendet werden, um ein Schlüsselwort auf einen beliebigen Knoten des Wissensgraphen abzubilden. Beispielsweise kann die Abgleichsfunktion auf wörtlichen Bezeichnungen basieren, die ein Schlüsselwort umfassen. Allerdings ist die Erfindung nicht auf irgendeine spezifische Abbildungsfunktion beschränkt.A keyword matching function can be used to map a keyword to any node of the knowledge graph. For example, the matching function can be based on literal labels that include a keyword. However, the invention is not limited to any specific mapping function.

Gemäß dem Beispiel umfasst die Abfrage g = 2 Schlüsselwörter, und das erste Schlüsselwort k₁ =„Barbara Bush“ wird auf die numerische Darstellung des Knotens 112 abgebildet, und das zweite Schlüsselwort k₂ =„James Roosevelt“ wird auf die numerische Darstellung von Knoten 114 abgebildet.According to the example, the query includes g = 2 keywords, and the first keyword k ₁ ="Barbara Bush" maps to the numeric representation of node 112, and the second keyword k ₂ ="James Roosevelt" maps to the numeric representation of nodes 114 pictured.

In einem Aspekt können die Funktionstreffer: $K \to 2^{V}$

verwendet werden, um eine Menge IK von Schlüsselwörtern auf eine Untermenge der numerischen Darstellungen der Knoten des Wissensgraphen G abzubilden. Die konkrete Umsetzung von Treffern, d. h. die Art und Weise des Abgleichens von Schlüsselwörtern mit Bezeichnungen von Entitäten steht nicht im Fokus dieser Offenbarung. In diesem Aspekt werden Treffer(k_i) als K_i für 1 ≤ i ≤ g bezeichnet. Wobei K_i die numerischen Darstellungen einer Menge von Knoten sind, auf die das Schlüsselwort abgebildet wird, auch Schlüsselwortknoten genannt. Das Verfahren ist nicht auf diese Art und Weise der Abbildung beschränkt.In one aspect, the function hits can:

K \to 2^{V}

be used to map a set IK of keywords to a subset of the numerical representations of the knowledge graph G nodes. The specific implementation of hits, ie the manner in which keywords are matched with designations of entities, is not the focus of this disclosure. In this aspect, Hits(k _i ) are denoted as K _i for 1≦i≦g. Where K _i are the numeric representations of a set of nodes to which the keyword maps, also called keyword nodes. The method is not limited to this way of mapping.

In der Offenbarung wird Kantenabbildung ausgelassen, Kantenabbildung kann aber durch Unterteilen der Kanten in Knotenabbildung transformiert werden. Insbesondere ergibt die Unterteilung einer Kante (u, v) einen neuen Knoten w mit den Bezeichnungen der Kante (u, v) und ersetzt dann (u, v) durch zwei neue Kanten (u, w) und (w, v).In the disclosure, edge mapping is omitted, but edge mapping can be transformed into node mapping by dividing the edges. In particular, subdividing an edge (u, v) gives a new node w with the labels of the edge (u, v) and then replaces (u, v) with two new edges (u, w) and (w, v).

Für ein gegebenes G = (V, E) ist eine Antwort auf Q definiert als ein Untergraph von G, bezeichnet durch T = 〈V_T, E_T〉, wobei der Untergraph T die folgenden Anforderungen erfüllt. (1) T ist zusammenhängend. (2) T enthält zumindest einen Schlüsselwortknoten aus jedem K_i für 1 ≤ i ≤ g, d. h. T_T n K_i ≠ Ø. (3) T ist strukturell minimal für (1) und (2), d. h. keiner seiner echten Untergraphen erfüllt sowohl (1) als auch (2). Strukturelle Minimalität zeigt an, dass T eine Baumstruktur aufweist, wobei Blattknoten Schlüsselwortknoten sind.Given G = (V, E), a response to Q is defined as a subgraph of G denoted by T = 〈V _T , E _T 〉, where the subgraph T satisfies the following requirements. (1) T is connected. (2) T contains at least one keyword node from each K _i for 1 ≤ i ≤ g, ie T _T n K _i ≠ Ø. (3) T is structurally minimal for (1) and (2), ie none of its proper subgraphs satisfies both (1) and (2). Structural minimality indicates that T has a tree structure, with leaf nodes being keyword nodes.

Der Prozess des Berechnens der Antwort wird im Folgenden beschrieben.The process of calculating the answer is described below.

In einem Schritt 304 werden Knoten des Wissensgraphen basierend auf einem kürzesten Pfad zu Schlüsselwortknoten in eine Rangfolge gebracht, wobei ein Schlüsselwortknoten ein Knoten ist, der mit zumindest einem Schlüsselwort der Schlüsselwortabfrage übereinstimmt;In a step 304, nodes of the knowledge graph are ranked based on a shortest path to keyword nodes, where a keyword node is a node that matches at least one keyword of the keyword query;

In einem Schritt 306 wird eine kleinste Menge von Schlüsselwortknoten K_imin. unter allen Mengen von Schlüsselwortknoten, umfassend zumindest einen Schlüsselwortknoten für jedes Schlüsselwort der Menge von Schlüsselwörtern, ausgewählt.In a step 306, a smallest set of keyword nodes K _i _{at least} _. selected among all sets of keyword nodes comprising at least one keyword node for each keyword of the set of keywords.

In einem Schritt 308 wird eine optimale Pfadmenge im Wissensgraphen G basierend auf der kleinsten Menge von Schlüsselwortknoten K_imin und basierend auf einer Anzahl von in eine Rangfolge gebrachten Knoten bestimmt.In a step 308, an optimal path set in the knowledge graph G based on the smallest set of keyword nodes K _i _{at least} and determined based on a number of ranked nodes.

Die Anzahl von in Rangfolge gebrachten Knoten kann angegeben sein. Die Anzahl kann beispielsweise in Abhängigkeit von der Größe des Wissensgraphen variieren. Beispielsweise kann, für einen Wissensgraphen mit etwa 30.000 und 3.000.000 Knoten, diese Anzahl von 5 bis 20 reichen. Die Anzahl kann anhand von Experimenten angegeben werden. Es ist anzumerken, dass eine beliebige andere Anzahl ausgewählt werden kann.The number of nodes ranked may be specified. The number can vary depending on the size of the knowledge graph, for example. For example, for a knowledge graph with about 30,000 and 3,000,000 nodes, this number can range from 5 to 20. The number can be given based on experiments. It should be noted that any other number can be selected.

Die optimale Pfadmenge umfasst Pfade im Wissensgraphen, die Knoten verbinden, die mit den Schlüsselwörtern der Schlüsselwortabfrage übereinstimmen, und wobei die optimale Pfadmenge optimal hinsichtlich minimaler Kosten der Pfadmenge ist, wobei die Kosten auf einer Linearkombination des Gesamtgewichts von Knoten der optimalen Pfadmenge und paarweisen semantischen Abständen zwischen den Knoten der optimalen Pfadmenge basieren.The optimal path set includes paths in the knowledge graph that connect nodes that match the keywords of the keyword query, and where the optimal path set is optimal in terms of minimum cost of the path set, where the cost is based on a linear combination of the total weight of nodes of the optimal path set and pairwise semantic distances between the nodes of the optimal path set.

Die optimale Pfadmenge ist optimal hinsichtlich minimaler Kosten der Pfadmenge, wobei die Kosten auf einer Linearkombination des Gesamtgewichts von Knoten der optimalen Pfadmenge und paarweisen semantischen Abständen zwischen den Knoten der optimalen Pfadmenge basieren.The optimal path set is optimal in terms of minimum cost of the path set, where the cost is based on a linear combination of the total weight of nodes of the optimal path set and pairwise semantic distances between the nodes of the optimal path set.

Für G = 〈V, E〉 bildet eine Gewichtungsfunktion Knoten auf nicht-negative reelle Zahlen ab, bezeichnet durch wt: V ↦ ℝ_≥0.For G = 〈V, E〉, a weighting function maps nodes to non-negative real numbers, denoted by wt: V ↦ ℝ _≥0 .

Eine semantische Abstandsfunktion sd bildet Paare von Knoten auf nicht-negative reelle Zahlen ab, bezeichnet durch sd: V × V ↦ ℝ_≥0. Diese pseudometrische Funktion erfüllt, für alle u, v, w ∈ V: $s d (v, v) = 0,$

d. h. Ununterscheidbarkeit von identischen Werten,

s d (u, v) = s d (v, u),

d. h. Symmetrie und

s d (u, v) \leq s d (u, w) + s d (w, v),

d. h. Dreiecksungleichung.A semantic distance function sd maps pairs of nodes to non-negative real numbers, denoted by sd: V × V ↦ ℝ _≥0 . This pseudometric function satisfies for all u, v, w ∈ V:

s i.e (v, v) = 0,

i.e. indistinguishability of identical values,

s i.e (and, v) = s i.e (v, and),

i.e. symmetry and

s i.e (and, v) \leq s i.e (and, w) + s i.e (w, v),

ie triangle inequality.

Die Messung des semantischen Abstands kann unabhängig von der Graphenstruktur und den Knotengewichten sein. Insbesondere unterscheidet er sich vom Graphenabstand, namentlich der Anzahl von Kanten eines kürzesten Pfades. Beispielsweise können angrenzende Knoten semantisch entfernt voneinander sein.The measurement of semantic distance can be independent of graph structure and node weights. In particular, it differs from graph distance, namely the number of edges of a shortest path. For example, adjacent nodes may be semantically distant from each other.

Die Kosten einer Antwort T = 〈V_T, E_T〉 sind das Gesamtgewicht ihrer Knoten und ihrer paarweisen semantischen Abstände: $c o s t (T) = α \sum_{v \in V_{T}} w t (v) + (1 - α) \sum_{\begin{array}{l} v_{i}, v_{j} \in V_{T} \\ i < j \end{array}} s d (v_{i}, v_{j}),$

wobei α ∈ [0,1] ein Parameter ist. In der Kostengleichung stellt das erste Element die Salienz der Knoten von T dar, und das zweite Element charakterisiert ihre semantische Zusammengehörigkeit. Das Verfahren erfordert keine spezifische Implementierung des Gewichts wt und des semantischen Abstands sd. Das Gewicht wt und die Art und Weise, wie der semantische Abstand sd bestimmt wird, kann so ausgewählt werden, dass die Relevanz der Abfrage, die Zentralität in der Graphenstruktur, die Semantik in den Bezeichnungen usw. berücksichtigt werden. Edmund Ihler, 1991, The Complexity of Approximating the Class Steiner Tree Problem, in WG 1991, 85-96; https://doi.org/10.1007/3-540-55121-2_8 bietet ein Beispiel für das Gewicht wt; Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti und S. Sudarshan, 2002, Keyword Searching and Browsing in Databases using BANKS, in ICDE 2002, 431-440; https://doi.org/10.1109/ICDE.2002.994756 bietet ein Beispiel für den semantischen Abstand sd. Kleine Gewichte repräsentieren Salienz, und kleiner semantischer Abstand repräsentiert Zusammengehörigkeit. Die Berechnung des Gewichts wt und des semantischen Abstands sd können unabhängig voneinander sein. Im Wissensgraphen werden Gewichte zu ihren Knoten zugeordnet. Gewichte von Knoten werden beispielsweise vorab berechnet, z. B. unter Verwendung von normalisiertem pageRank.The cost of an answer T = 〈V _T , E _T 〉 is the total weight of its nodes and their pairwise semantic distances:

c O s t (T) = a \sum_{v \in V_{T}} w t (v) + (1 - a) \sum_{\begin{array}{l} v_{i}, v_{j} \in V_{T} \\ i < j \end{array}} s i.e (v_{i}, v_{j}),

where α ∈ [0,1] is a parameter. In the cost equation, the first element represents the salience of the nodes of T, and the second element characterizes their semantic togetherness. The method does not require a specific implementation of the weight wt and the semantic distance sd. The weight wt and the way the semantic distance sd is determined can be chosen such that the relevance of the query, the centrality in the graph structure, the Semantics in the designations etc. are taken into account. Edmund Ihler, 1991, The Complexity of Approximating the Class Steiner Tree Prob lem, in WG 1991, 85-96; https://doi.org/10.1007/3-540-55121-2_8 provides an example of the weight wt; Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti and S Sudarshan, 2002, Keyword Searching and Browsing in Databases using BANKS, in ICDE 2002, 431-440; https://doi.org/10.1109/ICDE.2002.994756 provides an example of the semantic distance sd. Small weights represent salience, and small semantic distance represents togetherness. The calculation of the weight wt and the semantic distance sd can be independent of each other. In the knowledge graph, weights are assigned to their nodes. For example, weights of nodes are precalculated, e.g. B. using normalized pageRank.

Im Wissensgraphen sind Kanten beispielsweise vorab berechnete Beziehungen zwischen Knoten.For example, in the knowledge graph, edges are precomputed relationships between nodes.

Im Wissensgraphen werden semantische Abstände beispielsweise für Paare seiner Knoten vorab berechnet.For example, in the knowledge graph, semantic distances are precomputed for pairs of its nodes.

Ein Ziel des Verfahrens 300 ist, eine optimale Antwort zu bestimmen, wobei eine optimale Antwort eine Antwort ist, die die Kosten minimiert. Das Verfahren 300 erweitert das bekannte GST-Mindestgewichtsproblem, beispielsweise beschrieben in Edmund Ihler, 1991, The Complexity of Approximating the Class Steiner Tree Problem, durch Einführen von quadratischen Termen sd(v_i, v_j) in die Zielfunktion, die zusätzliche Kosten darstellen, die zu bezahlen sind, wenn zwei Knoten v_i und v_j beide in T enthalten sind.A goal of the method 300 is to determine an optimal response, where an optimal response is a response that minimizes cost. The method 300 extends the well-known GST minimum weight problem, for example described in Edmund Ihler, 1991, The Complexity of Approximating the Class Steiner Tree Problem, by introducing quadratic terms sd(v _i ,v _j ) into the objective function that represent additional costs, to be paid when two vertices v _i and v _j are both contained in T.

Gemäß einem Aspekt der Offenbarung können die Schritte 304 bis 308 durch den folgenden Algorithmus 1 implementiert werden, umfassend die Zeilen 1 bis 14: $Eingabe : G = 〈 V, E 〉 und Q = {k_{1}, \dots, k_{g}}$

In accordance with an aspect of the disclosure, steps 304 through 308 may be implemented by the following Algorithm 1, comprising lines 1 through 14:

input : G = 〈 V, E 〉 and Q = {k_{1}, ..., k_{G}}

Der Schritt 308 und der Algorithmus OptimizedRPS(G, Q, r) aus Zeile 10 werden im Folgenden ausführlich und Bezug nehmend auf Algorithmus 2 beschrieben:

Der Schritt 308 des Bestimmens einer optimalen Pfadmenge umfasst einen iterativen Prozess 308a basierend auf dem Bestimmen einer lokal optimalen Pfadmenge, wobei die lokal optimale Pfadmenge Pfade im Wissensgraphen umfasst, die einen Wurzelknoten der kleinsten Menge von Schlüsselwortknoten mit Knoten verbinden, die mit den Schlüsselwörtern der Schlüsselwortabfrage übereinstimmen, und wobei die lokal optimale Pfadmenge optimal hinsichtlich minimaler Kosten der Pfadmenge für den entsprechenden Wurzelknoten ist, wobei der iterative Prozess 308a die folgenden Schritte umfasst:
- Bestimmen einer Gesamtmindestpfadlänge der lokal optimalen Pfadmenge durch Bestimmen, für jeden Schlüsselwortknoten, des kürzesten Pfades aus dem entsprechenden Wurzelknoten;
- Bestimmen einer unteren Grenze der Kosten der lokal optimalen Pfadmenge; Bestimmen der Kosten jedes Pfades, der größer als die
- Gesamtmindestpfadlänge ist;
- Vergleichen der Kosten jedes Pfades, der größer als die
- Gesamtmindestpfadlänge ist, mit der unteren Grenze;
- Erhalten der Pfadmenge mit minimalen Kosten als die lokal optimale Pfadmenge.

Step 308 and the OptimizedRPS(G,Q,r) algorithm of line 10 are described in detail below with reference to Algorithm 2:

The step 308 of determining an optimal path set includes an iterative process 308a based on determining a locally optimal path set, where the locally optimal path set includes paths in the knowledge graph that connect a root node of the smallest set of keyword nodes with nodes associated with the keywords of the keyword query and where the locally optimal path set is optimal in terms of minimum cost of the path set for the corresponding root node, the iterative process 308a comprising the steps of:
- determining an overall minimum path length of the locally optimal path set by determining, for each keyword node, the shortest path from the corresponding root node;
- determining a lower bound on the cost of the locally optimal path set; Determine the cost of any path greater than the
- total minimum path length is;
- Compare the cost of each path greater than that
- total minimum path length is, with the lower bound;
- Obtain the minimum cost path set as the locally optimal path set.

Der Schritt 308 des Bestimmens einer optimalen Pfadmenge umfasst ferner Folgendes:

The step 308 of determining an optimal path set further includes the following:

In G = (V, E) ist eine RPS eine Menge von Pfaden, die relevant für eine Abfrage Q = {k₁, ... ,k_g} sind. Speziell ist, bei gegebenem r ∈ V, das Wurzelknoten genannt wird, eine lokal optimale Pfadmenge r-RPS, bezeichnet durch P_r = {P₁,...,P_g}, eine Menge von g Pfaden, sodass für 1 ≤ i ≤ g jedes P_i = (V_Pi, E_Pi) ein r - K_i-Pfad ist, oder, noch spezieller, ein r - v_i-Pfad, der r mit einem Schlüsselwortknoten v_i ∈ K_i verbindet. Es ist anzumerken, dass es für i ≠ j möglich ist, dass K_i ∩ K_j ≠ 0 und P_i = P_j.In G = (V, E), an RPS is a set of paths relevant to a query Q = {k ₁ ,...,k _g }. In particular, given r ∈ V, called the root node, a locally optimal path set r-RPS, denoted by P _r = {P ₁ ,...,P _g }, is a set of g paths such that for 1 ≤ i ≤ g any P _i = (V _P _i , E _P _i ) is an r - K _i -path, or, more specifically, an r - vi _i _{-path connecting r to a keyword node vi ∈ K i} _. Note that for i ≠ j it is possible that K _i ∩ K _j ≠ 0 and P _i = P _j .

Kosten einer r-RPS, bezeichnet durch P_r = {P₁,..., P_g} mit P_i = 〈V_Pi, E_Pi〉 für 1 ≤ i ≤ g, ist durch die folgende Kostenfunktion gegeben: $p c o s t (P_{r}) = α (w t (r) + \sum_{P_{i} \in P_{r}} \sum_{v \in V_{P_{i} {r}}} w t (v)) + (1 - α) \frac{v n u m (P_{r})}{2} \sum_{P_{i} \in P_{r}} \sum_{v \in V_{P_{i} {r}}} s d (r, v) .$

Cost of an r-RPS, denoted by P _r = {P ₁ ,..., P _g } with P _i = 〈V _P _i , E _P _i 〉 for 1 ≤ i ≤ g, is given by the following cost function:

p c O s t (P_{right}) = a (w t (right) + \sum_{P_{i} \in P_{right}} \sum_{v \in V_{P_{i} {right}}} w t (v)) + (1 - a) \frac{v n and m (P_{right})}{2} \sum_{P_{i} \in P_{right}} \sum_{v \in V_{P_{i} {right}}} s i.e (right, v) .

Die Funktion vnum zählt die Knoten in einer RPS. Sie zählt den Wurzelknoten r absichtlich nur einmal: $v n u m (P_{r}) = 1 + \sum_{P_{i} \in P_{r}} | V_{P_{i}} \ {r} | .$

The vnum function counts the nodes in an RPS. It intentionally counts the root node r only once:

v n and m (P_{right}) = 1 + \sum_{P_{i} \in P_{right}} | V_{P_{i}} \ {right} | .

Für jeden Wurzelknoten r ∈ V wird eine lokal optimale Pfadmenge r-RPS, d. h. eine r-RPS, die pcost minimiert, wie folgt bestimmt: $P_{r}^{m i n} = a r g min_{P_{r}} p c o s t (P_{r})$

For each root node r ∈ V, a locally optimal path set r-RPS, i.e. an r-RPS that minimizes pcost, is determined as follows:

P_{right}^{m i n} = a right G \underset{P_{right}}{at least} p c O s t (P_{right})

Gemäß einem Aspekt der Offenbarung wird die folgende Variante von pcost definiert: $p c o s t' (P r) = p c o s t (P r) - α w t (r)$

According to one aspect of the disclosure, the following variant of pcost is defined:

p c O s t' (P right) = p c O s t (P right) - a w t (right)

Da αwt(r) in pcost jeder r-RPS auftaucht, ist Minimieren der pcost'-Funktion äquivalent zum Minimieren von pcost: $P_{r}^{m i n} = a r g min_{P_{r}} p c o s t (P_{r}) = a r g min_{P_{r}} p c o s t' (P_{r}) .$

Since αwt(r) appears in pcost of every r-RPS, minimizing the pcost' function is equivalent to minimizing pcost:

P_{right}^{m i n} = a right G \underset{P_{right}}{at least} p c O s t (P_{right}) = a right G \underset{P_{right}}{at least} p c O s t' (P_{right}) .

Das Minimieren der pcost'-Funktion berechnet die Summe der Kosten einer Menge von Pfaden.Minimizing the pcost' function calculates the sum of the costs of a set of paths.

pcost' enthält allerdings vnum(Pr), was von P_r abhängt und beim Berechnen eines Pfades mit minimalen Kosten unbekannt ist.However, pcost' contains vnum(Pr), which depends on P _r and is unknown when computing a minimum cost path.

Daher wird jeder mögliche Wert von vnum(Pr) berücksichtigt und wird daher in jedem Fall eine Konstante.Therefore, every possible value of vnum(Pr) is considered and therefore becomes a constant in every case.

P_r unterliegt beispielsweise vnum(P_r) = n. Für einen Pfad P = 〈V_P, E_p〉, der mit dem Wurzelknoten r beginnt, ist definiert: $p l_{n} (P) = \sum_{v \in V_{P} \ {r}} α w t (v) + n \frac{1 - α}{2} s d (r, v) .$

For example, P _r is subject to vnum(P _r ) = n. For a path P = 〈V _P , E _p 〉 starting with the root node r, we have defined:

p l_{n} (P) = \sum_{v \in V_{P} \ {right}} a w t (v) + n \frac{1 - a}{2} s i.e (right, v) .

Für v ∈ V wird das Minimum pln von r-v-Pfaden, die exakt m Kanten umfassen, iterativ wie folgt berechnet: $p d_{n} [v] [m] = {\begin{matrix} 0 & m = 0, v = r, \\ \infty & m = 0, v \neq r \\ min_{u \in N (v)} p d_{n} [u] [m - 1] + α w t (v) + n \frac{1 - α}{2} s d (r, v) & m > 0 \end{matrix}$

wobei N(v) die Menge der Nachbarn von v in G ist. Das Minimum pln von r-K_i-Pfaden, die exakt m Kanten umfassen, wird bezeichnet durch

p d k_{n} [i] [m] = min_{v \in K_{i}} p d_{n} [v] [m]

For v ∈ V the minimum pln of rv-paths, which contain exactly m edges, is computed iteratively as follows:

p {i.e}_{n} [v] [m] = {\begin{matrix} 0 & m = 0, v = right, \\ \infty & m = 0, v \neq right \\ \underset{and \in N (v)}{at least} p {i.e}_{n} [and] [m - 1] + a w t (v) + n \frac{1 - a}{2} s i.e (right, v) & m > 0 \end{matrix}

where N(v) is the set of neighbors of v in G. The minimum pln of rK _i paths spanning exactly m edges is denoted by

p i.e k_{n} [i] [m] = \underset{v \in K_{i}}{at least} p {i.e}_{n} [v] [m]

Für 1 ≤ 1 ≤ g seien Q_I ⊆ Q die ersten I Schlüsselwörter in Q: $Q_{I} = {k_{1}, \dots, k_{I}} .$

For 1 ≤ 1 ≤ g let Q _I ⊆ Q be the first I keywords in Q:

Q_{I} = {k_{1}, ..., k_{I}} .

Die minimalen pcost' von r-RPSen, die insgesamt exakt m Kanten enthalten und relevant für Q, sind (d. h. I Pfade umfassen - einen r-K_i -Pfad für jedes 1 ≤ i ≤ I) werden iterativ berechnet durch: $p c_{n} [I] [m] = {\begin{matrix} p d k_{n} [I] [m] & I = 1 \\ min_{0 \leq x \leq m i n {m, | V | - 1}} p c_{n} [I - 1] [m - x] + p d k_{n} [I] [x] & I > 1. \end{matrix}$

The minimum pcost' of r-RPSs that contain a total of exactly m edges and are relevant to Q i (i.e., comprise I paths - one rK _i -path for each 1 ≤ i ≤ I) are computed iteratively by:

p c_{n} [I] [m] = {\begin{matrix} p i.e k_{n} [I] [m] & I = 1 \\ \underset{0 \leq x \leq m i n {m, | V | - 1}}{at least} p c_{n} [I - 1] [m - x] + p i.e k_{n} [I] [x] & I > 1. \end{matrix}

Da P_r mit vnum(P_r) = n insgesamt n - 1 Kanten enthält, wird pc_n [g] [n - 1] berücksichtigt. Schließlich wird über alle möglichen Werte von vnum(P_r), erhalten. $min_{P_{r}} p c o s t' (P_{r}) = min_{1 \leq n \leq 1 + g (| V | - 1)} p c_{n} [g] [n - 1]$

Since P _r with vnum(P _r ) = n contains a total of n - 1 edges, pc _n [g] [n - 1] is considered. Finally, over all possible values of vnum(P _r ), is obtained.

\underset{P_{right}}{at least} p c O s t' (P_{right}) = \underset{1 \leq n \leq 1 + G (| V | - 1)}{at least} p c_{n} [G] [n - 1]

Für jedes n wird pc_n [g] [n - 1] berechnet, und P_r - die r-RPS mit tatsächlich minimalem pcost' mit vnum = n - wird rekonstruiert. Die Rekonstruktion kann durch Nachschlagen von zusätzlichen Feldern implementiert werden, in denen berechnete Pfade mit minimalen Kosten und RPSen in einer Standardweise aufgezeichnet werden.For each n, pc _n [g][n-1] is computed and P _r - the r-RPS with actual minimum pcost' with vnum = n - is reconstructed. The reconstruction can be implemented by looking up additional fields in which computed minimum cost paths and RPSs are recorded in a standard manner.

Abschließend wird $P_{r}^{m i n}$

aktualisiert, und die r-RPS mit lokal minimalem pcost wird als optimale Pfadmenge P^# zurückgegeben, wobei P^# die RPS mit global minimalem pcost bezeichnet.Finally will

P_{right}^{m i n}

updated, and the r-RPS with locally minimum pcost is returned as the optimal path set P ^# , where P ^# denotes the RPS with globally minimum pcost.

Dies erfolgt durch Schritt 308 des Erhaltens der lokal optimalen Pfadmenge r-RPS mit den minimalen Kosten als die optimale Pfadmenge P^#.This is done by step 308 of obtaining the locally optimal path set r-RPS with the minimum cost as the optimal path set P ^# .

Gemäß einem Aspekt der Offenbarung kann der Schritt 308, insbesondere der iterative Prozess 308a von Schritt 308, durch den folgenden Algorithmus 2 implementiert werden, umfassend die Zeilen 1 bis 27: $Eingabe : G = 〈 V, E 〉, Q = {k_{1}, \dots, k_{g}},$

und r ∈ V Ausgabe: eine r-RPS

P_{r}^{m i n}

mit lokal minimalem pcost

In accordance with one aspect of the disclosure, step 308, particularly iterative process 308a of step 308, may be implemented by the following Algorithm 2, comprising lines 1 through 27:

input : G = 〈 V, E 〉, Q = {k_{1}, ..., k_{G}},

and r ∈ V output: an r-RPS

P_{right}^{m i n}

with locally minimal pcost

Zeilen 3 bis 5 des Algorithmus 2 beziehen sich auf den Schritt des Bestimmens einer Gesamtmindestlänge eines Pfades der lokal optimalen Pfadmenge durch Bestimmen, für jeden Schlüsselwortknoten, des kürzesten Pfades vom entsprechenden Wurzelknoten.Lines 3 to 5 of Algorithm 2 relate to the step of determining an overall minimum length of a path of the locally optimal path set by determining, for each keyword node, the shortest path from the corresponding root node.

Zeile 6 bezieht sich auf den Schritt des Bestimmens einer unteren Grenze der Kosten der lokal optimalen Pfadmenge. Dies ist eine Maßnahme, um große Werte von n zu kürzen, siehe Zeilen 1 bis 5. Im oben angezeigten beispielhaften Algorithmus für Schritt 306 endet die äußerste Schleife in Zeile 7 mit n = (|V| - 1), was ein großer Wert sein kann. Eine untere Grenze für pcost(P_r) mit vnum(P_r) = n zum Kürzen großer Werte von n wird berechnet. pcost(P_r) ist gegeben durch $\begin{matrix} p c o s t (P r) = α w t (r) + p c o s t' (P r) \\ p c o s t' (P r) = \sum_{P_{i} \in P_{r}} \sum_{v \in V_{P_{i}} \ {r}} α w t (v) + n \frac{1 - α}{2} s d (r, v) . \end{matrix}$

Line 6 refers to the step of determining a lower bound on the cost of the locally optimal path set. This is a measure to truncate large values of n, see lines 1 through 5. In the exemplary algorithm for step 306 shown above, the outermost loop ends in line 7 with n = (|V| - 1), which can be a large value can. A lower bound on pcost(P _r ) with vnum(P _r ) = n for truncating large values of n is computed. pcost(P _r ) is given by

\begin{matrix} p c O s t (P right) = a w t (right) + p c O s t' (P right) \\ p c O s t' (P right) = \sum_{P_{i} \in P_{right}} \sum_{v \in V_{P_{i}} \ {right}} a w t (v) + n \frac{1 - a}{2} s i.e (right, v) . \end{matrix}

Die Funktion pcost' berechnet die Summe der Kosten von g Pfaden - ein r-K_i-Pfad für jedes 1 ≤ i ≤ g.The function pcost' calculates the sum of the costs of g paths - an rK _i -path for every 1 ≤ i ≤ g.

Durch Abbilden der Kosten eines r-K_i-Pfades in die Länge, d. h. das Gesamtkantengewicht, eines r-K_i-Pfades in einem kantengewichteten Graphen kann seine untere Grenze erhalten werden durch Berechnen der Länge eines kürzesten Pfades, d. h. eines r-K_i-Pfades mit minimalem Gewicht in diesem kantengewichteten Graphen. Dies kann implementiert werden, indem der Algorithmus wie folgt erweitert wird. Zum Beginn der äußersten Schleife wird ein kantengewichteter, gerichteter Graph G_r,n = 〈V_r,n,〉 E_r,n〉, wobei V_r,n = V und jede Kante (u, v) ∈ E zwei gerichteten Kanten (u, v), (v, u) ∈ E_r,n entspricht, erzeugt. Jede gerichtete Kante (u, v) ∈ E_r,n wird gewichtet durch $α w t (v) + n \frac{1 - α}{2} s d (r, v) .$

By mapping the cost of an rK _i -path into the length, i.e. the total edge weight, of an rK _i -path in an edge-weighted graph, its lower bound can be obtained by computing the length of a shortest path, i.e. an rK _i -path with minimum weight in this edge-weighted graph. This can be implemented by extending the algorithm as follows. At the beginning of the outermost loop, an edge-weighted directed graph G _r,n = 〈V _r,n ,〉 E _r,n 〉, where V _r,n = V and each edge (u, v) ∈ E two directed edges ( corresponds to u, v), (v, u) ∈ E _r,n . Each directed edge (u, v) ∈ E _r,n is weighted by

a w t (v) + n \frac{1 - a}{2} s i.e (right, v) .

Der Dijkstra-Algorithmus kann auf G_r,n verwendet werden, um einen r-K_i-Pfad mit minimalem Gewicht für jedes 1 ≤ i ≤ g zu berechnen, beeichnet durch P_r,n,i. Er hat das kleinste Gesamtkantengewicht unter r-K_i-Pfaden in G_r,n, bezeichnet durch d_r,n,i. Daher ist eine untere Grenze für pcost(P_r) mit vnum(P_r) = n gegeben durch $D_{r, n} = α w t (r) + \sum_{1 \leq i \leq g} d_{r, n, i} .$

The Dijkstra algorithm can be used on Gr,n to compute a minimum weight rK _i -path for any 1 ≤ i ≤ g denoted by P _r _,n,i . It has the smallest total edge weight among rK _i -paths in G _r,n , denoted by d _r,n,i . Therefore a lower bound for pcost(P _r ) with vnum(P _r ) = n is given by

D_{right, n} = a w t (right) + \sum_{1 \leq i \leq G} {i.e}_{right, n, i} .

D_r,n erhöht sich, wenn sich n erhöht, daher kann die folgende Ungleichung geprüft werden: $D_{r, n} \geq p c o s t (P_{r}^{m i n}) .$

D _r,n increases as n increases, so the following inequality can be checked:

D_{right, n} \geq p c O s t (P_{right}^{m i n}) .

Wenn sie für die aktuellen n und $P_{r}^{m i n}$

gilt, wird die äußerste Schleife unterbrochen, und das aktuelle

P_{r}^{m i n}

wird zurückgegeben, da es für die aktuellen und größeren Werte von n, d. h. vnum(Pr), kein P_r mit einem kleineren pcost gibt. In ähnlicher Weise kann die folgende Ungleichung geprüft werden:

D_{r, n} \geq p c o s t (P^{#}) .

If they are for the current n and

P_{right}^{m i n}

holds, the outermost loop is broken, and the current one

P_{right}^{m i n}

is returned because for the current and larger values of n, ie vnum(Pr), there is no P _r with a smaller pcost. Similarly, the following inequality can be checked:

D_{right, n} \geq p c O s t (P^{#}) .

Wenn sie für die aktuellen n und P^# gilt, was die aktuelle RPS mit global minimalem pcost in Algorithmus QO ist, wird die äußerste Schleife unterbrochen, und das aktuelle $P_{r}^{m i n}$

wird zurückgegeben, siehe Zeilen 26 und 27.If true for the current n and P ^# , which is the current RPS with global minimum pcost in algorithm QO, the outermost loop is broken, and the current one

P_{right}^{m i n}

is returned, see lines 26 and 27.

Zeilen 7 bis 22 beziehen sich auf den Schritt des Bestimmens der Kosten jedes Pfades, der größer als die Gesamtmindestpfadlänge ist. Die Schleife in Zeile 7 beginnt mit n = L_r. Dies ist eine Maßnahme, um kleine Werte von n zu kürzen, siehe Zeilen 1 bis 5. Eine untere Grenze für vnum(P_r) (d. h. n) zum Kürzen kleiner Werte von n wird berechnet. Zum Beginn des Algorithmus kann eine Breitensuche von G, beginnend mit r, zum Berechnen der kleinsten Anzahl von Kanten in r-K_i-Pfaden für jedes 1 ≤ i ≤ g, bezeichnet durch l_r,i, eingefügt werden. Eine untere Grenze für vnum(P_r) ist gegeben durch $L_{r} = 1 + \sum_{1 \leq i \leq g} l_{r, i} .$

Lines 7 through 22 refer to the step of determining the cost of each path that is greater than the total minimum path length. The loop in line 7 starts with n = L _r . This is a measure to truncate small values of n, see lines 1 to 5. A lower bound on vnum(P _r ) (ie n) for truncating small values of n is computed. To start the algorithm, a breadth-first search of G starting with r can be inserted to compute the smallest number of edges in rK _i -paths for each 1 ≤ i ≤ g, denoted by l _r,i . A lower bound for vnum(P _r ) is given by

L_{right} = 1 + \sum_{1 \leq i \leq G} l_{right, i} .

Zeile 23 bezieht sich auf den Schritt des Vergleichens der Kosten jedes Pfades, der größer als die Gesamtmindestpfadlänge ist, mit der unteren Grenze.Line 23 refers to the step of comparing the cost of each path greater than the total minimum path length to the lower bound.

Zeile 24 bezieht sich auf den Schritt des Erhaltens der Pfadmenge mit den minimalen Kosten als die lokal optimale Pfadmenge.Line 24 refers to the step of obtaining the minimum cost path set as the locally optimal path set.

Erhalten der lokal optimalen Pfadmenge mit den minimalen Kosten als die optimale Pfadmenge. Dies wird durch die Zeilen 11 und 12 von Algorithmus 1 implementiert.

The step 308 of determining an optimal path set further includes the following:

Obtain the locally optimal path set with the minimum cost as the optimal path set. This is implemented by lines 11 and 12 of Algorithm 1.

Gemäß einer weiteren Kürzungsstrategie kann der r-K_i-Pfad mit minimalem Gewicht für jedes 1 ≤ i ≤ g, bezeichnet durch P_r,n,i, ausgenutzt werden. Die Anzahl von Kanten in P_r,n,i ist gegeben durch die Anzahl i_r,n,i.According to a further pruning strategy, the minimum weight rK _i -path can be exploited for each 1 ≤ i ≤ g, denoted by P _r,n,i . The number of edges in P _r,n,i is given by the number i _r,n,i .

Es wird davon ausgegangen, dass eine r-RPS P_r mit vnum(P_r) = n für das aktuelle n eine r-RPS mit lokal minimalen Kosten ist. Wenn für irgendein 1 ≤ i ≤, der r-K_i-Pfad P_i ∈ P_r mehr als l_r,n,i Kanten enthält, wird P_i ersetzt durch P_r,n,i, um eine andere r-RPS P_r' zu erzeugen. Ihr definiertes Gesamtkantengewicht ist nicht größer als das von P_r, und ihr vnum ist kleiner. Daher ist pcost(P_r') ≤ pcost(P_r), d. h. P_r' ist auch eine r-RPS mit lokal minimalem pcost, die und/oder irgendeine andere r-RPS mit lokal minimalem pcost in vorherigen Iterationen der äußersten Schleife für kleinere Werte von n gefunden wurde. Es ist daher nicht notwendig, P_r für das aktuelle n zu berücksichtigen. Dies kann implementiert werden, indem der Algorithmus wie folgt erweitert wird: Einengen der Bereiche von m und x: $m to m i n {n - 1, | V | - 1, l_{r, n, i},$

m to m i n {n - 1, \sum_{1 \leq i \leq I} l_{r, n, i},

max {0, m - \sum_{1 \leq i \leq I - 1} l_{r, n, i}} \leq x \leq min {m, | V | - 1, l_{r, n, I}},

m to m i n {n - 1, | V | - 1, l_{r, n, i},

m to m i n {n - 1, | V | - 1, max_{1 \leq i \leq g} l_{r, n, i}} .

An r-RPS P _r with vnum(P _r ) = n for the current n is assumed to be a locally minimum cost r-RPS. If, for any 1 ≤ i ≤, the rK _i -path P _i ∈ P _r contains more than l _r,n,i edges, then P _i is replaced by P _r,n,i to get another r-RPS P _r ' to create. Its defined total edge weight is no larger than that of P _r , and its vnum is smaller. Hence pcost(P _r ') ≤ pcost(P _r ), ie P _r ' is also an r-RPS with locally minimum pcost which and/or any other r-RPS with locally minimum pcost in previous iterations of the outermost loop for smaller values of n were found. It is therefore not necessary to consider P _r for the current n. This can be implemented by expanding the algorithm as follows: narrowing the ranges of m and x:

m to m i n {n - 1, | V | - 1, l_{right, n, i},

m to m i n {n - 1, \sum_{1 \leq i \leq I} l_{right, n, i},

Max {0, m - \sum_{1 \leq i \leq I - 1} l_{right, n, i}} \leq x \leq at least {m, | V | - 1, l_{right, n, I}},

m to m i n {n - 1, | V | - 1, l_{right, n, i},

m to m i n {n - 1, | V | - 1, \underset{1 \leq i \leq G}{Max} l_{right, n, i}} .

Das Verfahren umfasst ferner einen Schritt 310 des Extrahierens einer Antwort auf die Schlüsselwortabfrage aus der optimalen Pfadmenge P^#.The method further includes a step 310 of extracting an answer to the keyword query from the optimal path set P ^# .

Die Antwort ist definiert als $T^{#} = 〈 V_{T^{#}}, E_{T^{#}} 〉$

The answer is defined as

T^{#} = 〈 V_{T^{#}}, E_{T^{#}} 〉

Der Schritt 310 wird ausführlicher bezüglich eines Schrittes 312 des Zusammenführens der Pfade der optimalen Pfadmenge in einem Untergraphen des Wissensgraphen und einen Schritt 314 des Entfernens unnötiger Knoten und Kanten aus dem Untergraphen beschrieben.The step 310 is described in more detail in terms of a step 312 of merging the paths of the optimal path set into a subgraph of the knowledge graph and a step 314 of removing unnecessary nodes and edges from the subgraph.

Die optimale Pfadmenge P^# wird wie folgt in eine Antwort T^# umgewandelt. Alle r-vi-Pfade in P_r werden in einen Untergraphen T^# aus G zusammengeführt, der über einen Wurzelknoten r zusammenhängt und v_i e K_i für 1 ≤ i ≤ g enthält.The optimal path set P ^# is converted into an answer T ^# as follows. All r-vi paths in P _r are merged into a subgraph T ^# of G connected via a root node r and containing v _i e K _i for 1 ≤ i ≤ g.

Gemäß den oben beschriebenen Anforderungen für eine Antwort T, muss T die Anforderung erfüllen, strukturell minimal zu sein. Daher ist die Antwort T^# eine optimale Antwort, wenn sie strukturell minimal ist. Wenn nicht, wird der Schritt 314 des Entfernens von unnötigen Knoten und Kanten aus dem Untergraphen wiederholt verarbeitet, bis dieser strukturell minimal wird.According to the requirements for an answer T described above, T must satisfy the requirement to be structurally minimal. Therefore, the response T ^# is an optimal response if it is structurally minimal. If not, the step 314 of removing unnecessary nodes and edges from the subgraph is repeatedly processed until it becomes structurally minimal.

Gemäß dem in 1 dargestellten beispielhaften Wissensgraphen 100 wird eine Antwort auf die Suchabfrage beispielsweise durch den Untergraphen 152 gegeben. Der Untergraph 152 ist semantisch zusammenhängend, verglichen mit dem anderen Untergraphen 154, der semantisch nicht zusammenhängend ist. Daher ist der Untergraph 152 die bevorzugte Antwort gemäß dieser Ausführungsform.According to the 1 In the exemplary knowledge graph 100 shown, an answer to the search query is given, for example, by the subgraph 152. Subgraph 152 is semantically connected compared to the other subgraph 154 which is semantically discontinuous. Therefore, subgraph 152 is the preferred response according to this embodiment.

In einem Schritt 316 kann die Antwort auf ein Ergebnis der Schlüsselwortabfrage abgebildet werden, und das Ergebnis der Schlüsselwortsuche kann ausgegeben werden.In a step 316, the response may be mapped to a keyword query result and the keyword search result may be output.

Claims

Computer-implemented method (300) for keyword searches in a dataset, data of the dataset being represented by a knowledge graph (100), the knowledge graph (100) comprising nodes representing entities of the dataset and edges representing relationships between the entities, the method (300) comprising the steps of: receiving (302) a keyword query comprising a set of keywords; ranking (304) nodes of the knowledge graph based on a shortest path to keyword nodes, a keyword node being a node that matches at least one keyword of the keyword query; selecting (306) the smallest set of keyword nodes among all sets of keyword nodes, comprising at least one keyword node for each keyword of the set of keywords; determining (308) an optimal path set based on the smallest set of keyword nodes and based on a number of ranked nodes, the optimal path set including paths in the knowledge graph (100) connecting nodes that match the keywords of the keyword query , and wherein the optimal path set is optimal in terms of minimum cost of the path set, the cost being based on a linear combination of the total weight of nodes of the optimal path set and pairwise semantic distances between the nodes of the optimal path set, and extracting (310) a response to the keyword query from the optimal path set.

Method (300) according to claim 1 , where the total weight of a node represents salience, and/or the pairwise semantic distance between two nodes represents semantic belonging.

Method (300) according to claim 1 or 2 , where the cost of a path set is given by the sum of the total weight of the root node and the total weights of the nodes matching the keywords of the keyword query, and the pairwise semantic distances between the root node and the nodes matching the keywords of the keyword query.

Method (300) according to any one of the preceding claims, wherein the step (308) of determining an optimal path set comprises an iterative process (308a) based on determining a locally optimal path set, the locally optimal path set comprising paths in the knowledge graph (100), connecting a root node of the smallest set of keyword nodes to nodes matching the keywords of the keyword query, and wherein the locally optimal path set is optimal in terms of minimum cost of the path set for the corresponding root node, the iterative process (308a) comprising the following steps: determining an overall minimum path length of the locally optimal path set by determining, for each keyword node, the shortest path from the corresponding root node; determining a lower bound on the cost of the locally optimal path set; determining the cost of each path that is greater than the total minimum path length; comparing the cost of each path greater than the total minimum path length to the lower bound; Obtain the minimum cost path set as the locally optimal path set.

Method (300) according to claim 4 , wherein the step (308) of determining an optimal path set comprises: obtaining the locally optimal path set with the minimum cost as the optimal path set.

Method (300) according to one of the preceding claims, wherein the iterative process (308a) of the step (308) of determining an optimal path set is extended by at least one shortening strategy.

A method according to any one of the preceding claims, wherein the step of extracting (310) the answer from the optimal path set comprises: Merging (312) the paths of the optimal path set into a subgraph of the knowledge graph (100).

The method of any preceding claim, wherein the step (310) of extracting the answer from the optimal path set further comprises: Remove (314) unnecessary vertices and edges from the subgraph.

Device (200) for searching for keywords in a data set, data of the data set being represented by a knowledge graph (100), the knowledge graph (100) having nodes (V) representing entities of the data set and edges (E) representing relationships between the represent entities, wherein the device comprises an input which is designed to receive a keyword query (Q) comprising a set of keywords and is designed to map keywords to nodes of the knowledge graph (100), the facility further comprising a processor (204), the processor (204) being configured to rank (304) the nodes of the knowledge graph based on a shortest path to keyword nodes, a keyword node being a node associated with at least one keyword of the keyword query agrees; selecting (306) the smallest set of keyword nodes among all sets of keyword nodes, comprising at least one keyword node for each keyword of the set of keywords; determining (308) an optimal path set based on the smallest set of keyword nodes and based on a number of ranked nodes, the optimal path set including paths in the knowledge graph (100) connecting nodes that match the keywords of the keyword query , and wherein the optimal path set is optimal in terms of minimum cost of the path set, where the cost is based on a linear combination of the total weight of nodes of the optimal path set and pairwise semantic distances between the nodes of the optimal path set, and extracting (310) a response to the keyword query from the optimal path set, and wherein the means comprises an output (206) arranged to map the response to a result of the keyword query and arranged to output the result.

Setup (200) after claim 9 , wherein the device (200) is further designed to carry out steps of the method (300) according to one of claims 2 until 8th .

A computer program for searching for keywords in a data set, comprising computer-readable instructions which, when executed by a computer, cause the computer to perform steps of the method (300) according to any one of Claims 1 until 8th to execute.