CN118132743A

CN118132743A - Data processing system for presenting triples

Info

Publication number: CN118132743A
Application number: CN202410335729.8A
Authority: CN
Inventors: 杨东锋; 吕观祥; 叶新江; 袁凯
Original assignee: Merit Interactive Co Ltd
Current assignee: Merit Interactive Co Ltd
Priority date: 2024-03-22
Filing date: 2024-03-22
Publication date: 2024-06-04

Abstract

The application relates to the technical field of knowledge graphs, and in particular to a data processing system for presenting triples. The system comprises a storage medium, a processor, and a memory. The storage medium stores preset text entity clusters and preset text relation clusters corresponding to a plurality of preset text triples; the memory stores a computer program which, when executed by the processor, implements the following steps: acquiring key text entity clusters according to a received query text; rendering each key text entity cluster in a canvas; when a selection instruction for a key text entity cluster is received, acquiring a plurality of key text entities corresponding to that cluster; and acquiring and presenting the triples corresponding to each key text entity. The application makes reasonable use of canvas space to present a large amount of data in the canvas, allows a plurality of results to be queried simultaneously, and improves query efficiency and query quality.

Description

Data processing system for presenting triples

Technical Field

The invention relates to the technical field of knowledge graphs, and in particular to a data processing system for presenting triples.

Background

In computer graphics, rendering a graph refers to the process by which software generates an image from a model, that is, projecting a model in a three-dimensional scene into a two-dimensional digital image according to the configured environment, lighting, materials, and rendering parameters. Graph rendering can be divided into three parts: data processing, layout, and rendering. Renderers currently on the market can display a knowledge graph intuitively in an interface, but the number of nodes they can render is limited; when a query returns many results, not all of them can be seen at a glance, which degrades the user's query experience.

Disclosure of Invention

To address the above technical problem, the invention adopts the following technical solution:

A data processing system for presenting triples, the system comprising: a storage medium, a processor, and a memory storing a computer program. The storage medium stores preset text entity clusters A = {A_1, A_2, …, A_i, …, A_m} and preset text relation clusters B = {B_1, B_2, …, B_j, …, B_n} corresponding to a plurality of preset text triples, where A_i is the i-th preset text entity cluster, B_j is the j-th preset text relation cluster, i = 1, 2, …, m, m is the number of preset text entity clusters, j = 1, 2, …, n, and n is the number of preset text relation clusters. Each preset text entity cluster comprises a plurality of preset text entities, each preset text relation cluster comprises a plurality of preset text relations, and the preset text entities in each preset text triple are associated with the preset text relation of that triple. When the computer program is executed by the processor, the following steps are implemented:

S100, according to a received query text, acquiring from A the key text entity clusters C = {C_1, C_2, …, C_e, …, C_f} corresponding to the query text, where C_e is the e-th key text entity cluster, e = 1, 2, …, f, and f is the number of key text entity clusters.

S200, rendering C_e according to canvas information so as to present the key text entity clusters in C in the canvas; the canvas information includes a canvas position and a canvas size.

S300, when an expand instruction for C_e in the canvas is received, acquiring the intermediate text entity clusters corresponding to C_e and presenting them in the canvas.

S400, when a selection instruction for C_e in the canvas is received, acquiring the key text entity set D_e = {D_e1, D_e2, …, D_ep, …, D_eq} corresponding to C_e, where D_ep is the p-th key text entity corresponding to C_e, p = 1, 2, …, q, and q is the number of key text entities corresponding to C_e.

S500, acquiring from B the intermediate text relation set G_ep = {G_ep1, G_ep2, …, G_epr, …, G_eps} corresponding to D_ep, where G_epr is the r-th intermediate text relation corresponding to D_ep, r = 1, 2, …, s, and s is the number of intermediate text relations corresponding to D_ep.

S600, acquiring from A the target text entities corresponding to D_ep and G_epr, and presenting D_ep, G_epr, and the target text entities in the canvas in the form of triples.

Compared with the prior art, the data processing system for presenting triples provided by the invention has clear beneficial effects. Through the above technical solution it achieves considerable technical progress and practicality, has broad industrial applicability, and provides at least the following benefits:

The invention provides a data processing system for presenting triples, comprising a storage medium, a processor, and a memory storing a computer program. The storage medium stores the preset text entity clusters and preset text relation clusters corresponding to the preset text triples. Because the text entities and text relations are each clustered, they can be presented in the canvas as clusters: only the name of each cluster needs to be rendered, and the next-level entities are expanded and presented when a cluster is clicked. The clustered form therefore supports rendering a large amount of data and facilitates the user's data analysis. When executed by the processor, the computer program implements the following steps: acquiring key text entity clusters according to the received query text; rendering each key text entity cluster in the canvas; when a selection instruction for a key text entity cluster is received, acquiring the key text entities corresponding to that cluster; and acquiring and presenting the triples corresponding to each key text entity.

Drawings

To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.

FIG. 1 is a flowchart of a computer program executed by a data processing system that presents triples, according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

An embodiment of the invention provides a data processing system for presenting triples, the system comprising: a storage medium, a processor, and a memory storing a computer program. The storage medium stores preset text entity clusters A = {A_1, A_2, …, A_i, …, A_m} and preset text relation clusters B = {B_1, B_2, …, B_j, …, B_n} corresponding to a plurality of preset text triples, where A_i is the i-th preset text entity cluster, B_j is the j-th preset text relation cluster, i = 1, 2, …, m, m is the number of preset text entity clusters, j = 1, 2, …, n, and n is the number of preset text relation clusters. Each preset text entity cluster comprises a plurality of preset text entities, each preset text relation cluster comprises a plurality of preset text relations, and the preset text entities in each preset text triple are associated with the preset text relation of that triple. When the computer program is executed by the processor, the steps shown in FIG. 1 are implemented:

In a specific embodiment, A and B are obtained by:

S1, according to a plurality of received target information texts, acquiring the first text entity, the second text entity, and the relation between them in each target information text. A person skilled in the art may use an entity relation extraction model to extract the entities and relations from the target information texts, or may directly use entities and relations collected in advance by staff; details are omitted here.

S2, according to a preset entity classification rule table, clustering the first text entity and the second text entity with the initial text entities in the initial text entity clusters to obtain the preset text entity clusters.

Specifically, the preset entity classification rule table is a rule table that assigns entities of the same category to the same class according to preset classification rules; for example, Geely and Volkswagen are both assigned to the vehicle category under a preset rule.

Specifically, the preset text entity clusters are obtained by clustering a plurality of preset text entities with a k-means clustering model; the implementation of k-means is known to those skilled in the art and is not described here.

S3, according to a preset relation classification rule table, clustering the relation between the first text entity and the second text entity with the initial text relations in the initial text relation clusters to obtain the preset text relation clusters.

Specifically, the preset text relation clusters are clustered in the same way as the preset text entity clusters.

Clustering groups entities of the same category together, so that they can be presented on the page as clusters, each containing a plurality of entities; clicking a cluster expands the entities it contains. The clustered form thus supports presenting a large amount of data and lets the user see the data volume of each category at a glance, which benefits the use and analysis of the data.
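
The rule-table variant of S1 to S3 can be sketched as follows. This is only an illustration, assuming each classification rule table can be modeled as a plain mapping from an item to its category; `ENTITY_RULES`, `RELATION_RULES`, `cluster_by_rules`, and the sample triples are hypothetical names and data that do not come from the patent, and the k-means variant is not covered.

```python
from collections import defaultdict

# Hypothetical rule tables (item -> category); the patent does not fix their layout.
ENTITY_RULES = {"Geely": "vehicle brand", "Volkswagen": "vehicle brand", "sedan": "body style"}
RELATION_RULES = {"builds": "manufacturing"}

def cluster_by_rules(items, rules, default="uncategorized"):
    """Group items into clusters keyed by the category the rule table assigns."""
    clusters = defaultdict(list)
    for item in items:
        members = clusters[rules.get(item, default)]
        if item not in members:  # keep each entity or relation once per cluster
            members.append(item)
    return dict(clusters)

# S1: (first entity, relation, second entity) extracted from the target texts.
triples = [("Geely", "builds", "sedan"), ("Volkswagen", "builds", "sedan")]

# S2: cluster both entities of every triple; S3: cluster the relations the same way.
entity_clusters = cluster_by_rules([e for h, _, t in triples for e in (h, t)], ENTITY_RULES)
relation_clusters = cluster_by_rules([r for _, r, _ in triples], RELATION_RULES)
```

Items that no rule covers fall into a single `"uncategorized"` cluster here; that fallback is a design choice of the sketch, not something the source specifies.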

Specifically, the query text refers to text input by a user for querying a required text triplet from a plurality of preset text triples.

In a specific embodiment, C_e is obtained in S100 by:

S101, parsing the query text and acquiring from it the query keyword list D = {D_1, D_2, …, D_e, …, D_f}, where D_e is the e-th query keyword. Any prior-art method for extracting keywords from text falls within the protection scope of the invention and is not described here.

S102, according to D_e, acquiring from A the preset text entity D_Ae corresponding to D_e and the preset text entity cluster corresponding to D_Ae. It can be understood that the preset text entity D_Ae may be the name of a cluster formed by a plurality of entities; when D_Ae is a last-level entity, its corresponding preset text entity cluster is empty. For example, when the entity name corresponding to D_Ae is "members of a company", D_Ae may further include the names of a plurality of employees of the company; when the entity name corresponding to D_Ae is an employee name, it is a last-level entity that cannot be divided further, so its corresponding preset text entity cluster is empty.

In a specific embodiment, D_Ae is determined in S102 by:

S1021, when A contains a preset text entity whose characters are identical to those of D_e, determining that preset text entity as D_Ae.

S1022, when A contains no preset text entity whose characters are identical to those of D_e, acquiring the preset text entity priority list K_A = {K_A1, K_A2, …, K_Ah, …, K_At} corresponding to A, where K_Ah is the preset text entity priority of the h-th preset text entity in A, h = 1, 2, …, t, and t is the number of preset text entities in A.

Specifically, K_Ah satisfies the following condition:

$$K_{Ah} = \frac{K^{0}_{Ah} \cdot D^{0}_{e}}{\lVert K^{0}_{Ah} \rVert \times \lVert D^{0}_{e} \rVert}$$

where K⁰_Ah is the word vector corresponding to the h-th preset text entity in A, and D⁰_e is the word vector corresponding to D_e.

Specifically, the word vector of a preset text entity is obtained by processing the entity with a word vector construction model, which may be a Word2Vec model; the implementation of Word2Vec is known to those skilled in the art and is not described here.

Specifically, the word vector corresponding to D_e is obtained in the same way as the word vector of a preset text entity, which is not repeated here.

S1023, determining the preset text entity with the maximum preset text entity priority as D_Ae. It can be understood that when several preset text entities share the maximum priority, all of them are determined as D_Ae.

S103, determining the preset text entity cluster corresponding to D_Ae as C_e.

Comparing the query keywords with the names of the preset text entities makes the matched preset text entities more accurate, and acquiring the preset text entity cluster for every query keyword makes the retrieved key text entity clusters more complete, which improves query quality.
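
The matching logic of S1021 to S1023 can be sketched as below: an exact name match is tried first, and otherwise the priority is the cosine similarity between word vectors, with ties at the maximum all retained. The function names `cosine` and `match_entities` and the two-dimensional toy vectors are assumptions made for illustration; in practice the vectors would come from a word vector model such as Word2Vec.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_entities(keyword, keyword_vec, entity_vecs):
    """Return the preset entities matched to one query keyword (S1021 to S1023)."""
    # S1021: an exact character match wins outright.
    if keyword in entity_vecs:
        return [keyword]
    # S1022: otherwise score every preset entity by cosine similarity.
    priorities = {name: cosine(vec, keyword_vec) for name, vec in entity_vecs.items()}
    if not priorities:
        return []
    # S1023: keep every entity tied for the maximum priority.
    best = max(priorities.values())
    return [name for name, k in priorities.items() if math.isclose(k, best)]

# Toy vectors (hypothetical); real ones would be produced by a word vector model.
entity_vecs = {"employee": [1.0, 0.0], "vehicle": [0.0, 1.0]}
exact = match_entities("vehicle", [0.0, 1.0], entity_vecs)
nearest = match_entities("staff", [0.9, 0.1], entity_vecs)
```

With these toy vectors, the keyword "vehicle" is resolved by the exact match branch, while "staff" has no exact match and is resolved to "employee" by similarity.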

Specifically, C_e is rendered by means of a Web Worker (webworker) model; its implementation is known to those skilled in the art and is not described here.

In a specific embodiment, the following steps are included in S200:

S201, when f ≤ λ, presenting the f key text entity clusters in a preset area of the canvas through a layout model according to the canvas position and the canvas size, where λ is the preset number of key text entity clusters. The center of the preset area coincides with the center of the canvas, and the preset area has the same shape as the canvas. For example, when the center of the canvas coincides with the center of the display screen and the canvas is twice as long and twice as wide as the display screen, the preset area may be the region of the canvas that corresponds to the display screen.

Specifically, the layout model is a d3-force model; its implementation is known to those skilled in the art and is not described here.

S202, when f > λ, acquiring the target area corresponding to C according to the canvas position. The center of the target area coincides with the center of the canvas, and the target area has the same shape as the canvas. The area of the target area satisfies the following condition:

$$\theta^{0} = \frac{f}{\lambda} \times \theta$$

where θ⁰ is the area of the target area and θ is the area of the preset area.

S203, presenting the f key text entity clusters in the target area through the layout model.
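
A small sketch of how the region used in S201 and S202 might be computed. It assumes a rectangular canvas and reads "same shape" as "same aspect ratio", so that scaling the area by f/λ scales each side by the square root of f/λ; the `Rect` type, `layout_region`, and the pixel values are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Rect:
    cx: float      # center x
    cy: float      # center y
    width: float
    height: float

    @property
    def area(self):
        return self.width * self.height

def layout_region(f, lam, preset):
    """Region in which f key text entity clusters are laid out (S201 / S202)."""
    if f <= lam:
        return preset  # S201: the preset area is large enough.
    # S202: area grows by f / lam; same center and aspect ratio,
    # so each side scales by sqrt(f / lam).
    scale = math.sqrt(f / lam)
    return Rect(preset.cx, preset.cy, preset.width * scale, preset.height * scale)

preset = Rect(cx=0.0, cy=0.0, width=800.0, height=600.0)
target = layout_region(f=40, lam=10, preset=preset)  # four times the preset area
```

Here 40 clusters against a preset count of 10 give a target area four times the preset area, that is, a region twice as wide and twice as high around the same center.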

In another specific embodiment, S203 further includes the following steps:

S2031, if θ⁰ ≤ ζ, presenting the f key text entity clusters in the target area of the canvas through the layout model, where ζ is the area of the canvas.

S2032, if θ⁰ > ζ, hiding, when the f key text entity clusters are rendered, those key text entity clusters that lie inside the target area but outside the canvas area. It can be understood that when the page containing the canvas is zoomed or dragged, rendering is iterated using the webworker model.

The size of the presentation area is determined by the number of key text entity clusters, and the layout model arranges the clusters sensibly within the computed area, which makes it easier for the user to search. In addition, when the target area is larger than the canvas, the key text entity clusters outside the canvas area are hidden: only their data needs to be cached, and rendering is iterated only after the user drags the page, which reduces memory use during rendering.
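
The hiding behavior of S2032 can be illustrated with a simple bounds check. This sketch assumes that each laid-out cluster is reduced to a point and that dragging is modeled as shifting a rectangular visible region; `split_visible`, `pan`, and the coordinates are hypothetical, and the Web Worker hand-off is left out.

```python
def split_visible(nodes, canvas):
    """Partition laid-out cluster nodes into rendered and hidden (cached) names.

    nodes: iterable of (name, x, y); canvas: (left, top, right, bottom).
    """
    left, top, right, bottom = canvas
    visible, hidden = [], []
    for name, x, y in nodes:
        inside = left <= x <= right and top <= y <= bottom
        (visible if inside else hidden).append(name)
    return visible, hidden

def pan(canvas, dx, dy):
    """Shift the visible region, as a stand-in for the user dragging the page."""
    left, top, right, bottom = canvas
    return (left + dx, top + dy, right + dx, bottom + dy)

nodes = [("C_1", 100, 100), ("C_2", 900, 100), ("C_3", 1500, 700)]
canvas = (0, 0, 1000, 600)

visible, hidden = split_visible(nodes, canvas)                   # C_3 is only cached
visible_after, _ = split_visible(nodes, pan(canvas, 600, 200))   # re-evaluated after a drag
```

Only the names in `visible` would be handed to the renderer; the hidden ones stay cached until a drag or zoom brings them into view.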

Specifically, the expand instruction for C_e in the canvas is an instruction issued when the user clicks C_e, in order to expand C_e and obtain the intermediate text entity clusters in C_e, where an intermediate text entity cluster is any text entity cluster within C_e. It can also be understood that when C_e is a last-level cluster, the intermediate text entities in C_e are obtained instead.

Specifically, the selection instruction for C_e in the canvas is an instruction issued when the user selects C_e, in order to obtain the key text entities in C_e.

Specifically, a key text entity is a text entity obtained from C_e that belongs to the level immediately below C_e. It can be understood that when C_e is not a last-level cluster, the text entities at the next level of C_e are the text entity clusters at that level.

Receiving a selection instruction for C_e in the canvas indicates that the user wants to query the entities at the level directly under C_e. Only the entities corresponding to C_e therefore need to be obtained from the key text entity cluster C_e, so that the queried triples better match the user's needs.

Specifically, an intermediate text relation corresponding to D_ep is a relation that belongs to the same preset text triple as D_ep.

Specifically, a target text entity is the text entity that appears together with D_ep and G_epr in a preset text triple. It can be understood that the first text entity and the second text entity of each preset text triple are both stored in the preset text entity clusters, and that the system also stores the correspondence among the first text entity, the second text entity, and the relation between them.
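
The lookups of S400 to S600 can be sketched as follows. The sketch makes simplifying assumptions that are not in the source: the preset text triples are kept as a flat list, the cluster hierarchy is a mapping from a cluster name to its next-level entities, and a key text entity is matched only as the first entity of a triple. `TRIPLES`, `CHILDREN`, `triples_for_cluster`, and the employee placeholders are hypothetical.

```python
# Hypothetical store of preset text triples: (first entity, relation, second entity).
TRIPLES = [
    ("core staff", "worked at least three years", "employee X"),
    ("core staff", "joined at least five projects", "employee Y"),
    ("ordinary staff", "worked at least three years", "employee Z"),
]
# Hypothetical cluster hierarchy: cluster name -> next-level entities.
CHILDREN = {"staff of a company": ["core staff", "ordinary staff", "edge staff"]}

def triples_for_cluster(cluster, children, triples):
    """Collect the triples to present when a cluster is selected (S400 to S600)."""
    result = []
    for entity in children.get(cluster, []):          # S400: key text entities D_ep
        relations = []                                # S500: relations G_ep of D_ep
        for head, rel, _ in triples:
            if head == entity and rel not in relations:
                relations.append(rel)
        for rel in relations:                         # S600: target entities per (D_ep, G_epr)
            result += [(h, r, t) for h, r, t in triples if h == entity and r == rel]
    return result

selected = triples_for_cluster("staff of a company", CHILDREN, TRIPLES)
```

A single selection therefore returns the triples of every key text entity in the cluster at once; an entity with no stored triples, such as "edge staff" here, contributes nothing.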

In a specific embodiment, the following steps are included in S600:

S601, acquiring, according to the layout model, the node positions in the canvas corresponding to D_ep and to the target text entity.

S602, acquiring a first node weight W_ep1 corresponding to D_ep and a second node weight W_ep2 corresponding to the target text entity, and determining the presentation size of D_ep and the presentation size of the target text entity from the first node weight and the second node weight respectively. It can be understood that the presentation size is proportional to the node weight, with the ratio set according to actual requirements.

Specifically, W_ep1 satisfies the following condition:

$$W_{ep1} = \frac{Z_{ep1}}{\sum_{x=1}^{y} Z_{x}}$$

where Z_ep1 is the data volume in D_ep, y is the number of nodes in the canvas, and Z_x is the data volume in the entity corresponding to the x-th node in the canvas.

Specifically, W_ep2 satisfies the following condition:

$$W_{ep2} = \frac{Z_{ep2}}{\sum_{x=1}^{y} Z_{x}}$$

where Z_ep2 is the data volume in the target text entity.

S603, presenting D_ep and the target text entity at their corresponding node positions in the canvas at their corresponding presentation sizes.
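
The weight and size computation of S602 can be illustrated as below. The data volumes and the proportionality constant `scale` are made-up values, and `node_weights` and `node_sizes` are hypothetical helper names; the source only states that size is proportional to weight.

```python
def node_weights(data_volumes):
    """Weight of each node: its data volume divided by the total over all canvas nodes."""
    total = sum(data_volumes.values())
    if total == 0:
        return {name: 0.0 for name in data_volumes}
    return {name: z / total for name, z in data_volumes.items()}

def node_sizes(weights, scale=200.0):
    """Presentation size proportional to node weight; `scale` is an assumed constant."""
    return {name: scale * w for name, w in weights.items()}

# Hypothetical data volumes Z_x for the nodes currently in the canvas.
volumes = {"core staff": 30, "employee X": 10, "employee Y": 10}
weights = node_weights(volumes)
sizes = node_sizes(weights)
```

Because every weight is a share of the same total, the weights of all nodes in the canvas sum to 1, and a node holding three times the data of another is drawn three times as large.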

For ease of understanding, the following examples are given as illustrations:

Suppose the target information texts are collected employee information of a company, where the employee information includes years of work experience, cumulative number of projects participated in, and performance score, and all employees are divided into core staff, ordinary staff, and edge staff according to the preset entity classification rule table. When C_e is "staff of a certain company", clicking C_e yields the key text entities corresponding to C_e: core staff, ordinary staff, and edge staff. From the preset text triples acquired in advance, the intermediate text relations corresponding to core staff are obtained, including: worked for no less than three years, participated in no fewer than five projects, and received at least one A performance score. The corresponding employees can then be queried from "core staff" and "worked for no less than three years", and the core staff entity, the relation "worked for no less than three years", and the names of the queried employees are presented in the canvas as triples.

With this approach, only the queried text entity clusters need to be presented rather than all triples. Classifying and clustering the text entities saves a large amount of display space while still accommodating a large amount of data. After a selection instruction for a key text entity cluster is received from the user, the triples corresponding to each key text entity in that cluster can be queried from the preset text triples, so that a plurality of results are queried simultaneously and only the final query results need to be presented, which keeps the results clear while making reasonable use of canvas space.

While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims

1. A data processing system for presenting triples, the system comprising: a storage medium, a processor, and a memory storing a computer program, the storage medium storing preset text entity clusters A = {A_1, A_2, …, A_i, …, A_m} and preset text relation clusters B = {B_1, B_2, …, B_j, …, B_n} corresponding to a plurality of preset text triples, wherein A_i is the i-th preset text entity cluster, B_j is the j-th preset text relation cluster, i = 1, 2, …, m, m is the number of preset text entity clusters, j = 1, 2, …, n, n is the number of preset text relation clusters, each preset text entity cluster comprises a plurality of preset text entities, each preset text relation cluster comprises a plurality of preset text relations, and the preset text entities in each preset text triple are associated with the preset text relation of that triple; when the computer program is executed by the processor, the following steps are implemented:

S100, according to a received query text, acquiring from A the key text entity clusters C = {C_1, C_2, …, C_e, …, C_f} corresponding to the query text, wherein C_e is the e-th key text entity cluster, e = 1, 2, …, f, and f is the number of key text entity clusters;

S200, rendering C_e according to canvas information so as to present the key text entity clusters in C in the canvas; the canvas information includes a canvas position and a canvas size;

S300, when an expand instruction for C_e in the canvas is received, acquiring the intermediate text entity clusters corresponding to C_e so as to present them in the canvas, wherein an intermediate text entity cluster is any text entity cluster in C_e;

S400, when a selection instruction for C_e in the canvas is received, acquiring the key text entity set D_e = {D_e1, D_e2, …, D_ep, …, D_eq} corresponding to C_e, wherein D_ep is the p-th key text entity corresponding to C_e, p = 1, 2, …, q, and q is the number of key text entities corresponding to C_e;

S500, acquiring from B the intermediate text relation set G_ep = {G_ep1, G_ep2, …, G_epr, …, G_eps} corresponding to D_ep, wherein G_epr is the r-th intermediate text relation corresponding to D_ep, r = 1, 2, …, s, and s is the number of intermediate text relations corresponding to D_ep;

S600, acquiring from A the target text entities corresponding to D_ep and G_epr, and presenting D_ep, G_epr, and the target text entities in the canvas in the form of triples.

2. The data processing system for presenting triples according to claim 1, wherein A and B are obtained by:

S1, according to a plurality of received target information texts, acquiring the first text entity, the second text entity, and the relation between the first text entity and the second text entity in each target information text;

S2, according to a preset entity classification rule table, clustering the first text entity and the second text entity with the initial text entities in the initial text entity clusters to obtain the preset text entity clusters;

S3, according to a preset relation classification rule table, clustering the relation between the first text entity and the second text entity with the initial text relations in the initial text relation clusters to obtain the preset text relation clusters.

3. The data processing system for presenting triples according to claim 1, wherein C_e is acquired in S100 by:

S101, parsing the query text and acquiring from it the query keyword list D = {D_1, D_2, …, D_e, …, D_f}, wherein D_e is the e-th query keyword;

S102, according to D_e, acquiring from A the preset text entity D_Ae corresponding to D_e and the preset text entity cluster corresponding to D_Ae;

S103, determining the preset text entity cluster corresponding to D_Ae as C_e.

4. The data processing system for presenting triples according to claim 3, wherein D_Ae is determined in S102 by:

S1021, when A contains a preset text entity whose characters are identical to those of D_e, determining that preset text entity as D_Ae;

S1022, when A contains no preset text entity whose characters are identical to those of D_e, acquiring the preset text entity priority list K_A = {K_A1, K_A2, …, K_Ah, …, K_At} corresponding to A, wherein K_Ah is the preset text entity priority of the h-th preset text entity in A, h = 1, 2, …, t, and t is the number of preset text entities in A;

S1023, determining the preset text entity with the maximum preset text entity priority as D_Ae.

5. The data processing system for presenting triples according to claim 4, wherein K_Ah satisfies the following condition:

$$K_{Ah} = \frac{K^{0}_{Ah} \cdot D^{0}_{e}}{\lVert K^{0}_{Ah} \rVert \times \lVert D^{0}_{e} \rVert}$$

wherein K⁰_Ah is the word vector corresponding to the h-th preset text entity in A, and D⁰_e is the word vector corresponding to D_e.

6. The data processing system for presenting triples according to claim 1, wherein S200 comprises the following steps:

S201, when f ≤ λ, presenting the f key text entity clusters in a preset area of the canvas through a layout model according to the canvas position and the canvas size, wherein λ is the preset number of key text entity clusters; the center of the preset area coincides with the center of the canvas, and the preset area has the same shape as the canvas;

S202, when f > λ, acquiring the target area corresponding to C according to the canvas position; the center of the target area coincides with the center of the canvas, and the target area has the same shape as the canvas; the area of the target area satisfies the following condition:

$$\theta^{0} = \frac{f}{\lambda} \times \theta$$

wherein θ⁰ is the area of the target area and θ is the area of the preset area;

S203, presenting the f key text entity clusters in the target area through the layout model.

7. The data processing system for presenting triples according to claim 1, wherein S600 comprises the following steps:

S601, acquiring, according to the layout model, the node positions in the canvas corresponding to D_ep and to the target text entity;

S602, acquiring a first node weight corresponding to D_ep and a second node weight corresponding to the target text entity, and determining the presentation size of D_ep and the presentation size of the target text entity from the first node weight and the second node weight respectively;

S603, presenting D_ep and the target text entity at their corresponding node positions in the canvas at their corresponding presentation sizes.

8. The data processing system for presenting triples according to claim 1, wherein the expand instruction for C_e in the canvas is an instruction issued when a user clicks C_e, in order to expand C_e and obtain the intermediate text entity clusters in C_e.

9. The data processing system for presenting triples according to claim 1, wherein the selection instruction for C_e in the canvas is an instruction issued when a user selects C_e, in order to obtain the key text entities in C_e.