CN113239203A

CN113239203A - Knowledge graph-based screening method and device

Info

Publication number: CN113239203A
Application number: CN202110616193.3A
Authority: CN
Inventors: 侯昶宇; 李长亮; 毛璐
Original assignee: Beijing Kingsoft Software Co Ltd
Current assignee: Beijing Kingsoft Software Co Ltd; Beijing Kingsoft Digital Entertainment Co Ltd
Priority date: 2021-06-02
Filing date: 2021-06-02
Publication date: 2021-08-10

Abstract

The present application provides a knowledge graph-based screening method and device, wherein the knowledge graph-based screening method includes: acquiring target object information; adding the target object information to a knowledge graph constructed from a candidate object set, and determining a target community corresponding to the target object information, wherein the knowledge graph includes a plurality of communities; determining an initial candidate object set in the candidate object set according to the target community; and screening a target candidate object set from the initial candidate object set according to the target object information.

Description

Knowledge graph-based screening method and device

Technical Field

The present application relates to the field of knowledge graph technology, and in particular, to a method and an apparatus for screening based on a knowledge graph, a computing device, and a computer-readable storage medium.

Background

With the rapid development of computer technology, much important information is recorded in databases, which facilitates subsequent use of the data. For example, when a certain service is released, information on the clients participating in the service is received, a client database is created from the received client information, and a recommendation can be made to clients who participated in a similar service when a new related service is released. After a target client meeting the service recommendation requirement is found, clients with the same label as the target client are screened out of the client database according to the label corresponding to the target client and are recommended to the service provider.

However, in the prior art, screening by label is not accurate enough and cannot compare the implicit semantics in the information. In addition, when labels from a plurality of fields are involved, the labels are unevenly distributed, which affects the final screening result and leads to a poor screening effect.

Disclosure of Invention

In view of this, embodiments of the present application provide a method and an apparatus for screening based on a knowledge graph, a computing device, and a computer-readable storage medium, so as to solve technical defects in the prior art.

According to a first aspect of embodiments of the present application, there is provided a method for knowledge-graph-based screening, including:

acquiring target object information;

adding the target object information into a knowledge graph constructed by a candidate object set, and determining a target community corresponding to the target object information, wherein the knowledge graph comprises a plurality of communities;

determining an initial candidate object set in the candidate object set according to the target community;

and screening a target candidate object set in the initial candidate object set according to the target object information.

According to a second aspect of the embodiments of the present application, there is provided a knowledge-graph-based screening apparatus, including:

an acquisition module configured to acquire target object information;

an adding module configured to add the target object information to a knowledge graph constructed by a candidate object set and determine a target community corresponding to the target object information, wherein the knowledge graph comprises a plurality of communities;

a determination module configured to determine an initial set of candidate objects in the set of candidate objects according to the target community;

a screening module configured to screen a target candidate set from the initial candidate set according to the target object information.

According to a third aspect of embodiments herein, there is provided a computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the knowledge-graph based screening method when executing the computer instructions.

According to a fourth aspect of embodiments herein, there is provided a computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the knowledge-graph based screening method.

According to a fifth aspect of embodiments of the present application, there is provided a chip storing computer instructions that, when executed by the chip, implement the steps of the method for knowledge-graph based screening.

The knowledge graph-based screening method provided by the present application comprises: obtaining target object information; adding the target object information into a knowledge graph constructed by a candidate object set, and determining a target community corresponding to the target object information, wherein the knowledge graph comprises a plurality of communities; determining an initial candidate object set in the candidate object set according to the target community; and screening a target candidate object set in the initial candidate object set according to the target object information.

The method determines objects similar to the target object as candidate objects in a knowledge graph that has been divided into communities, generates an initial candidate object set, and then screens out, through similarity calculation, a target candidate object set that better meets actual requirements, which improves the efficiency of determining candidate objects through the knowledge graph. Because the similarity of object texts is calculated by combining communities with a model, the method does not depend on the size or distribution of the data and transfers well to other fields. It can effectively mine the hidden information contained in the object texts and can assign more accurate labels to objects through similarity calculation, so that screening is completed more accurately.

Drawings

FIG. 1 is a block diagram of a computing device provided by an embodiment of the present application;

FIG. 2 is a flow chart of a method of knowledge-graph based screening provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a target community corresponding to target object information provided in an embodiment of the present application;

FIG. 4 is a flowchart of a knowledge-graph screening method applied to obtain similar resumes according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a talent knowledge graph provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of community 4 provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a knowledge-graph-based screening apparatus provided in an embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the present application; the present application is therefore not limited to the specific implementations disclosed below.

The terminology used in the one or more embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments of the present application to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect, and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present application. The word "if," as used herein, may be interpreted as "responsive to a determination," depending on the context.

First, the terms involved in one or more embodiments of the present application are explained.

Knowledge graph: a knowledge graph is a graph-based data structure consisting of nodes (points) and edges. Each node represents an entity and each edge represents a relation between entities; a knowledge graph is essentially a semantic network. An entity refers to something in the real world, such as a person, place name, company, phone, or animal; relations express some kind of link between different entities.

Clustering algorithm: cluster analysis, also known as clustering analysis, is a statistical analysis method for studying classification problems (of samples or indicators) and is also an important algorithm in data mining. A cluster is composed of several patterns, each of which is typically a vector of measurements or a point in a multidimensional space. Cluster analysis is based on similarity: patterns in the same cluster are more similar to one another than to patterns in other clusters.

Community: a community is a subgraph comprising nodes and edges; nodes within the same community are closely connected to one another, while connections between communities are sparse.

Community discovery algorithm (Community Detection): a method for computing clusters of closely structured nodes in a network.

In the present application, a method and an apparatus for screening resumes based on knowledge graph, a computing device and a computer readable storage medium are provided, which are described in detail in the following embodiments one by one.

FIG. 1 shows a block diagram of a computing device 100 according to an embodiment of the present application. The components of the computing device 100 include, but are not limited to, memory 110 and processor 120. The processor 120 is coupled to the memory 110 via a bus 130 and a database 150 is used to store data.

Computing device 100 also includes access device 140, access device 140 enabling computing device 100 to communicate via one or more networks 160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. Access device 140 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)) whether wired or wireless, such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present application, the above-mentioned components of the computing device 100 and other components not shown in fig. 1 may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 1 is for purposes of example only and is not limiting as to the scope of the present application. Those skilled in the art may add or replace other components as desired.

Computing device 100 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.

The processor 120 may perform the steps of the knowledge-graph-based screening method shown in FIG. 2. FIG. 2 shows a flowchart of a knowledge-graph-based screening method provided by an embodiment of the present application, which includes steps 202 to 208.

Step 202: target object information is acquired.

The target objects may be objects participating in the same business, such as clients participating in the same insurance business, schools participating in the same selection, resumes delivered to the same post, and so on.

The target object information is acquired from the target object text. Acquiring the target object information means selecting, from the object texts participating in the same service, an object text that meets a preset requirement, and acquiring the target object information from that object text.

The preset requirement can be set according to the actual situation. Take as an example a client A and a client B participating in an insurance business, where the preset requirement is that the client is more than 50 years old. An object text meeting the preset requirement is selected from the client data of client A and the client data of client B. If the age in the client data of client A is 52, which meets the preset requirement, the client data of client A is obtained as the target object information.
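The selection by preset requirement described above can be sketched as follows. This is only an illustration under assumptions: the `ClientRecord` type, the `select_targets` helper, and the age given to client B are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClientRecord:
    """Hypothetical container for one client's data (the object text)."""
    name: str
    age: int
    text: str


def select_targets(records: List[ClientRecord], min_age: int = 50) -> List[ClientRecord]:
    """Keep the records whose age is greater than the preset requirement."""
    return [record for record in records if record.age > min_age]


records = [
    ClientRecord("Client A", 52, "client data of client A"),
    ClientRecord("Client B", 45, "client data of client B"),  # age 45 is an assumed value
]
targets = select_targets(records)
print([record.name for record in targets])
```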

In practical application, the step of obtaining the target object information is as follows:

acquiring a target object text;

determining at least one region to be identified in the target object text;

identifying and extracting object information in each area to be identified;

and splicing each object information to generate target object information.

Specifically, the target object text may be text containing target object information, such as a personal information text, a resume, a sales list, or school profile information. Because the target object information is obtained from the target object text, the target object text is determined before the target object information is obtained; it can be determined by methods such as manual selection, which are not limited herein. A region in the target object text is determined as a region to be identified; the region to be identified may be a partial region or the whole region of the target object text, and there may be one or more such regions. The object information in each region to be identified is identified and extracted, and the extracted pieces of object information are spliced to generate the target object information.

In a specific embodiment of the present application, the step of obtaining the target object information is further explained by taking the object information of an object A as an example. In this embodiment, a target object text D of the target object A is acquired. All regions in the text D are taken as regions to be identified, the characters in these regions are identified and extracted through OCR character recognition, and all the characters are spliced to generate the object information of the target object A, namely the target object information.
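The identify, extract, and splice steps can be sketched as below. This is a minimal sketch, not the application's implementation: regions are modelled as character offsets into an already digitized text, and the `recognize` callable is a hypothetical stand-in for an OCR engine (here it only strips whitespace).

```python
from typing import Callable, List, Tuple

# A region is modelled as (start, end) character offsets; a real system would
# use page coordinates and an OCR engine instead (assumption for illustration).
Region = Tuple[int, int]


def extract_object_info(text: str, regions: List[Region],
                        recognize: Callable[[str], str] = str.strip) -> str:
    """Recognize the content of each region and splice the pieces together."""
    pieces = []
    for start, end in regions:
        piece = recognize(text[start:end])
        if piece:  # skip regions in which nothing was recognized
            pieces.append(piece)
    return " ".join(pieces)


document = "Name: Zhang San\nPost: algorithm engineer\n"
split = document.index("Post")
regions = [(0, split), (split, len(document))]
target_info = extract_object_info(document, regions)
print(target_info)
```

Passing a single region `(0, len(document))` corresponds to the embodiment in which the whole text is taken as the region to be identified.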

Step 204: adding the target object information into a knowledge graph constructed by a candidate object set, and determining a target community corresponding to the target object information, wherein the knowledge graph comprises a plurality of communities.

The candidate object set consists of candidate objects, which may be objects participating in the same business; for example, the candidate object set is a set of 5 objects that have purchased furniture A. The information of the objects in the candidate object set is extracted as the data for constructing the knowledge graph, and the knowledge graph is constructed.

After the target object information is obtained, a knowledge graph containing the target object is constructed. The knowledge graph comprises a plurality of communities, and the community corresponding to the target object information is determined to be the target community.

In practical application, the target object information is added to a knowledge graph constructed by a candidate object set, and the specific step of determining the target community corresponding to the target object information is as follows:

storing the target object information into a graph database, and updating a knowledge graph corresponding to the target object information, wherein the knowledge graph is constructed by a candidate object set, and the knowledge graph is clustered according to the candidate object information to generate a plurality of communities;

and determining a target community corresponding to the target object information in the plurality of communities based on the candidate object clustering result.

Specifically, the knowledge graph is established according to data in a graph database: the object information of the candidate objects in the candidate object set is acquired and imported into the graph database for constructing the knowledge graph. The acquired target object information is then added to the graph database to construct a knowledge graph containing the target object. The knowledge graph is divided into a plurality of communities by candidate object clustering, that is, by clustering the knowledge graph based on a community discovery algorithm; for example, the knowledge graph is clustered according to content of actual interest in the object information, such as companies and schools.
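One way to picture the construction step is an in-memory adjacency structure in which each object node is linked to attribute nodes such as company and school. The sketch below is an assumption for illustration only: the `KnowledgeGraph` class, the attribute values, and the use of a Python dictionary in place of a graph database are all hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy undirected graph standing in for a graph database (assumption)."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def add_object(self, name, info):
        """Add an object node and link it to one node per attribute value."""
        node = ("object", name)
        self.adj[node]  # create the node even if it has no attributes
        for relation, value in info.items():
            self.add_edge(node, (relation, value))
        return node


def related_objects(graph, node):
    """Names of other objects that share at least one attribute node."""
    found = set()
    for attribute in graph.adj[node]:
        for other in graph.adj[attribute]:
            if other != node and other[0] == "object":
                found.add(other[1])
    return found


# Candidate object information (company and school values are made up).
candidates = {
    "Li Yi": {"company": "Company X", "school": "School P"},
    "Wang Wu": {"company": "Company X", "school": "School Q"},
    "Zhao Si": {"company": "Company Y", "school": "School R"},
}
graph = KnowledgeGraph()
for name, info in candidates.items():
    graph.add_object(name, info)

# Adding the target object updates the same graph.
target = graph.add_object("Zhang San", {"company": "Company X", "school": "School P"})
print(sorted(related_objects(graph, target)))
```

Objects that share many attribute nodes end up densely connected, which is the structure a community discovery algorithm then groups into communities.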

In a community discovery algorithm, the number of communities cannot be determined in advance, so a measure is needed that can evaluate, during the calculation, whether each result is relatively optimal. This measure is modularity, which is used to evaluate whether a division into communities is relatively good: in a good division, the similarity among nodes inside a community is high and the similarity with nodes outside the community is low. Modularity is defined as the ratio of the total number of edges within communities to the total number of edges in the network, minus an expected value, where the expected value is the ratio of the total number of edges within communities to the total number of edges in the network under the same community assignment when the network is set as a random network. The modularity is calculated as shown in Equation 1 below:

$$Q = \frac{1}{2m}\sum_{v,w}\left[A_{vw} - \frac{k_v k_w}{2m}\right]\delta(c_v, c_w) \qquad (1)$$

Here $Q$ represents the modularity; the larger the modularity, the better the community division. The range of $Q$ is $[-0.5, 1)$, and the clustering effect in the knowledge graph is good when $Q$ is between 0.3 and 0.7. Suppose there are $x$ nodes in the knowledge graph, each node representing an input, and these inputs are divided into $N$ communities with $m$ connections between nodes; $v$ and $w$ are any two of the $x$ nodes. $A_{vw}$ indicates whether nodes $v$ and $w$ are directly connected: $A_{vw} = 1$ if they are directly connected, and $A_{vw} = 0$ otherwise. $k_v$ represents the degree of node $v$ and $k_w$ the degree of node $w$; degree is a basic concept of graph theory, and the degree of a node is the number of edges incident to it. $\delta(c_v, c_w)$ is used to judge whether $v$ and $w$ are in the same community: $\delta(c_v, c_w) = 1$ if they are, and $\delta(c_v, c_w) = 0$ otherwise.
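The modularity measure can be checked numerically with a short sketch. It assumes the standard form Q = (1/2m) Σ_{v,w} [A_vw − k_v·k_w/(2m)] · δ(c_v, c_w), which matches the symbols defined above, and it assumes a simple undirected graph without self-loops or duplicate edges. The `modularity` function name and the example graph are illustrative and not taken from the application.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of a simple undirected graph.

    edges: list of (v, w) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = defaultdict(int)
    adjacent = set()
    for v, w in edges:
        degree[v] += 1
        degree[w] += 1
        adjacent.add((v, w))
        adjacent.add((w, v))
    total = 0.0
    nodes = list(community_of)
    for v in nodes:
        for w in nodes:
            if community_of[v] != community_of[w]:
                continue  # delta(c_v, c_w) = 0
            a_vw = 1 if (v, w) in adjacent else 0
            total += a_vw - degree[v] * degree[w] / (2 * m)
    return total / (2 * m)


# Two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_communities = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_community = {node: "a" for node in range(6)}

q_two = modularity(edges, two_communities)
q_one = modularity(edges, one_community)
print(round(q_two, 4), round(q_one, 4))
```

Splitting the graph into its two triangles gives Q = 5/14 (about 0.357), inside the 0.3 to 0.7 range mentioned above, while placing every node in one community gives Q = 0.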

The target community corresponding to the target object information is determined among the plurality of communities obtained by clustering. For example, the target object information "Zhang San" is added to the knowledge graph, and the knowledge graph includes a community 1, a community 2, and a community 3; since the target object "Zhang San" is in community 2, the target community corresponding to the target object information is community 2.

Step 206: determining an initial candidate object set in the candidate object set according to the target community.

The target community corresponding to the target object information is determined among the plurality of communities, and the initial candidate object set is composed of at least one object in the target community.

In practical application, each community corresponds to at least one object; the step of determining an initial candidate object set in the candidate object set according to the target community comprises the following steps:

determining the objects in the target community as initial candidate objects, and generating an initial candidate object set.

Specifically, the target community includes at least one object node, each object node corresponds to one object, and the objects in the target community are used as initial candidate objects to generate an initial candidate object set.

In a specific embodiment of the present application, refer to FIG. 3, which shows a schematic diagram of a target community corresponding to target object information provided in an embodiment of the present application. As shown in FIG. 3, community Q is the target community, the target community includes 4 nodes, and Zhang San is the target object. In addition to the object Zhang San, community Q includes the objects Wang Wu, Li Yi, and Zhao Si; these three objects are taken as initial candidate objects to generate the initial candidate object set.

Step 208: screening a target candidate object set in the initial candidate object set according to the target object information.

A target candidate object is an object selected from the initial candidate object set according to the target object information, and the target candidate object set is generated from at least one target candidate object.
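The lookup of the target community and the generation of the initial candidate object set can be sketched as follows. The dictionary layout and the `initial_candidates` helper are assumptions for illustration; the member names follow the FIG. 3 example, and the contents of "community 1" are made up.

```python
def initial_candidates(communities, target):
    """Return the target's community id and the other members of that community."""
    for community_id, members in communities.items():
        if target in members:
            return community_id, [member for member in members if member != target]
    raise KeyError(f"{target!r} is not in any community")


communities = {
    "community 1": ["Object 1", "Object 2"],  # hypothetical members
    "community Q": ["Zhang San", "Wang Wu", "Li Yi", "Zhao Si"],
}
target_community, initial_set = initial_candidates(communities, "Zhang San")
print(target_community, initial_set)
```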

In practical application, the step of screening the target candidate object set in the initial candidate object set according to the target object is as follows:

calculating a similarity of the target object to each initial candidate object in the initial candidate object set;

and determining a target candidate object set from the initial candidate object set according to the similarity.

Specifically, the initial candidate object set includes one or more initial candidate objects. The similarity between the target object and each initial candidate object is calculated, and the objects in the initial candidate object set that meet the similarity requirement are determined as target candidate objects according to the similarity, so as to form the target candidate object set.

In a specific embodiment of the present application, the target object is A, and the initial candidate object set includes an object B and an object C. The similarity of the target object A to the object B and to the object C is calculated respectively; if it is determined according to the similarity that the object B meets the similarity requirement, the object B is taken as a target candidate object to form the target candidate object set.

In practical applications, the step of calculating the similarity between the target object and each initial candidate object in the initial candidate object set includes:

selecting a target initial candidate object from the initial candidate object set;

acquiring target object information of the target object and target initial candidate object information of the target initial candidate object;

embedding the target object information and the target initial candidate object information respectively to obtain a target object vector and a target initial candidate object vector;

and calculating the similarity of the target object vector and the target initial candidate object vector.

Specifically, any one initial candidate object in the initial candidate object set is determined as the target initial candidate object. The target object information and the target initial candidate object information are acquired and embedded, so that the target object information is converted into a target object vector and the target initial candidate object information is converted into a target initial candidate object vector. The distance between the target object vector and the target initial candidate object vector is then calculated; specific calculation methods include, but are not limited to, Euclidean distance, Manhattan distance, Minkowski distance, cosine similarity, and the Pearson correlation coefficient. The resulting value is taken as the similarity between the target object and the target initial candidate object.

In a specific embodiment of the present application, the initial candidate object set includes an object E, an object F, and an object G, and the object E is selected as the target initial candidate object. The object information of the object E and the object information of the target object A are acquired. The object information of the object E is converted into a vector E, and the object information of the object A is converted into a vector A. The similarity of the vector A and the vector E is calculated, that is, the distance between the vector A and the vector E is calculated with an algorithm such as Euclidean distance to obtain the similarity of the object A and the object E.
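The embed-then-compare steps can be sketched with two of the listed measures. The application does not specify the embedding, so the bag-of-words `embed` function below is a deliberately simple stand-in, and the two information strings are made-up examples.

```python
import math
from collections import Counter


def embed(text, vocabulary):
    """Stand-in embedding: word counts over a fixed vocabulary (assumption)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity with an all-zero vector is treated as 0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


info_a = "algorithm engineer python"  # object information of target object A
info_e = "algorithm engineer java"    # object information of candidate object E
vocabulary = sorted(set(info_a.split()) | set(info_e.split()))

vector_a = embed(info_a, vocabulary)
vector_e = embed(info_e, vocabulary)
cosine = cosine_similarity(vector_a, vector_e)
distance = euclidean_distance(vector_a, vector_e)
print(round(cosine, 4), round(distance, 4))
```

Note that the measures point in opposite directions: a larger cosine similarity means more similar objects, whereas a smaller Euclidean distance does, so a ranking or threshold has to be applied accordingly.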

Specifically, the method for determining the target candidate object set according to the similarity includes, but is not limited to:

sorting the initial candidate objects in the initial candidate object set according to the similarity, and selecting a preset number of initial candidate objects to generate a target candidate object set; or

selecting initial candidate objects with a similarity larger than a preset threshold from the initial candidate object set to generate a target candidate object set.

In a specific embodiment of the present application, the number of objects is preset according to actual requirements. The initial candidate objects are sorted in descending order of their similarity to the target object, and the preset number of objects are selected as target candidate objects to form the target candidate object set.

In another embodiment of the present application, a similarity threshold is preset. The similarity between the target object and each initial candidate object is acquired, and the objects whose similarity is larger than the preset threshold are taken as target candidate objects to generate the target candidate object set.
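The two selection strategies can be sketched as below. The function names and the similarity values are hypothetical, and the scores are assumed to be "larger is more similar" (as with cosine similarity).

```python
def select_top_k(similarities, k):
    """Sort candidates by similarity in descending order and keep the first k."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def select_above_threshold(similarities, threshold):
    """Keep candidates whose similarity is larger than the preset threshold."""
    return [name for name, score in similarities.items() if score > threshold]


# Made-up similarities between target object A and each initial candidate.
similarities = {"Object E": 0.91, "Object F": 0.42, "Object G": 0.77}

top_two = select_top_k(similarities, 2)
above_half = select_above_threshold(similarities, 0.5)
print(top_two, above_half)
```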

In practical applications, the step of calculating the similarity between the target object and each initial candidate object in the initial candidate object set may further include:

inputting target object information of the target object and target initial candidate object information of the target initial candidate object into an object similarity model;

and receiving the similarity of the target object and the target initial candidate object output by the object similarity model.

Specifically, any one initial candidate object in the initial candidate object set is determined as the target initial candidate object, and the target initial candidate object and the target object are input into the object similarity model. The object similarity model calculates the similarity between the two input objects and outputs the calculated similarity value. The similarity of the target initial candidate object and the target object output by the object similarity model is then received.

In a specific embodiment of the present application, the object E is selected from the initial candidate object set as the target initial candidate object. The object E and the target object A are input into the object similarity model, which calculates the similarity between the two objects and outputs it. The similarity of the object A and the object E output by the object similarity model is received as the similarity between the two objects.

The object similarity model is generated by the following steps:

acquiring sample data and a sample label, wherein the sample data comprises an object information pair, and the sample label comprises the object similarity of the object information pair;

inputting the object information pair into the object similarity model;

receiving the predicted similarity output by the object similarity model;

calculating a loss value according to the prediction similarity and the object similarity;

and performing iterative training on the object similarity model based on the loss value until a training stopping condition is reached.

Specifically, two similar object texts are obtained to form an object information pair (for example, the position in both pieces of object information is algorithm engineer), and the similarity of the two objects is labeled. The sample data includes the information of the two objects in the object information pair, and the sample label includes the object similarity of the two objects in the pair. The object similarity model is preferably a BERT model. In the process of training the object similarity model, the structural information in the object texts is extracted and input into the object similarity model for processing, and the predicted similarity output by the object similarity model is received. A loss value is calculated from the predicted similarity and the object similarity, the parameters of the object similarity model are adjusted according to the loss value, and training continues until the similarity output by the model meets the requirement, at which point training stops and the trained object similarity model is obtained.

Calculating the similarity of object information with the object similarity model makes it possible to effectively mine the implicit information contained in the object texts, so that related objects can be recommended more accurately and comprehensively. The trained similarity model only judges the similarity of texts, is not limited to a particular field, and can therefore be applied to similarity calculation in many fields.
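The training procedure (input pairs, receive predictions, compute a loss, adjust parameters, stop on a condition) can be illustrated with a deliberately tiny model. This is not the BERT model the application prefers: the sketch swaps in a single logistic unit over one hand-made word-overlap feature purely to show the loop, and every name, sample pair, and label below is an assumption.

```python
import math


def jaccard(a, b):
    """Word-overlap feature between two texts (assumed stand-in feature)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


class TinySimilarityModel:
    """One logistic unit; a placeholder for a real text model such as BERT."""

    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def predict(self, a, b):
        z = self.w * jaccard(a, b) + self.b
        return 1.0 / (1.0 + math.exp(-z))


def mean_squared_loss(model, pairs, labels):
    errors = [(model.predict(a, b) - y) ** 2 for (a, b), y in zip(pairs, labels)]
    return sum(errors) / len(errors)


def train(model, pairs, labels, lr=0.5, max_epochs=5000, target_loss=0.02):
    """Gradient descent until the loss is small enough or epochs run out."""
    for _ in range(max_epochs):
        if mean_squared_loss(model, pairs, labels) < target_loss:
            break  # training stop condition reached
        grad_w = grad_b = 0.0
        for (a, b), y in zip(pairs, labels):
            x = jaccard(a, b)
            p = model.predict(a, b)
            common = 2.0 * (p - y) * p * (1.0 - p)  # d(loss)/dz for this sample
            grad_w += common * x
            grad_b += common
        model.w -= lr * grad_w / len(pairs)
        model.b -= lr * grad_b / len(pairs)
    return mean_squared_loss(model, pairs, labels)


# Made-up object information pairs with labelled similarities.
pairs = [
    ("algorithm engineer python", "algorithm engineer java"),
    ("algorithm engineer", "algorithm engineer"),
    ("sales manager", "algorithm engineer"),
    ("sales manager retail", "sales manager"),
]
labels = [0.8, 1.0, 0.0, 0.7]

model = TinySimilarityModel()
initial_loss = mean_squared_loss(model, pairs, labels)
final_loss = train(model, pairs, labels)
print(final_loss < initial_loss)
```

With a real pretrained text model the feature extraction, loss function, and optimizer would differ, but the control flow of the loop stays the same.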

The knowledge graph-based screening method provided by the present application comprises: obtaining target object information; adding the target object information into a knowledge graph constructed by a candidate object set, and determining a target community corresponding to the target object information, wherein the knowledge graph comprises a plurality of communities; determining an initial candidate object set in the candidate object set according to the target community; and screening a target candidate object set in the initial candidate object set according to the target object information. The method determines objects similar to the target object as candidate objects in a knowledge graph that has been divided into communities, generates an initial candidate object set, and then screens out, through similarity calculation, a target candidate object set that better meets actual requirements, which improves the efficiency of determining candidate objects through the knowledge graph. Because the similarity of object texts is calculated by combining communities with a model, the method does not depend on the size or distribution of the data and transfers well to other fields. It can effectively mine the hidden information contained in the object texts and can assign more accurate labels to objects through similarity calculation, so that screening is completed more accurately.

The method of the present application is further explained below by taking the application of the method of the present application to screening resumes as an example:

As enterprises grow, more and more job seekers submit resumes to them. After receiving the resumes, an enterprise generates corresponding labels for the applicants according to the resume contents, builds its own talent database, and searches the talent database for suitable talent when a new recruitment requirement arises.

In existing methods of recommending resumes from a talent database, a recruiter sometimes finds an excellent resume and wants the talent database to recommend resumes similar to it for comparison. In that case, the labels corresponding to the excellent resume are matched in the talent database, and the resumes with the same labels are screened out.

The knowledge graph-based screening method overcomes the defects of this label matching approach and screens out similar resumes with a high utilization rate.

Fig. 4 shows a flowchart of a method for knowledge-graph-based screening, which is described by taking screening of resumes similar to the target resume as an example, and includes steps 402 to 418.

Step 402: Acquire the target resume.

The target resume is determined based on the selection of the recruiter. The recruiter selects from a plurality of resumes, determines a resume that meets the expected recruitment requirement as the target resume, and then searches the talent database for resumes similar to the target resume.

In a specific embodiment provided by the present application, the recruiter receives a resume A and wishes to find two resumes similar to resume A in the company's talent database; resume A is the target resume. The server receives the resume information of target resume A.

Step 404: Determine at least one area to be identified in the target resume, and identify and extract the resume information in each area to be identified.

Specifically, after the target resume is determined, an area in the resume text is determined as an area to be identified, where the area to be identified includes but is not limited to a partial area of the resume text or the entire resume. After the area to be identified is determined, the text content in the area to be identified is identified.

In a specific embodiment provided by the present application, target resume A is in PDF format and is divided into several different areas to be identified, including a basic information area, a work skill area, a work experience area, and the like. OCR is performed on target resume A, each area to be identified is recognized, and the text information in each area is extracted to form the resume information of target resume A. Related information is then extracted from target resume A by an information extraction technique to form structured information, and the resume information identified from the areas to be identified is obtained, including: "name": "Zhang San"; "sex": "male"; "contact phone": "130 ×"; "work skill": "communication"; "work experience": "internship at a first company"; and the like.

Step 406: Splice the pieces of resume information to generate the target resume information.

Specifically, the resume information identified from the areas to be identified is obtained and spliced to generate the target resume information.

In a specific embodiment of the present application, the resume information identified from the areas to be identified is obtained and spliced to generate the target resume information: "name: Zhang San; sex: male; contact phone: 130 ×; ……".
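The splicing in step 406 can be illustrated with a minimal Python sketch. The function name `splice_resume_info` and the `key: value; ` layout are assumptions made for illustration, modeled on the example string above; the application does not prescribe a particular separator or field order.

```python
def splice_resume_info(fields):
    """Join (field name, field value) pairs into one text string.

    The "name: value; name: value" layout is an assumed format that
    mirrors the example in the description.
    """
    return "; ".join(f"{name}: {value}" for name, value in fields)


# Structured information extracted from the areas to be identified
# (illustrative values taken from the embodiment).
extracted = [
    ("name", "Zhang San"),
    ("sex", "male"),
    ("work skill", "communication"),
]

target_resume_info = splice_resume_info(extracted)
print(target_resume_info)
```

Keeping the fields as an ordered list of pairs rather than a plain string until the last moment makes it easy to reuse the same structured data when the knowledge graph is built.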

Step 408: Store the target resume information in the talent database to construct the talent knowledge graph, and cluster the talent knowledge graph according to the candidate resumes to generate a plurality of communities.

The talent knowledge graph is created from a candidate resume set. The resume information in the candidate resume set is identified to obtain the structured resume information corresponding to each candidate resume; the talent knowledge graph is constructed with the schools, units, positions, skills and the like in the structured resume information as entities; and the resume information is stored in a graph database. The talent knowledge graph is divided into a plurality of communities according to the candidate resume clustering, and each community contains at least one candidate.
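As a rough illustration of how structured resume information could become a graph, the sketch below builds an in-memory undirected graph in which each candidate is linked to the school, unit, position and skill entities found in that candidate's resume. The dictionary layout, the function name `build_talent_graph`, and the use of a plain adjacency map instead of a graph database are all assumptions for illustration only.

```python
from collections import defaultdict

ENTITY_TYPES = ("school", "unit", "position", "skill")


def build_talent_graph(resumes):
    """Return an undirected adjacency map: node -> set of neighbouring nodes.

    Nodes are (type, value) tuples, e.g. ("candidate", "Zhang San") or
    ("skill", "communication"). `resumes` maps a candidate name to a dict
    of entity type -> list of values (an assumed input layout).
    """
    graph = defaultdict(set)
    for candidate, info in resumes.items():
        candidate_node = ("candidate", candidate)
        graph[candidate_node]  # register the candidate even if it has no entities
        for entity_type in ENTITY_TYPES:
            for value in info.get(entity_type, []):
                entity_node = (entity_type, value)
                graph[candidate_node].add(entity_node)
                graph[entity_node].add(candidate_node)
    return dict(graph)


# Hypothetical structured resume information for two candidates.
resumes = {
    "Zhang San": {"school": ["School X"], "skill": ["communication"]},
    "Wang Wu": {"position": ["algorithm engineer"], "skill": ["communication"]},
}
talent_graph = build_talent_graph(resumes)
print(sorted(talent_graph[("skill", "communication")]))
```

Candidates that share an entity node, such as the same skill, end up two hops apart, which is the kind of structure a community discovery algorithm can exploit.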

Specifically, the talent knowledge graph is constructed from the data in the talent database, where that data is the candidate resume information extracted from the candidate resumes in the candidate resume set. After the target resume information is obtained, the structured target resume information is also imported into the talent database, and the talent knowledge graph is constructed from the talent database into which it has been imported. In the process of constructing the talent knowledge graph, a plurality of communities are generated according to the candidate resume clustering, and each community contains at least one candidate identifier. The candidate resume clustering clusters the talent knowledge graph using a community discovery algorithm.

After the talent knowledge graph is obtained, the candidate identifiers in the talent knowledge graph are clustered according to a community discovery algorithm; more specifically, they are clustered according to information such as the schools, units and skills corresponding to each candidate identifier. A community discovery (community detection) algorithm is used to discover the community structure in a network. Community structure is a common characteristic of knowledge graph networks, and the whole network is composed of a plurality of communities.

After the target resume information is obtained, it is added to the talent knowledge graph, and the community to which the target resume information belongs in the talent knowledge graph is calculated by the modularity calculation method. That community is taken as the target community, and once the target community is determined, the target talent identifier corresponding to the target resume is added to the target community.
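The description names a modularity calculation but does not spell out the procedure. One possible reading, sketched below under stated assumptions, is to place the target node tentatively in each existing community and keep the placement that gives the highest Newman modularity. The sketch assumes an unweighted, undirected graph whose nodes are candidates (for example, linked when they share an entity); the function names and the toy graph are hypothetical.

```python
def build_adjacency(edges):
    """Build a symmetric adjacency map from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def modularity(adj, community_of):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / 2m)^2 ].

    L_c is the number of edges inside community c, d_c the total degree
    of its nodes, and m the number of edges in the graph.
    """
    two_m = sum(len(neighbours) for neighbours in adj.values())
    if two_m == 0:
        return 0.0
    internal = {}  # community -> 2 * L_c (each internal edge seen from both ends)
    degree = {}    # community -> d_c
    for node, neighbours in adj.items():
        c = community_of[node]
        degree[c] = degree.get(c, 0) + len(neighbours)
        internal[c] = internal.get(c, 0) + sum(
            1 for n in neighbours if community_of[n] == c
        )
    return sum(internal[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


def assign_target_community(adj, community_of, target):
    """Try the target in every existing community and keep the best Q."""
    best_community, best_q = None, float("-inf")
    for c in sorted(set(community_of.values())):
        trial = dict(community_of)
        trial[target] = c
        q = modularity(adj, trial)
        if q > best_q:
            best_community, best_q = c, q
    return best_community


# Toy graph: two triangles joined by one edge, plus the target node "t".
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d"), ("t", "d"), ("t", "e")]
adj = build_adjacency(edges)
existing = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
print(assign_target_community(adj, existing, "t"))
```

Re-evaluating the full modularity for each trial is quadratic in the worst case; a production system would more likely compute only the modularity gain, as Louvain-style algorithms do, but the exhaustive version is easier to check by hand.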

In a specific embodiment of the present application, referring to Fig. 5, which shows a schematic diagram of a talent knowledge graph provided in an embodiment of the present application, the talent knowledge graph comprises 4 communities after being divided according to modularity, namely community 1, community 2, community 3 and community 4. In this embodiment, the target resume is resume A, which is the resume of candidate Zhang San, and the target resume information corresponding to resume A is added to the talent knowledge graph to obtain a talent knowledge graph containing the target resume information.

Step 410: Determine the target community corresponding to the target resume information among the plurality of communities based on the candidate resume clustering.

The target resume information is added to the talent knowledge graph, and the candidate resume clustering divides the talent knowledge graph into a plurality of communities. The target resume information is identified and compared with the resumes in each community of the talent knowledge graph, and the target community corresponding to the target resume information is determined according to how well the target resume information matches the resume information in each community.

In a specific embodiment of the present application, following the above example, the modularity calculation determines that the target resume information belongs to community 4, so community 4 is determined to be the target community.

Step 412: Determine the candidates in the target community as initial candidates, acquire the resume of each initial candidate, and form an initial candidate resume set.

The target community of the talent knowledge graph contains a plurality of candidate identifiers, and each candidate identifier corresponds to a respective candidate resume.

The candidate identifiers in the talent knowledge graph are clustered based on modularity to obtain a plurality of communities, and the talent identifiers in the same community have similar resume information, such as similar skills, similar positions, similar schools and the like.

Therefore, after the target community is determined, the talent identifiers corresponding to the target community are determined as the initial candidate identifiers, and the resumes corresponding to the initial candidate identifiers form the initial candidate resume set.

In a specific embodiment of the present application, following the above example and referring to Fig. 6, which shows a schematic diagram of community 4 according to an embodiment of the present application, the step of determining the initial candidate resume set is further explained. Community 4 contains a plurality of candidates. Because candidate Zhang San is in community 4, all the candidates in community 4, such as Wang Wu, Li Yi, Zhao and so on, are taken as initial candidates. The resume of each initial candidate is acquired to form the initial candidate resume set.
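Step 412 amounts to a lookup once each candidate has a community identifier. The following sketch assumes a mapping from candidate name to community number and a mapping from candidate name to resume text; both mappings, the function name, and the choice to exclude the target candidate from the result are assumptions for illustration.

```python
def initial_candidate_resumes(community_of, resumes, target):
    """Collect the resumes of all candidates that share the target's community.

    The target itself is left out, on the assumption that a resume should
    not be recommended as similar to itself.
    """
    target_community = community_of[target]
    return {
        name: resumes[name]
        for name, community in community_of.items()
        if community == target_community and name != target
    }


# Hypothetical community assignment and resume texts.
community_of = {"Zhang San": 4, "Wang Wu": 4, "Li Yi": 4, "Zhao": 4, "Other": 2}
resumes = {name: f"resume text of {name}" for name in community_of}

initial_set = initial_candidate_resumes(community_of, resumes, "Zhang San")
print(sorted(initial_set))
```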

Step 414: Select a target initial candidate resume from the initial candidate resume set, and input the target resume and the target initial candidate resume into a resume similarity model.

Specifically, an initial candidate resume is arbitrarily selected from the initial candidate resume set as the target initial candidate resume. The target resume and the target initial candidate resume are input into the resume similarity model.

In one embodiment of the present application, the target resume is resume A, and the initial candidate resume set includes resume G, resume K, resume M, and so on. Resume K is selected from the initial candidate resume set as the target initial candidate resume and is input into the resume similarity model together with target resume A.

The method for training the resume similarity model in this embodiment is as follows:

Sample data and sample labels are acquired, wherein the sample data is a resume pair consisting of two resumes, and the sample label is the similarity of the two resumes in the resume pair. The resume pair is input into an untrained resume similarity model; the predicted similarity output by the resume similarity model is received; a loss value is calculated from the predicted similarity and the labeled resume similarity; and the resume similarity model is iteratively trained based on the loss value until a training stop condition is reached, yielding the trained resume similarity model.
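The training procedure above (input a pair, receive a predicted similarity, compute a loss, update, stop on a condition) can be sketched with a deliberately tiny stand-in model. The application prefers a BERT model; the sketch below instead uses a two-parameter logistic model over a word-overlap feature, purely so that the loop is self-contained and runnable. The feature, the mean squared error loss, the learning rate and the stop condition are all assumptions, not the method of the application.

```python
import math


def jaccard(text_a, text_b):
    """Word-overlap feature in [0, 1]; a stand-in for a learned text encoder."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict(params, pair):
    """Predicted similarity of a resume pair under the toy model."""
    w, b = params
    return 1.0 / (1.0 + math.exp(-(w * jaccard(*pair) + b)))


def train(pairs, labels, lr=0.5, max_epochs=5000, tol=1e-3):
    """Gradient descent on mean squared error between predicted and labeled similarity."""
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(max_epochs):
        loss = grad_w = grad_b = 0.0
        for pair, label in zip(pairs, labels):
            p = predict((w, b), pair)
            err = p - label
            loss += err * err
            g = 2.0 * err * p * (1.0 - p)  # derivative of err^2 w.r.t. the logit
            grad_w += g * jaccard(*pair)
            grad_b += g
        if loss / n < tol:                 # training stop condition
            break
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)


# Hypothetical labeled resume pairs.
similar = ("python machine learning engineer", "machine learning engineer python")
different = ("python machine learning engineer", "retail sales manager")
params = train([similar, different], [0.9, 0.1])
print(predict(params, similar) > predict(params, different))
```

Swapping the toy model for a pretrained text encoder changes `predict` and the parameter update, but the surrounding loop keeps the same shape.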

Step 416: Receive the similarity between the target resume and the target initial candidate resume output by the resume similarity model.

In the above example, resume A and resume K are input into the resume similarity model, and the similarity between resume A and resume K output by the resume similarity model is received.

Resume A and each of the other resumes in the initial candidate resume set, such as resume M, are likewise input into the resume similarity model, and the similarity between resume A and each of those resumes is received.
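Steps 414 and 416 repeat the same pairwise call for every resume in the initial candidate resume set. The sketch below shows that loop with a pluggable similarity function. As a placeholder for the trained model it uses cosine similarity over word counts, which follows the embedding-then-vector-similarity variant recited in the claims only in spirit; the function names and sample texts are assumptions.

```python
import math
from collections import Counter


def cosine_bow(text_a, text_b):
    """Cosine similarity of bag-of-words count vectors (placeholder model)."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(count * vb[token] for token, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def score_candidates(target_resume, candidate_resumes, similarity=cosine_bow):
    """Return candidate name -> similarity to the target resume."""
    return {
        name: similarity(target_resume, text)
        for name, text in candidate_resumes.items()
    }


# Hypothetical resume texts.
resume_a = "algorithm engineer python communication"
candidates = {
    "G": "sales manager communication",
    "K": "algorithm engineer java communication",
    "M": "algorithm engineer python communication teamwork",
}
scores = score_candidates(resume_a, candidates)
print(max(scores, key=scores.get))
```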

Step 418: Sort the initial candidate resumes in the initial candidate resume set according to the similarity, and select a preset number of initial candidate resumes to generate a target candidate resume set.

The resumes in the initial candidate resume set are sorted according to the similarity, and a preset number of candidate resumes are selected as target candidate resumes to generate the target candidate resume set.

In a specific embodiment of the present application, the resume similarity model is used to calculate the similarity between target resume A and each of the initial candidate resumes in the initial candidate resume set, such as resume G, resume K and resume M. The similarity between resume A and resume G is 78%, the similarity between resume A and resume K is 87%, and the similarity between resume A and resume M is 90%. The resumes in the initial candidate resume set are sorted in descending order of similarity, and the two resumes with the highest similarity are selected as target candidate resumes to form the target candidate resume set.
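The selection in step 418 can be sketched as follows, using the three similarity values from the embodiment. The second function shows the threshold-based alternative recited in the claims. The function names and the threshold value of 0.8 are assumptions chosen for illustration.

```python
def select_top_k(similarities, k):
    """Sort by similarity in descending order and keep the first k names."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def select_above_threshold(similarities, threshold):
    """Keep every name whose similarity is greater than the threshold."""
    return [name for name, score in similarities.items() if score > threshold]


# Similarities between resume A and resumes G, K and M from the embodiment.
similarities = {"G": 0.78, "K": 0.87, "M": 0.90}

print(select_top_k(similarities, 2))              # ['M', 'K']
print(select_above_threshold(similarities, 0.8))  # ['K', 'M']
```

A fixed count gives the recruiter a predictable number of resumes, while a threshold returns only those resumes that are similar enough, which may be none.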

In the knowledge graph-based screening method described above, the target resume is acquired; the resume information of the target resume is extracted to generate target resume information; the target resume information is added to a talent knowledge graph constructed from a candidate resume set, and a target community corresponding to the target resume information is determined, wherein the talent knowledge graph comprises a plurality of communities; an initial candidate resume set is determined in the candidate resume set according to the target community; and a target candidate resume set is screened from the initial candidate resume set according to the target resume. The method recommends resumes based on the clusters in the knowledge graph and combines this with model-based similarity calculation, which solves the problem that resumes screened out by existing screening methods have a low utilization rate. Calculating resume similarity with a model does not depend on the size or distribution of the data and transfers well to other data, and it effectively mines the implicit information contained in the resume text, so that screening is completed more accurately.

Corresponding to the above method embodiments, the present application further provides an embodiment of a knowledge graph-based screening apparatus, and Fig. 7 shows a schematic structural diagram of the knowledge graph-based screening apparatus according to an embodiment of the present application. As shown in Fig. 7, the apparatus includes:

an acquisition module 702 configured to acquire target object information;

an adding module 704, configured to add the target object information to a knowledge graph constructed by a candidate object set, and determine a target community corresponding to the target object information, where the knowledge graph includes a plurality of communities;

a determination module 706 configured to determine an initial set of candidate objects in the set of candidate objects according to the target community;

a screening module 708 configured to screen a target set of candidate objects from the initial set of candidate objects according to the target object information.

Optionally, the acquisition module 702 is configured to:

acquiring a target object text;

determining at least one region to be identified in the target object text;

identifying and extracting object information in each area to be identified;

and splicing each object information to generate target object information.

Optionally, the adding module 704 is further configured to:

Optionally, the determination module 706 is further configured to:

each community corresponds to at least one object;

determining an initial candidate object set in the candidate object set according to the target community, including:

Optionally, the screening module 708 is further configured to:

inputting the object information pair into the object similarity model;

receiving the predicted similarity output by the object similarity model;

Optionally, the screening module 708 is further configured to:

The knowledge graph-based screening apparatus comprises: an acquisition module configured to acquire target object information; an adding module configured to add the target object information to a knowledge graph constructed from a candidate object set and determine a target community corresponding to the target object information, wherein the knowledge graph comprises a plurality of communities; a determination module configured to determine an initial candidate object set in the candidate object set according to the target community; and a screening module configured to screen a target candidate object set from the initial candidate object set according to the target object information.

With the knowledge graph-based screening apparatus, objects similar to the target object are determined as candidate objects in a knowledge graph that has been divided into communities, an initial candidate object set is generated, and a target candidate object set that better matches actual requirements is screened out through similarity calculation, which improves the efficiency of determining candidate objects through the knowledge graph. Calculating the similarity of object texts by combining communities with a model does not depend on the size or distribution of the data and transfers well to other data. It effectively mines the implicit information contained in the object texts, and the similarity calculation allows objects to be labeled more accurately, so that screening is completed more accurately.

The above is an illustrative scheme of the knowledge graph-based screening apparatus of this embodiment. It should be noted that the technical solution of the knowledge graph-based screening apparatus belongs to the same concept as the technical solution of the knowledge graph-based screening method described above; for details not described in the technical solution of the apparatus, reference may be made to the description of the technical solution of the method.

It should be noted that the components in the device claims should be understood as functional blocks which are necessary to implement the steps of the program flow or the steps of the method, and each functional block is not actually defined by functional division or separation. The device claims defined by such a set of functional modules are to be understood as a functional module framework for implementing the solution mainly by means of a computer program as described in the specification, and not as a physical device for implementing the solution mainly by means of hardware.

There is also provided in an embodiment of the present application a computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the method for knowledge-graph based screening when executing the computer instructions.

The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device belongs to the same concept as the technical solution of the knowledge graph-based screening method described above; for details not described in the technical solution of the computing device, reference may be made to the description of the technical solution of the method.

An embodiment of the present application also provides a computer readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the knowledge-graph based screening method as described above.

The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium is the same as that of the above-mentioned screening method based on the knowledge graph, and for details that are not described in detail in the technical solution of the storage medium, reference may be made to the description of the technical solution of the above-mentioned screening method based on the knowledge graph.

The embodiments of the present application disclose a chip storing computer instructions which, when executed by a processor, implement the steps of the aforementioned knowledge-graph based screening method.

The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.

It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and its practical applications, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims

1. A knowledge graph-based screening method, characterized by comprising:

acquiring target object information;

adding the target object information to a knowledge graph constructed from a candidate object set, and determining a target community corresponding to the target object information, wherein the knowledge graph includes a plurality of communities;

determining an initial candidate object set in the candidate object set according to the target community;

screening a target candidate object set from the initial candidate object set according to the target object information.

2. The method according to claim 1, wherein the acquiring target object information comprises:

acquiring a target object text;

determining at least one area to be identified in the target object text;

identifying and extracting object information in each area to be identified;

splicing the pieces of object information to generate the target object information.

3. The method according to claim 1, wherein the adding the target object information to the knowledge graph constructed by the candidate object set, and determining the target community corresponding to the target object information, comprises:

storing the target object information in a graph database, and updating the knowledge graph corresponding to the target object information, wherein the knowledge graph is constructed from a candidate object set, and the knowledge graph is clustered according to the candidate object information to generate a plurality of communities;

determining a target community corresponding to the target object information from the plurality of communities based on the candidate object clustering result.

4. The method according to claim 1, wherein each of the communities corresponds to at least one object;

the determining an initial candidate object set in the candidate object set according to the target community comprises:

determining an object in the target community as an initial candidate object, and generating an initial candidate object set.

5. The method according to claim 1, wherein the screening the target candidate object set in the initial candidate object set according to the target object comprises:

calculating the similarity between the target object and each initial candidate object in the initial candidate object set;

determining a target candidate object set from the initial candidate object set according to the similarity.

6. The method according to claim 5, wherein the calculating the similarity between the target object and each initial candidate object in the initial candidate object set comprises:

obtaining target object information of the target object and target initial candidate object information of the target initial candidate object;

performing embedding processing on the target object information and the target initial candidate object information respectively, to obtain a target object vector and a target initial candidate object vector;

calculating the similarity between the target object vector and the target initial candidate object vector.

7. The method according to claim 5, wherein the calculating the similarity between the target object and each initial candidate object in the initial candidate object set comprises:

inputting the target object information of the target object and the target initial candidate object information of the target initial candidate object into the object similarity model;

receiving the similarity between the target object and the target initial candidate object output by the object similarity model.

8. The method according to claim 7, wherein the object similarity model is generated by the following steps:

obtaining sample data and sample labels, wherein the sample data includes object information pairs, and the sample labels include object similarity of the object information pairs;

inputting the object information pair into the object similarity model;

receiving the predicted similarity output by the object similarity model;

calculating a loss value according to the predicted similarity and the object similarity;

iteratively training the object similarity model based on the loss value until a training stop condition is reached.

9. The method according to claim 5, wherein the determining the target candidate object set from the initial candidate object set according to the similarity comprises:

sorting the initial candidate objects in the initial candidate object set according to the similarity, and selecting a preset number of initial candidate objects to generate a target candidate object set; or

selecting, from the initial candidate object set, an initial candidate object whose similarity is greater than a preset threshold to generate a target candidate object set.

10. A knowledge graph-based screening device, characterized by comprising:

an acquisition module configured to acquire target object information;

an adding module configured to add the target object information to a knowledge graph constructed from a candidate object set, and determine a target community corresponding to the target object information, wherein the knowledge graph includes a plurality of communities;

a determination module configured to determine an initial candidate object set in the candidate object set according to the target community;

a screening module configured to screen a target candidate object set from the initial candidate object set according to the target object information.

11. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor, when executing the computer instructions, implements the steps of the method according to any one of claims 1-9.

12. A computer-readable storage medium storing computer instructions, wherein when the computer instructions are executed by a processor, the steps of the method according to any one of claims 1-9 are implemented.