CN108805290A - Method and device for determining the category of an entity - Google Patents
- Publication number
- CN108805290A (CN201810691032.9A)
- Authority
- CN
- China
- Prior art keywords
- entity
- classification
- candidate data
- category
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application provides a method and device for determining the category of an entity in a knowledge graph. The scheme obtains a candidate data pair set, where the set includes at least one candidate data pair and each candidate data pair includes at least one entity and at least one category. Based on the attribute information of the entity, the attribute values of the entity and/or the attribute information of the category, the scheme judges whether the at least one entity included in a candidate data pair belongs to the at least one category included in that pair; if so, the at least one category is marked as a category of the at least one entity. The scheme thereby determines the category of an entity, expands the number of categories assigned to an entity and/or improves the accuracy of the entity's categories.
Description
Technical field
This application relates to the technical field of data processing, and in particular to a method and device for determining the category of an entity in a knowledge graph.
Background
As a structured information network, the knowledge graph breaks through the limitations of traditional relational databases, has strong expressive power, and plays an important role in fields such as information retrieval and information integration. Classifying entities makes explicit the category to which each entity belongs, improves the network topology of the knowledge graph and increases its expressive power. It is therefore of significant value for building and applying knowledge graphs (for example in knowledge reasoning, entity linking and intelligent question answering).
In the prior art, however, whether manual or automatic annotation is used, the categories assigned to entities are relatively coarse-grained and imprecise. As a result, the knowledge graphs that are built have poor knowledge coverage, which limits their application.
Summary of the invention
In view of this, the embodiments of this application provide a method and device for determining the category of an entity, which can expand the number of categories assigned to an entity and/or refine the classification of the entity, addressing the problem that entity classification is too coarse-grained and/or inaccurate.
An embodiment of this application provides a method for determining the category of an entity, the method including:
obtaining a candidate data pair set, where the candidate data pair set includes at least one candidate data pair, and each candidate data pair includes at least one entity and at least one category;
based on the attribute information of the entity, the attribute values of the entity and/or the attribute information of the category, judging whether the at least one entity included in the at least one candidate data pair belongs to the at least one category included in that candidate data pair, and if so, marking the at least one category as a category of the at least one entity.
Optionally, building the candidate data pair set includes: for an initially obtained entity set and category set, if the elements shared by the attribute information set of an entity in the entity set and the attribute information set of a category in the category set satisfy a certain condition, building a candidate data pair from that entity and that category, and using it to create, or adding it to, the candidate data pair set.
Optionally, the certain condition is: the number of elements shared by the attribute information set of the entity and the attribute information set of the category is 30% or more of the total number of elements in the attribute information set of the category.
Optionally, the judgment is performed by a convolutional neural network model.
Optionally, judging, based on the attribute information of the entity, the attribute values of the entity and/or the attribute information of the category, whether the at least one entity included in the at least one candidate data pair belongs to the at least one category included in that candidate data pair specifically includes:
building a vector matrix set based on the at least one candidate data pair, where the vector matrix set includes at least one vector matrix, and the vector matrix includes the vector of the entity included in the candidate data pair, the attribute information vector of the entity, the attribute value vector of the entity, the vectors of other entities associated with the entity, the attribute information vectors of those other entities, the attribute value vectors of those other entities, the vector of the category included in the candidate data pair and/or the attribute information vector of the category;
inputting the vector matrix set into the convolutional neural network model, and outputting a judgment result after convolution and pooling.
Optionally, the method further includes a step of obtaining the attribute information of a category from the attribute information of entities: when multiple entities are known to belong to a category, and more than a certain proportion of those entities share one or more pieces of attribute information, the shared attribute information is defined as the attribute information of that category.
An embodiment of this application also provides a device for determining the category of an entity, the device including:
an acquisition module, configured to obtain a candidate data pair set, where the candidate data pair set includes at least one candidate data pair, and each candidate data pair includes at least one entity and at least one category;
a judgment module, configured to judge, based on the attribute information of the entity, the attribute values of the entity and/or the attribute information of the category, whether the at least one entity included in the at least one candidate data pair belongs to the at least one category included in that candidate data pair, and if so, to mark the at least one category as a category of the at least one entity.
Optionally, the acquisition module is specifically configured to:
for an initially obtained entity set and category set, if the elements shared by the attribute information set of an entity in the entity set and the attribute information set of a category in the category set satisfy a certain condition, build a candidate data pair from that entity and that category, and use it to create, or add it to, the candidate data pair set.
Optionally, the certain condition is: the number of elements shared by the attribute information set of the entity and the attribute information set of the category is one half or more of the total number of elements in the attribute information set of the category.
Optionally, the judgment module specifically performs the judgment by means of a convolutional neural network model.
The method and device for determining the category of an entity provided by the embodiments of this application first obtain a candidate data pair set including at least one candidate data pair, and then, based on the attribute information of the entity, the attribute values of the entity and/or the attribute information of the category in the candidate data pair, judge whether the at least one entity included in the at least one candidate data pair belongs to the at least one category included in that pair; if so, the at least one category is marked as a category of the at least one entity. This determines the category of the entity, expands the number of categories assigned to the entity and/or improves the accuracy of the entity's categories, addressing the problem that entity classification is too coarse-grained and/or inaccurate.
To make the above objects, features and advantages of this application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of this application and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a flow chart of a method for determining the category of an entity provided by an embodiment of this application;
Fig. 2 shows a flow chart of another method for determining the category of an entity provided by an embodiment of this application;
Fig. 3 shows a schematic diagram of an information graph provided by an embodiment of this application;
Fig. 4 shows a structural schematic diagram of a device for determining the category of an entity provided by an embodiment of this application;
Fig. 5 shows a structural schematic diagram of a computer device provided by an embodiment of this application.
Detailed description
To make the purposes, technical solutions and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of this application. The components of the embodiments, as generally described and illustrated in the drawings, can be arranged and designed in a variety of different configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the claimed scope of this application, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of this application.
In the related art, the classification of entities is too coarse-grained and/or inaccurate. In view of this, an embodiment of this application provides a method for determining the category of an entity, which can expand the number of categories assigned to an entity and/or improve the accuracy of the entity's categories.
Fig. 1 is a flow chart of the method for determining the category of an entity provided by an embodiment of this application. The method may be executed by a computer device and specifically includes the following steps:
S101: obtain a candidate data pair set, where the candidate data pair set includes at least one candidate data pair, and each candidate data pair includes at least one entity and at least one category.
Here, to make the entities and categories in the candidate data pairs easier to understand, the way in which the embodiment obtains them is described using an online encyclopedia (such as Baidu Baike) as the application scenario. In this scenario, taking the football star Cristiano Ronaldo as an example, the entity may be the first attribute value (Portugal) corresponding to the first attribute information (such as nationality) in the Baidu Baike infobox, in which case the category is the first attribute information, namely nationality. Alternatively, the entity may be the title (Cristiano Ronaldo) corresponding to a title label (such as "Portuguese football player"), in which case the category is the head word of the title label, namely "athlete". The embodiment may obtain these entities and categories from an open data interface of an Internet site (such as Baidu Baike), or may use web crawler technology, for example a crawler implemented in Python (an object-oriented, interpreted programming language), to crawl the desired entities and categories to a local computer device.
In the embodiment of this application, if the entity is Cristiano Ronaldo, the paired category may be athlete, football player, Portuguese football player and so on, but may also be singer, basketball player and so on. That is, for each candidate data pair in the candidate data pair set, the category paired with the entity may be correct or may be wrong.
S102: based on the attribute information of the entity, the attribute values of the entity and/or the attribute information of the category, judge whether the at least one entity included in the at least one candidate data pair belongs to the at least one category included in that candidate data pair; if so, mark the at least one category as a category of the at least one entity.
Here, whether the category paired with the entity is correct can be judged from the attribute information of the entity, the attribute values of the entity and/or the attribute information of the category; if it is correct, the category is marked as a category of the entity. In general, this information reflects, to some extent, the relations between entities, between entities and categories, and/or between categories. Using it, the correspondence between a target entity and a target category can be judged with those relations taken into account, for example by determining whether the target entity belongs to the target category.
When the candidate data pair set is built, the following observation is used: when an entity and a category are correctly paired, the attributes of the entity and the attributes of the category tend to overlap more, and when they are wrongly paired, the attributes tend to overlap less. The embodiment therefore specifies that when the elements shared by the attribute information set of an entity and the attribute information set of a category in the category set satisfy a certain condition, a candidate data pair containing that entity and that category is obtained and is used to create, or is added to, the candidate data pair set.
The condition is: the number of elements shared by the attribute information set of the entity and the attribute information set of the category is 30% or more of the total number of elements in the attribute information set of the category. Keeping this share at 30% or above helps ensure a sufficient proportion of correct pairings among the candidate data pairs that are built. Preferably, entities and categories whose share of repeated elements exceeds 50% are added to the candidate data pair set as candidate data pairs, which ensures that the proportion of correct pairings in the set is relatively high.
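The overlap condition can be illustrated with a minimal Python sketch. It is not taken from the application itself: the function name `build_candidate_pairs`, the dictionary layout and the sample attribute sets are assumptions made for illustration, and the comparison uses "greater than or equal to" for the 30% threshold.

```python
def build_candidate_pairs(entity_attrs, category_attrs, min_ratio=0.3):
    """Pair each entity with every category whose attribute set it overlaps enough.

    entity_attrs / category_attrs map a name to a set of attribute names
    (hypothetical layout, chosen for this sketch).
    """
    pairs = []
    for entity, e_attrs in entity_attrs.items():
        for category, c_attrs in category_attrs.items():
            if not c_attrs:
                continue  # a category without known attributes cannot be scored
            shared = e_attrs & c_attrs
            # Share of the category's attributes that the entity also has.
            ratio = len(shared) / len(c_attrs)
            if ratio >= min_ratio:
                pairs.append((entity, category))
    return pairs


# Illustrative data only.
entity_attrs = {
    "Cristiano Ronaldo": {"nationality", "birth date", "club", "position", "height"},
}
category_attrs = {
    "football player": {"nationality", "club", "position", "goals"},
    "singer": {"nationality", "record label", "albums", "genre"},
}
candidates = build_candidate_pairs(entity_attrs, category_attrs)
print(candidates)
```

With these sample sets, "football player" shares three of its four attributes with the entity and is kept, while "singer" shares one of four and falls below the 30% threshold. Raising `min_ratio` to 0.5 corresponds to the preferred, stricter setting described above.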
To judge whether the at least one entity included in the at least one candidate data pair belongs to the at least one category included in that pair, the method provided by the embodiment may perform the judgment with a convolutional neural network. As shown in Fig. 2, the judgment process specifically includes:
S201, it is based at least one candidate data pair, builds vector matrix set, vector matrix set includes at least one
Vector matrix, vector matrix include candidate data to the vector for the entity for being included, the entity attributes information vector, the entity
Attribute value vector, the vector of associated with the entity other entities, other entity attributes information vectors, other realities
The attribute value vector of body, candidate data are to the vector of the classification that is included and/or the attribute information vector of the category;
S202, vector matrix set is input in convolutional neural networks model, it is defeated by convolution algorithm, pondization processing
Go out judging result.
Here, before the vector matrix set corresponding to the candidate data pairs is input into the convolutional neural network model, the model must first be trained on training data. Specifically, in the training stage of the convolutional neural network model, data can be obtained from a known scenario, for example by obtaining entities, categories and the correspondence between entities and categories from Baidu Baike. A first data pair set and a second data pair set are built (the second data pair set may also be built from the first). The first vector matrix set corresponding to the first data pair set and the second vector matrix set corresponding to the second data pair set are used as the input features of the convolutional neural network model to be trained, with the first data pair set and/or the second data pair set as the output results. Training yields the model parameters of the convolutional neural network model, that is, the trained model. In the embodiment, the model training stage is the process of learning the unknown model parameters of the neural network model.
The first data pair set includes at least one first data pair, and each first data pair includes at least one first entity and at least one first category corresponding to that entity (for example, a correct category). The second data pair set includes at least one second data pair, and each second data pair includes at least one second entity and at least one second category corresponding to that entity (for example, a wrong category). In addition, the embodiment may build a first vector matrix from the first entity in a first data pair and its related information (such as attribute information and attribute value information), the first category and its related information (such as attribute information), and the set path rules, and add that first vector matrix to the first vector matrix set. The second vector matrix set is built in the same way as the first, so the details are not repeated here.
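One way to derive the second (wrong-category) data pair set from the first can be sketched as follows. This is only an illustration under stated assumptions: the application does not specify how wrong pairs are generated, and the function `build_training_pairs` and the rule "any category not known for an entity counts as wrong" are hypothetical simplifications.

```python
def build_training_pairs(positive_pairs, all_categories):
    """Return labelled (entity, category, label) samples.

    label 1: pair taken from the first data pair set (correct category).
    label 0: pair generated for the second data pair set (wrong category).
    Assumption for this sketch: every category not listed for an entity
    is treated as a wrong category for that entity.
    """
    known = {}
    for entity, category in positive_pairs:
        known.setdefault(entity, set()).add(category)

    negative_pairs = [
        (entity, category)
        for entity in sorted(known)
        for category in sorted(all_categories)
        if category not in known[entity]
    ]
    return ([(e, c, 1) for e, c in positive_pairs]
            + [(e, c, 0) for e, c in negative_pairs])


# Illustrative data only.
positive_pairs = [
    ("Cristiano Ronaldo", "football player"),
    ("Cristiano Ronaldo", "athlete"),
]
all_categories = {"football player", "athlete", "singer"}
samples = build_training_pairs(positive_pairs, all_categories)
print(samples)
```

In practice the known categories of an entity are rarely complete, so some generated "wrong" pairs may in fact be correct; a real pipeline would likely need filtering or sampling on top of this.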
Once the model has been trained, entity categories can be predicted with the convolutional neural network model. At this point, it is only necessary to feed the entity and category of a candidate data pair into the trained convolutional neural network model to obtain the judgment result for the category paired with that entity.
It is worth noting that, before inputting a candidate data pair into the convolutional neural network model, the embodiment builds the vector matrix set corresponding to the candidate data pair. The vector matrix set includes at least one vector matrix, and the vector matrix includes the vector of the entity included in the candidate data pair, the attribute information vector of the entity, the attribute value vector of the entity, the vectors of other entities associated with the entity, the attribute information vectors of those other entities, the attribute value vectors of those other entities, the vector of the category included in the candidate data pair and/or the attribute information vector of the category. In other words, the vector matrix set is related to the entity in the candidate data pair and its related information (such as attribute information and attribute value information) and to the category and its related information (such as attribute information).
In implementation, the vector matrix set can be generated by means of an information graph. The entities, entity attribute information, entity attribute values, categories and category attribute information in the candidate set each serve as nodes of the information graph. Here, category attribute information and entity attribute information can be grouped together as attribute information. Fig. 3 shows a specific example of an information graph, which contains four node sets (entities, attributes, attribute values and categories) and the relations among them. Each node denotes one object, that is, one entity, one attribute, one attribute value or one category.
After the information graph has been built, node sets are selected from it according to the set path rules. A vector matrix is built from the representation vectors of the nodes in each node set and is used to create, or is added to, the vector matrix set. In the embodiment, a model such as metapath2vec can be used to embed the nodes, converting each node into numerical information in vector form.
In addition, the embodiment may set the path rules in the following two ways. The first is: entity (e) -> attribute value (v) -> attribute (a) -> category (t). The second is: entity (e1) -> attribute value (v1) -> attribute (a1) -> attribute value (v2) -> entity (e2) -> attribute value (v3) -> attribute (a2) -> category (t). For any candidate data pair, the nodes traversed by the corresponding path can thus be determined, and integrating the information of all nodes on the path yields the corresponding vector matrix. Other path rules are not excluded; path rules can be adjusted, added or removed according to the input requirements of the model to obtain the corresponding vector matrices.
Applying the set path rules to the information graph yields multiple paths. Each path contains multiple nodes, which represent entities, entity attributes and/or category attributes, attribute values and categories. Combining the vectors of all nodes on a path gives the corresponding vector matrix, and the resulting vector matrices together form the vector matrix set. Each path thus corresponds to one vector matrix, and the set of paths in the information graph corresponds to the vector matrix set.
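The first path rule (entity -> attribute value -> attribute -> category) can be sketched in Python as a walk over a typed graph. The graph layout, the function name `find_paths` and the sample nodes are assumptions for illustration; the application does not prescribe a data structure.

```python
def find_paths(edges, node_type, start, type_sequence):
    """Enumerate paths from `start` whose node types follow `type_sequence`.

    edges: iterable of undirected (node, node) links in the information graph.
    node_type: maps each node to "entity", "value", "attribute" or "category".
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    paths = [[start]]
    for wanted in type_sequence[1:]:
        # Extend every partial path by each neighbour of the required type.
        paths = [
            path + [nxt]
            for path in paths
            for nxt in sorted(neighbours.get(path[-1], ()))
            if node_type[nxt] == wanted and nxt not in path
        ]
    return paths


# Illustrative information graph only.
node_type = {
    "Cristiano Ronaldo": "entity",
    "Portugal": "value", "forward": "value",
    "nationality": "attribute", "position": "attribute",
    "football player": "category", "singer": "category",
}
edges = [
    ("Cristiano Ronaldo", "Portugal"), ("Cristiano Ronaldo", "forward"),
    ("Portugal", "nationality"), ("forward", "position"),
    ("nationality", "football player"), ("nationality", "singer"),
    ("position", "football player"),
]
rule = ["entity", "value", "attribute", "category"]
paths = find_paths(edges, node_type, "Cristiano Ronaldo", rule)
to_target = [p for p in paths if p[-1] == "football player"]
print(to_target)
```

Replacing each node on a path with its embedding vector (for example one learned by metapath2vec) and stacking the vectors row by row would give the vector matrix for that path. The longer second rule can be expressed by passing a longer `type_sequence`.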
The following describes how the category of an entity is determined from the constructed vector matrix set:
Based on the path rules, a set of paths from the entity to the category is obtained from the information graph, and combining the vectors of all nodes on each path in this set gives the vector matrix set. Depending on the input requirements of the model, the shorter paths in the path set (those with fewer nodes) can be padded with zeros, or filled with a specific vector, so that all paths have the same length, which makes it easier to regularize the model input. The resulting vector matrices are input into the model to determine the category.
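The padding step can be sketched as follows, using plain Python lists so that the example is self-contained. The function name `pad_paths` and the two-dimensional sample vectors are assumptions for illustration.

```python
def pad_paths(path_matrices, dim):
    """Pad every path matrix with zero vectors up to the longest path length.

    path_matrices: list of matrices, each a list of node vectors of size `dim`.
    Returns new matrices; the inputs are left unchanged.
    """
    longest = max(len(matrix) for matrix in path_matrices)
    return [
        matrix + [[0.0] * dim for _ in range(longest - len(matrix))]
        for matrix in path_matrices
    ]


# Illustrative node vectors only: one path with 3 nodes, one with 2 nodes.
path_matrices = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8], [0.9, 1.0]],
]
padded = pad_paths(path_matrices, dim=2)
print(padded[1])
```

After padding, every matrix has the same number of rows, so the matrices can be stacked into a single tensor for the model.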
First, the input vector matrix set is convolved with the set convolution kernels to obtain multiple feature map sets. The number of feature map sets equals the number of convolution kernels used in the convolution. That is, the convolution performed by each convolution window (corresponding to one convolution kernel) yields the corresponding feature map.
Then, a max pooling operation is applied to these feature maps. The pooling layer captures the most important feature in each feature map; for example, the zeros padded in when the vector matrix set was built can be filtered out here. In addition, to make the captured features diverse, the convolutional neural network model in the embodiment uses multiple convolution kernels with different window sizes to capture multiple features.
Next, the univariate features corresponding to the individual convolution kernels are concatenated into a single feature vector, which is passed to the final fully connected sigmoid layer. The output is one of two results, yes or no. That is, from the output of the convolutional neural network model it can be determined, for any candidate data pair, whether the category paired with its entity is correct.
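The forward pass described above (convolution with several window sizes, max pooling, concatenation, sigmoid output) can be sketched in pure Python. This is a simplified illustration, not the model of the application: it handles a single path matrix, uses one kernel per window size, applies a ReLU after the convolution, and uses hand-picked weights instead of trained parameters. All of these are assumptions.

```python
import math


def conv_feature_map(matrix, kernel, bias=0.0):
    """Slide a kernel that spans the full vector width down the rows of the matrix."""
    height = len(kernel)
    feature_map = []
    for top in range(len(matrix) - height + 1):
        total = bias
        for i in range(height):
            total += sum(x * w for x, w in zip(matrix[top + i], kernel[i]))
        feature_map.append(max(0.0, total))  # ReLU activation (assumption)
    return feature_map


def classify(matrix, kernels, weights, bias=0.0):
    """Return (probability, decision) that the entity belongs to the category."""
    # Max pooling keeps one value per kernel; the values are concatenated.
    pooled = [max(conv_feature_map(matrix, kernel)) for kernel in kernels]
    logit = sum(p * w for p, w in zip(pooled, weights)) + bias
    probability = 1.0 / (1.0 + math.exp(-logit))  # sigmoid output
    return probability, probability >= 0.5


# Illustrative 4-node path with 2-dimensional node vectors; the last row is padding.
matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],              # window of 2 nodes
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],  # window of 3 nodes
]
weights = [1.0, -0.25]  # hand-picked, not trained
probability, belongs = classify(matrix, kernels, weights)
print(round(probability, 3), belongs)
```

The sketch assumes every matrix has at least as many rows as the largest window, which the padding step ensures. A real implementation would use a deep learning framework, many kernels per window size and parameters learned from the labelled data pairs.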
In the embodiment of this application, the attribute information of a category can be obtained from the attribute information of entities. For example, when multiple entities are known to belong to a category, and more than a certain proportion of those entities share one or more pieces of attribute information, the shared attribute information is defined as the attribute information of that category. The proportion may be one third: among the entities corresponding to a category, when one third of the entities share certain attributes, those attributes can be determined to be category attributes.
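The derivation of category attributes from entity attributes can be sketched as a simple count. The function name `derive_category_attributes` and the sample data are assumptions for illustration, and the sketch treats the one-third proportion as an inclusive threshold.

```python
from collections import Counter


def derive_category_attributes(member_attrs, min_share=1 / 3):
    """Return the attributes shared by at least `min_share` of a category's entities.

    member_attrs: one attribute set per entity known to belong to the category.
    """
    total = len(member_attrs)
    if total == 0:
        return set()
    # Count, for each attribute, how many member entities have it.
    counts = Counter(attr for attrs in member_attrs for attr in set(attrs))
    return {attr for attr, count in counts.items() if count / total >= min_share}


# Illustrative data only: four entities known to be football players.
football_players = [
    {"nationality", "club", "position"},
    {"nationality", "club"},
    {"nationality", "height"},
    {"nationality"},
]
category_attributes = derive_category_attributes(football_players)
print(sorted(category_attributes))
```

Here "nationality" (4 of 4) and "club" (2 of 4) meet the one-third proportion, while "position" and "height" (1 of 4 each) do not.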
Based on the same inventive concept, an embodiment of this application provides a device for determining the category of an entity that corresponds to the method described above. Because the device solves the problem on a principle similar to that of the method, the implementation of the device can refer to the implementation of the method, and repeated parts are not described again.
Fig. 4 shows a structural schematic diagram of the device for determining the category of an entity provided by an embodiment of this application. The device specifically includes:
an acquisition module 401, configured to obtain a candidate data pair set, where the candidate data pair set includes at least one candidate data pair, and each candidate data pair includes at least one entity and at least one category;
a judgment module 402, configured to judge, based on the attribute information of the entity, the attribute values of the entity and/or the attribute information of the category, whether the at least one entity included in the at least one candidate data pair belongs to the at least one category included in that candidate data pair, and if so, to mark the at least one category as a category of the at least one entity.
In one embodiment, the acquisition module 401 is specifically configured to:
for an initially obtained entity set and category set, if the elements shared by the attribute information set of an entity in the entity set and the attribute information set of a category in the category set satisfy a certain condition, build a candidate data pair from that entity and that category, and use it to create, or add it to, the candidate data pair set.
The certain condition is: the number of elements shared by the attribute information set of the entity and the attribute information set of the category is 30% or more of the total number of elements in the attribute information set of the category.
In another embodiment, the judgment module 402 specifically performs the judgment by means of a convolutional neural network model.
In another embodiment, the judgment module 402 is specifically configured to:
build a vector matrix set based on the at least one candidate data pair, where the vector matrix set includes at least one vector matrix, and the vector matrix includes the vector of the entity included in the candidate data pair, the attribute information vector of the entity, the attribute value vector of the entity, the vectors of other entities associated with the entity, the attribute information vectors of those other entities, the attribute value vectors of those other entities, the vector of the category included in the candidate data pair and/or the attribute information vector of the category;
input the vector matrix set into the convolutional neural network model, and output a judgment result after convolution and pooling.
In another embodiment, the device for determining the category of an entity further includes:
a determining module 403, configured to, when multiple entities are known to belong to a category and more than a certain proportion of those entities share one or more pieces of attribute information, define the shared attribute information as the attribute information of that category.
Fig. 5 shows a schematic diagram of the computer device provided by an embodiment of this application. The computer device includes a processor 501, a memory 502 and a bus 503. The memory 502 stores execution instructions. When the device runs, the processor 501 and the memory 502 communicate over the bus 503, and the processor 501 executes the following execution instructions stored in the memory 502:
Candidate data is obtained to set, wherein candidate data includes at least one candidate data pair, candidate data to set
Each of centering includes at least one entity and at least one classification;
Attribute information based on entity attributes information, entity attributes value and/or classification judges at least one candidate number
Whether belong to the candidate data at least one classification for being included according to at least one entity for being included, if it is, will
At least one category label is the classification of at least one entity.
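The two instructions above (obtain candidate data pairs, then judge and mark) can be sketched as a small loop. The function names, the score table, and the 0.5 cutoff are hypothetical; `toy_judge` merely stands in for whatever model produces the judgment result, and each pair is simplified to one entity and one category.

```python
from collections import defaultdict


def label_entities(candidate_pairs, judge):
    """Mark a category as a category of the entity whenever the judge accepts the pair."""
    labels = defaultdict(set)
    for entity, category in candidate_pairs:
        if judge(entity, category):
            labels[entity].add(category)
    return dict(labels)


# Hypothetical scores standing in for the model's judgment result.
scores = {
    ("Entity A", "Company"): 0.91,
    ("Entity A", "Person"): 0.12,
    ("Entity B", "Person"): 0.77,
}


def toy_judge(entity, category, cutoff=0.5):
    """Accept the pair when its (made-up) score reaches the assumed cutoff."""
    return scores[(entity, category)] >= cutoff


result = label_entities(list(scores), toy_judge)
print(result)
```

Keeping the judge as a plain callable means the same labeling loop works whether the judgment comes from a neural network or from a simpler rule.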
In one embodiment, in the processing executed by the processor 501, building the candidate data pair set includes: for an initially acquired entity set and category set, if the elements shared by the attribute information set of an entity in the entity set and the attribute information set of a category in the category set satisfy a certain condition, building a candidate data pair from that entity and that category, and using it to build, or adding it to, the candidate data pair set.
Here, the certain condition is that the number of elements shared by the attribute information set of the entity and the attribute information set of the category accounts for 30% or more of the total number of elements in the attribute information set of the category.
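A minimal sketch of this candidate-building step follows, assuming attribute information sets are plain Python sets of attribute names. The function names and the sample entities and categories are illustrative and not taken from the application; only the 30% threshold comes from the text.

```python
def overlap_ratio(entity_attrs, category_attrs):
    """Share of the category's attribute set that also appears on the entity."""
    category_attrs = set(category_attrs)
    if not category_attrs:
        return 0.0  # assumption: a category with no attributes never qualifies
    return len(set(entity_attrs) & category_attrs) / len(category_attrs)


def build_candidate_pairs(entities, categories, threshold=0.3):
    """Pair every entity with every category whose overlap ratio is at least 30%."""
    pairs = []
    for entity_name, entity_attrs in entities.items():
        for category_name, category_attrs in categories.items():
            if overlap_ratio(entity_attrs, category_attrs) >= threshold:
                pairs.append((entity_name, category_name))
    return pairs


# Hypothetical initially acquired entity set and category set.
entities = {
    "Entity A": {"founded", "industry", "headquarters"},
    "Entity B": {"birth_date", "nationality"},
}
categories = {
    "Company": {"founded", "industry", "employees", "headquarters"},
    "Person": {"birth_date", "nationality", "occupation", "employer",
               "spouse", "alma_mater", "height"},
}
print(build_candidate_pairs(entities, categories))
```

In this example "Entity A" shares three of the four "Company" attributes and is kept, while "Entity B" shares only two of the seven "Person" attributes (about 29%) and falls just below the threshold.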
In another embodiment, in the processing executed by the processor 501, the judgment is performed by a convolutional neural network model.
In another embodiment, in the processing executed by the processor 501, judging, based on the attribute information of the entity, the attribute values of the entity, and/or the attribute information of the category, whether the at least one entity contained in the at least one candidate data pair belongs to the at least one category contained in that candidate data pair specifically includes:
Building a vector matrix set based on the at least one candidate data pair, where the vector matrix set includes at least one vector matrix, and the vector matrix includes the vector of the entity contained in the candidate data pair, the attribute information vectors of the entity, the attribute value vectors of the entity, the vectors of other entities associated with the entity, the attribute information vectors of the other entities, the attribute value vectors of the other entities, the vector of the category contained in the candidate data pair, and/or the attribute information vectors of the category;
Inputting the vector matrix set into the convolutional neural network model, and outputting the judgment result after convolution operations and pooling.
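The convolution and pooling stage can be illustrated with a dependency-free forward pass. This is only a sketch under several assumptions: the kernels span a few whole rows of the vector matrix, ReLU and max pooling are used, a sigmoid with a 0.5 cutoff produces the judgment, and all weights are random rather than trained. The application does not state the network's architecture, so none of these choices should be read as the patented model.

```python
import math
import random


def conv_rows(matrix, kernel, bias=0.0):
    """Slide a (k x dim) kernel down the rows of the matrix and apply ReLU."""
    k = len(kernel)
    if len(matrix) < k:
        raise ValueError("matrix has fewer rows than the kernel")
    out = []
    for start in range(len(matrix) - k + 1):
        total = bias
        for i in range(k):
            total += sum(w * x for w, x in zip(kernel[i], matrix[start + i]))
        out.append(max(0.0, total))  # ReLU activation (assumed)
    return out


def judge(matrix, kernels, out_weights, out_bias=0.0):
    """Convolution, then max pooling per kernel, then a sigmoid score."""
    pooled = [max(conv_rows(matrix, kernel)) for kernel in kernels]
    logit = out_bias + sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
dim, rows, k, n_kernels = 4, 6, 2, 3  # toy sizes chosen for illustration
matrix = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(rows)]
kernels = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(k)]
           for _ in range(n_kernels)]
out_weights = [rng.uniform(-1, 1) for _ in range(n_kernels)]

score = judge(matrix, kernels, out_weights)
belongs = score >= 0.5  # assumed decision threshold for "entity belongs to category"
print(round(score, 3), belongs)
```

In practice the weights would be learned from labeled candidate data pairs, and a framework implementation would replace these loops; the sketch only shows the order of operations named in the text.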
In another embodiment, the processing executed by the processor 501 further includes a step of obtaining the attribute information of a category based on the attribute information of entities: when a plurality of entities are known to belong to a certain category, and more than a certain proportion of those entities share one or more items of attribute information, the shared one or more items of attribute information are defined as attribute information of the category.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when run by a processor, performs the steps of the above method for determining an entity category.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, it can perform the above method for determining an entity category. This addresses the problems of the related art, in which sample data is generated by manual labeling: the classification granularity of existing entities is relatively coarse and the precision is poor. The method can thereby increase the number of categories of an entity and/or refine the categories of the entity.
In the embodiments provided by the present application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments provided by the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In addition, the terms "first", "second", "third", and so on are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
Finally, it should be noted that the embodiments described above are merely specific implementations of the present application, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed by the present application, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A method for determining an entity category, characterized in that the method comprises:
obtaining a candidate data pair set, wherein the candidate data pair set comprises at least one candidate data pair, and each candidate data pair comprises at least one entity and at least one category;
judging, based on attribute information of the entity, attribute values of the entity, and/or attribute information of the category, whether the at least one entity contained in the at least one candidate data pair belongs to the at least one category contained in that candidate data pair, and if so, marking the at least one category as a category of the at least one entity.
2. The method according to claim 1, characterized in that building the candidate data pair set comprises: for an initially acquired entity set and category set, if the elements shared by the attribute information set of an entity in the entity set and the attribute information set of a category in the category set satisfy a certain condition, building a candidate data pair from that entity and that category, and using it to build, or adding it to, the candidate data pair set.
3. The method according to claim 2, characterized in that the certain condition is that the number of elements shared by the attribute information set of the entity and the attribute information set of the category accounts for 30% or more of the total number of elements in the attribute information set of the category.
4. The method according to claim 1, characterized in that the judging is performed by a convolutional neural network model.
5. The method according to claim 4, characterized in that the judging, based on attribute information of the entity, attribute values of the entity, and/or attribute information of the category, whether the at least one entity contained in the at least one candidate data pair belongs to the at least one category contained in that candidate data pair specifically comprises:
building a vector matrix set based on the at least one candidate data pair, wherein the vector matrix set comprises at least one vector matrix, and the vector matrix comprises the vector of the entity contained in the candidate data pair, the attribute information vectors of the entity, the attribute value vectors of the entity, the vectors of other entities associated with the entity, the attribute information vectors of the other entities, the attribute value vectors of the other entities, the vector of the category contained in the candidate data pair, and/or the attribute information vectors of the category;
inputting the vector matrix set into the convolutional neural network model, and outputting a judgment result after convolution operations and pooling.
6. The method according to any one of claims 1 to 5, characterized by further comprising a step of obtaining the attribute information of a category based on attribute information of entities: when a plurality of entities are known to belong to a certain category, and more than a certain proportion of the plurality of entities share one or more items of attribute information, defining the shared one or more items of attribute information as attribute information of the category.
7. A device for determining an entity category, characterized in that the device comprises:
an acquisition module, configured to obtain a candidate data pair set, wherein the candidate data pair set comprises at least one candidate data pair, and each candidate data pair comprises at least one entity and at least one category;
a judgment module, configured to judge, based on attribute information of the entity, attribute values of the entity, and/or attribute information of the category, whether the at least one entity contained in the at least one candidate data pair belongs to the at least one category contained in that candidate data pair, and if so, to mark the at least one category as a category of the at least one entity.
8. The device according to claim 7, characterized in that the acquisition module is specifically configured to:
for an initially acquired entity set and category set, if the elements shared by the attribute information set of an entity in the entity set and the attribute information set of a category in the category set satisfy a certain condition, build a candidate data pair from that entity and that category, and use it to build, or add it to, the candidate data pair set.
9. The device according to claim 8, characterized in that the certain condition is that the number of elements shared by the attribute information set of the entity and the attribute information set of the category accounts for one half or more of the total number of elements in the attribute information set of the category.
10. The device according to claim 7, characterized in that the judgment module is specifically configured to perform the judgment by means of a convolutional neural network model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810691032.9A CN108805290B (en) | 2018-06-28 | 2018-06-28 | Entity category determination method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810691032.9A CN108805290B (en) | 2018-06-28 | 2018-06-28 | Entity category determination method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108805290A (en) | 2018-11-13 |
CN108805290B (en) | 2021-03-12 |
Family
ID=64072388
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810691032.9A Active CN108805290B (en) | 2018-06-28 | 2018-06-28 | Entity category determination method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108805290B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050027664A1 (en) * | 2003-07-31 | 2005-02-03 | Johnson David E. | Interactive machine learning system for automated annotation of information in text |
US7493251B2 (en) * | 2003-05-30 | 2009-02-17 | Microsoft Corporation | Using source-channel models for word segmentation |
CN104484461A (en) * | 2014-12-29 | 2015-04-01 | Beijing Qihoo Technology Co., Ltd. | Method and system based on encyclopedia data for classifying entities |
WO2016081500A1 (en) * | 2014-11-17 | 2016-05-26 | Google Inc. | Structured entity information page |
CN106484675A (en) * | 2016-09-29 | 2017-03-08 | Beijing Institute of Technology | Fusion distributed semantic and the character relation abstracting method of sentence justice feature |
CN106682220A (en) * | 2017-01-04 | 2017-05-17 | South China University of Technology | Online traditional Chinese medicine text named entity identifying method based on deep learning |
CN107102989A (en) * | 2017-05-24 | 2017-08-29 | Nanjing University | A kind of entity disambiguation method based on term vector, convolutional neural networks |
CN107133220A (en) * | 2017-06-07 | 2017-09-05 | Southeast University | Name entity recognition method in a kind of Geography field |
CN107729497A (en) * | 2017-10-20 | 2018-02-23 | Tongji University | A kind of word insert depth learning method of knowledge based collection of illustrative plates |
US20180157643A1 (en) * | 2016-12-06 | 2018-06-07 | Siemens Aktiengesellschaft | Device and method for natural language processing |
Non-Patent Citations (1)
Title |
---|
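Liu Qiao et al.: "Collective entity linking algorithm based on semantic consistency" (基于语义一致性的集成实体链接算法), Journal of Computer Research and Development (《计算机研究与发展》) * |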
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109919175A (en) * | 2019-01-16 | 2019-06-21 | Zhejiang University | An Entity Multi-Classification Method Combining Attribute Information |
CN109919175B (en) * | 2019-01-16 | 2020-10-23 | Zhejiang University | Entity multi-classification method combined with attribute information |
CN112805715A (en) * | 2019-07-05 | 2021-05-14 | Google LLC | Identifying entity attribute relationships |
CN112805715B (en) * | 2019-07-05 | 2025-01-14 | Google LLC | Identifying entity-attribute relationships |
CN119692351A (en) * | 2025-02-25 | 2025-03-25 | Institute of Automation, Chinese Academy of Sciences | Text generation model evaluation method and device |
Also Published As
Publication number | Publication date |
---|---|
CN108805290B (en) | 2021-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110473083B (en) | Tree risk account identification method, device, server and storage medium | |
TWI689871B (en) | Gradient lifting decision tree (GBDT) model feature interpretation method and device | |
CN108171280A (en) | A kind of grader construction method and the method for prediction classification | |
CN111931505A (en) | Cross-language entity alignment method based on subgraph embedding | |
CN110263162A (en) | Convolutional neural networks and its method of progress text classification, document sorting apparatus | |
CN110309840A (en) | Risk trade recognition methods, device, server and storage medium | |
CN109902222A (en) | Recommendation method and device | |
CN114758288A (en) | A kind of distribution network engineering safety management and control detection method and device | |
CN108427708A (en) | Data processing method, device, storage medium and electronic device | |
KR20180063189A (en) | Selective back propagation | |
CN109299258A (en) | A kind of public sentiment event detecting method, device and equipment | |
WO2020220692A1 (en) | Deep neural network and training therefor | |
CN109117380A (en) | A kind of method for evaluating software quality, device, equipment and readable storage medium storing program for executing | |
CN108805290A (en) | A kind of determination method and device of entity class | |
CN114330135B (en) | Classification model construction method and device, storage medium and electronic equipment | |
CN107545038A (en) | A kind of file classification method and equipment | |
CN109670927A (en) | The method of adjustment and its device of credit line, equipment, storage medium | |
CN105843931A (en) | Classification method and device | |
CN112712383A (en) | Potential user prediction method, device, equipment and storage medium of application program | |
CN108647714A (en) | Acquisition methods, terminal device and the medium of negative label weight | |
CN108921213A (en) | A kind of entity classification model training method and device | |
CN109886299A (en) | A kind of user draws a portrait method, apparatus, readable storage medium storing program for executing and terminal device | |
CN115018627A (en) | A credit risk assessment method and device, storage medium and electronic equipment | |
CN114091446A (en) | Method and apparatus for generating text | |
CN110837847A (en) | User classification method and device, storage medium and server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 101-8, 1st floor, building 31, area 1, 188 South Fourth Ring Road West, Fengtai District, Beijing; Applicant after: Guoxin Youyi Data Co., Ltd; Address before: 100070, No. 188, building 31, headquarters square, South Fourth Ring Road West, Fengtai District, Beijing; Applicant before: SIC YOUE DATA Co.,Ltd. |
GR01 | Patent grant | |