
CN120893547B - Dynamic collaborative construction method and system for historical document version knowledge ontology - Google Patents

Dynamic collaborative construction method and system for historical document version knowledge ontology

Info

Publication number
CN120893547B
Authority
CN
China
Prior art keywords
version
conflict
entity
document
historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202511398957.0A
Other languages
Chinese (zh)
Other versions
CN120893547A (en
Inventor
李劲
杨义
邹璞
田维
李薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China South Publishing & Media Group Co ltd
Original Assignee
China South Publishing & Media Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China South Publishing & Media Group Co ltd
Priority to CN202511398957.0A
Publication of CN120893547A
Application granted
Publication of CN120893547B
Legal status: Active
Anticipated expiration

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a dynamic collaborative construction method and system for a historical document version knowledge ontology, relating to the technical field of humanities knowledge graphs. The method comprises: inputting a version feature set into a multi-node cloud collaborative processing architecture, calculating local similarity scores between versions, and outputting a version similarity matrix; performing consistency evaluation on the version similarity matrix using a distributed conflict detection mechanism, marking entity mapping pairs whose confidence is lower than a confidence threshold, and generating a conflict transaction log; inputting the conflict transaction log into a five-level conflict hierarchical processing protocol and outputting an arbitration result; converting the arbitration result into corresponding OWL (Web Ontology Language) ontology update statements, attaching version traceability annotations, and outputting historical document update instructions; and dynamically expanding the entity relationship network of the historical document knowledge ontology according to the historical document update instructions and outputting the historical document version knowledge ontology through a three-layer intelligent analysis framework. The invention significantly enhances the automation and reliability of multi-version document knowledge management.

Description

Dynamic collaborative construction method and system for historical document version knowledge ontology
Technical Field
The invention relates to the technical field of humanities knowledge graphs, and in particular to a dynamic collaborative construction method and system for a historical document version knowledge ontology.
Background
Against the background of the continuing development of digital humanities research and historical document informatization, achieving structural modeling and semantic unification across versions of historical documents has become an important research direction at the intersection of knowledge engineering and semantic computing. In recent years, knowledge graph and ontology construction technologies have been widely applied to semantic extraction, relational modeling and semantic pushing of literature resources, and they show strong knowledge fusion capability when processing large-scale heterogeneous text resources. For historical documents, where different versions exhibit entity naming differences, additions and deletions of information, semantic drift and similar problems, academia has proposed a number of preliminary solutions based on ontology alignment, entity mapping or version comparison.
In the prior art, complex semantic problems such as entity mapping conflicts and version tracing are usually handled with simple merging rules or manual correction, which leads to coarse processing granularity and low efficiency in version consistency evaluation and semantic change tracking. Consequently, when a knowledge ontology with source traceability, dynamic evolution and automatic conflict handling must be constructed for multiple versions of a historical document, existing schemes still leave room for improvement in multi-node collaborative processing, semantic consistency verification and automatic expression of ontology updates.
Disclosure of Invention
The present invention has been made in view of the above-described problems occurring in the prior art.
Therefore, the invention provides a dynamic collaborative construction method for historical document version knowledge ontology, which solves the problems of automatic arbitration of multi-version entity conflict and efficient collaboration of dynamic ontology update.
In order to solve the technical problems, the invention provides the following technical scheme:
In a first aspect, the invention provides a dynamic collaborative construction method for a historical document version ontology, which comprises the steps of collecting a multi-source historical document digitized version, and preprocessing to generate a version structured document set;
extracting knowledge entity characteristics in each version document from the version structured document set to construct a version characteristic set;
Inputting the version feature set into a multi-node cloud co-processing architecture, calculating local similarity scores among versions, and outputting a version similarity matrix;
carrying out consistency evaluation on the version similarity matrix by using a distributed conflict detection mechanism, marking entity mapping pairs with confidence degrees lower than a confidence degree threshold value, and generating a conflict transaction log;
inputting the conflict transaction log into a five-level conflict hierarchical processing protocol, outputting an arbitration result, converting the arbitration result into corresponding OWL ontology update statements, attaching version traceability annotations, and outputting historical document update instructions;
And dynamically expanding an entity relation network of the historical document ontology according to the historical document updating instruction, and outputting the historical document version ontology through a three-layer intelligent analysis framework.
In a preferred scheme of the dynamic collaborative construction method for the historical document version knowledge ontology, preprocessing to generate the version structured document set specifically comprises the following steps:
Performing unified format conversion on the digital version of the multi-source historical document to obtain a standardized document input data set, and performing character recognition operation to obtain an initial recognition result;
and performing text error correction and layout structure recovery processing on the initial recognition result to obtain a text information set, and extracting and marking metadata information to obtain a version structured document set.
In a preferred scheme of the dynamic collaborative construction method for the historical document version knowledge ontology, constructing the version feature set specifically comprises the following steps:
performing named entity recognition on each historical document in the version structured document set, extracting and normalizing five types of knowledge entities to form a normalized entity set, and constructing subject-predicate-object triples by combining semantic role labeling and dependency syntactic analysis;
matching and associating entities of the same type across different versions using the subject-predicate-object triples, outputting an aligned entity relationship graph, extracting multidimensional feature vectors of the five types of knowledge entities, standardizing them, and outputting a standardized feature vector set;
and performing weighted average aggregation of entity features on the standardized feature vector set by document version, constructing a version-level feature matrix, establishing a mapping between the row vectors of the version-level feature matrix and the corresponding version identifiers, and outputting the version feature set.
In a preferred scheme of the dynamic collaborative construction method for the historical document version knowledge ontology, outputting the version similarity matrix specifically comprises the following steps:
Inputting the version feature set into a multi-node cloud cooperative processing architecture, receiving the version feature set by using a main control node in the multi-node cloud cooperative processing architecture, uniformly distributing the version feature set to a cooperative node cluster according to a version identifier, respectively calculating Euclidean distances of version feature vectors of each historical document, generating a local similarity score through nonlinear conversion, and adding a confidence weight label to generate a local similarity matrix;
And aggregating all local similarity matrixes according to confidence weights through distributed reduction operation, generating a global version similarity matrix, reversely mapping the version identifiers to the row indexes and the column indexes of the global version similarity matrix, and outputting the version similarity matrix.
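The confidence-weighted reduction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each cooperative node returns a dictionary keyed by `(row, column)` index holding a `(score, confidence)` pair, and it assumes a confidence-weighted mean as the aggregation rule. The function name `reduce_local_matrices` and the data layout are hypothetical.

```python
from collections import defaultdict


def reduce_local_matrices(local_matrices, version_ids):
    """Merge local similarity matrices into one global matrix.

    local_matrices: list of dicts {(i, j): (score, confidence)} (assumed layout).
    version_ids: list of version identifiers; position = row/column index.
    Returns (matrix, labels), where labels maps each index back to its version id.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for local in local_matrices:
        for cell, (score, confidence) in local.items():
            weighted_sum[cell] += confidence * score
            weight_total[cell] += confidence

    size = len(version_ids)
    # Cells no node reported stay None rather than being guessed.
    matrix = [[None] * size for _ in range(size)]
    for (i, j), total in weight_total.items():
        if total > 0:
            matrix[i][j] = weighted_sum[(i, j)] / total

    # Reverse mapping from row/column index to version identifier.
    labels = {index: vid for index, vid in enumerate(version_ids)}
    return matrix, labels
```

Cells reported by several nodes are averaged in proportion to their confidence weights, so a low-confidence score has correspondingly little influence on the global value.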
In a preferred scheme of the dynamic collaborative construction method for the historical document version knowledge ontology, generating the conflict transaction log specifically comprises the following steps:
Dividing the version similarity matrix into a plurality of entity mapping pairs by utilizing a main control node, and transmitting the entity mapping pairs to each cooperative node;
After receiving the entity mapping pairs, each cooperative node performs version consistency verification based on an adjacent version entity consistency policy, performs semantic consistency analysis by combining entity context semantic information, and identifies semantic conflict candidate entity pairs to form a structural conflict candidate set and a semantic conflict candidate set;
and carrying out intersection processing on the structural conflict candidate set and the semantic conflict candidate set, obtaining an intersection entity mapping pair, extracting corresponding confidence coefficient, judging, marking the confidence coefficient as a conflict entity mapping pair when the confidence coefficient is lower than a confidence coefficient threshold value, and carrying out deduplication merging and mapping relation integration to generate a conflict transaction log.
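The intersection, threshold and deduplication steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: entity mapping pairs are modelled as 2-tuples of strings, the two candidate sets are plain Python sets, the threshold value `0.6` is a placeholder, and duplicate pairs that differ only in order are merged by keeping the lowest confidence. None of these choices is specified by the source.

```python
def build_conflict_log(structural, semantic, confidence, threshold=0.6):
    """Return a deduplicated conflict log for low-confidence candidate pairs.

    structural, semantic: sets of (entity_a, entity_b) candidate pairs.
    confidence: dict mapping each pair to its confidence score.
    threshold: hypothetical confidence threshold.
    """
    merged = {}
    # Only pairs flagged by BOTH checks are considered.
    for pair in structural & semantic:
        score = confidence[pair]
        if score < threshold:
            # (a, b) and (b, a) describe the same mapping: merge them.
            key = tuple(sorted(pair))
            merged[key] = min(score, merged.get(key, score))
    return [{"pair": key, "confidence": merged[key]} for key in sorted(merged)]
```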
In a preferred scheme of the dynamic collaborative construction method for the historical document version knowledge ontology, outputting the historical document update instructions specifically comprises the following steps:
parsing the conflict transaction log into a conflict evolution map, and mapping it into the five-level conflict hierarchical processing protocol according to five conflict types: grammar, structure, version, semantics and value;
under the five-level conflict hierarchical processing protocol, performing conflict arbitration through a context-weighted arbiter based on the entity context semantic trajectory and a semantic stability index, and outputting an arbitration result list;
converting the different arbitration types in the arbitration result list into corresponding OWL ontology update statements;
and adding processing node information, version identifiers and conflict level annotations to the OWL ontology update statements, and integrating them to form the historical document update instructions.
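One possible shape of such an update instruction is sketched below. The source does not specify a serialization, so this sketch assumes a SPARQL Update `INSERT DATA` request in which the tracing annotations are attached through the standard OWL 2 axiom annotation vocabulary (`owl:Axiom`, `owl:annotatedSource`, `owl:annotatedProperty`, `owl:annotatedTarget`). The `ex:` namespace, the annotation properties `ex:processingNode`, `ex:versionId` and `ex:conflictLevel`, and the arbitration types `merge` and `separate` are all hypothetical.

```python
PREFIXES = (
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
    "PREFIX ex: <http://example.org/histdoc#>\n"
)

# Hypothetical arbitration types mapped to OWL individual axioms.
DECISION_TO_PROPERTY = {
    "merge": "owl:sameAs",
    "separate": "owl:differentFrom",
}


def to_update_instruction(result):
    """Render one arbitration result as an annotated update request (string)."""
    prop = DECISION_TO_PROPERTY.get(result["decision"])
    if prop is None:
        raise ValueError("unknown arbitration type: " + str(result["decision"]))
    source, target = result["source"], result["target"]
    node, version, level = result["node"], result["version"], result["level"]
    return (
        PREFIXES
        + "INSERT DATA {\n"
        + f"  ex:{source} {prop} ex:{target} .\n"
        # Axiom annotation carrying the tracing information.
        + "  [] a owl:Axiom ;\n"
        + f"     owl:annotatedSource ex:{source} ;\n"
        + f"     owl:annotatedProperty {prop} ;\n"
        + f"     owl:annotatedTarget ex:{target} ;\n"
        + f'     ex:processingNode "{node}" ;\n'
        + f'     ex:versionId "{version}" ;\n'
        + f'     ex:conflictLevel "{level}" .\n'
        + "}\n"
    )
```

Keeping the annotations on the axiom itself, rather than on the entities, means each individual update stays traceable to the node, version and conflict level that produced it.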
In a preferred scheme of the dynamic collaborative construction method for the historical document version knowledge ontology, outputting the historical document version knowledge ontology specifically comprises the following steps:
Analyzing the historical document update instruction into an incremental update event of the entity and the relation;
prioritizing the incremental update events according to the conflict level annotation and the version identifier to generate a pending update event queue;
According to the to-be-processed update event queue, performing entity and relationship insertion, deletion and attribute update operations on the entity relationship network of the historical document knowledge body through each cooperative node to obtain an updated entity relationship network;
And carrying out consistency verification on the updated entity relation network by using an entity-level semantic verification layer, carrying out deep semantic reasoning and version difference analysis by using a version-level association reasoning layer, carrying out global structural optimization by using an overall structural optimization layer, and outputting a historical document version knowledge body.
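The prioritization of incremental update events can be sketched with a binary heap. The ordering used here (conflict level first, then version identifier, then arrival order) and the numeric ranks assigned to the five conflict levels are assumptions made for illustration; the source only states that the conflict level annotation and the version identifier determine priority.

```python
import heapq

# Hypothetical ranks: a lower number is processed earlier.
LEVEL_RANK = {"grammar": 0, "structure": 1, "version": 2, "semantics": 3, "value": 4}


def build_pending_queue(events):
    """Build a heap of update events ordered by (level rank, version id, arrival)."""
    heap = []
    for arrival, event in enumerate(events):
        # The unique arrival index breaks ties, so event dicts are never compared.
        heapq.heappush(
            heap, (LEVEL_RANK[event["level"]], event["version"], arrival, event)
        )
    return heap


def drain(heap):
    """Pop all events in priority order."""
    ordered = []
    while heap:
        ordered.append(heapq.heappop(heap)[-1])
    return ordered
```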
In a second aspect, the invention provides a dynamic collaborative construction system for a historical document version knowledge ontology, comprising a document acquisition module, used for acquiring digitized versions of multi-source historical documents and preprocessing them to generate a version structured document set;
The entity extraction module is used for extracting knowledge entity characteristics in each version document from the version structured document set to construct a version characteristic set;
the similarity calculation module is used for inputting the version feature set into the multi-node cloud cooperative processing architecture, calculating local similarity scores among versions and outputting a version similarity matrix;
the conflict detection module is used for carrying out consistency evaluation on the version similarity matrix by utilizing a distributed conflict detection mechanism, marking entity mapping pairs with confidence coefficient lower than a confidence coefficient threshold value and generating a conflict transaction log;
The conflict processing module is used for inputting the conflict transaction log into the five-level conflict hierarchical processing protocol, outputting an arbitration result, converting the arbitration result into corresponding OWL ontology update statements, attaching version traceability annotations and outputting historical document update instructions;
And the ontology construction module is used for dynamically expanding the entity relation network of the historical document ontology according to the historical document updating instruction and outputting the historical document version ontology through the three-layer intelligent analysis framework.
In a third aspect, the invention provides a computer device comprising a memory and a processor, the memory storing a computer program, wherein the computer program when executed by the processor implements any step of the dynamic collaborative construction method for historical document version ontology according to the first aspect of the invention.
In a fourth aspect, the present invention provides a computer readable storage medium having a computer program stored thereon, wherein the computer program when executed by a processor implements any step of the dynamic collaborative construction method for historical document version ontology according to the first aspect of the present invention.
The beneficial effects of the invention are as follows. A multi-node distributed conflict detection mechanism enables efficient and accurate identification of conflicts among multi-version entities of historical documents. Combining the five-level conflict hierarchical processing protocol with a context-weighted arbiter enables intelligent arbitration and standardized dynamic updating for complex conflicts, and the attached version traceability annotations ensure update transparency. In addition, unified preprocessing, multidimensional semantic feature extraction with weighted aggregation, parallel computation of multi-version similarity and the three-layer intelligent analysis framework together accomplish dynamic collaborative construction and deep semantic reasoning over the historical document knowledge ontology. This effectively improves the accuracy, consistency and maintainability of the knowledge system and significantly enhances the automation and reliability of multi-version document knowledge management.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a dynamic collaborative construction method for historical document version ontology.
FIG. 2 is a schematic diagram of a dynamic collaborative building system for historical literature version ontologies.
FIG. 3 is a flow chart of version structured document set generation.
FIG. 4 is a flow chart of historical document version ontology generation.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Referring to fig. 1 to fig. 4, in an embodiment of the present invention, the embodiment provides a dynamic collaborative construction method for a historical document version ontology, which includes the following steps:
S1, collecting digitized versions of multi-source historical documents, and preprocessing them to generate a version structured document set.
It should be noted that the digitized versions of multi-source historical documents comprise digitized copies of historical documents, including scanned images, OCR text, structured data and metadata, that originate from different institutions, carriers and periods.
S1.1, carrying out unified format conversion on the digital version of the multi-source historical document to obtain a standardized document input data set, and carrying out character recognition operation to obtain an initial recognition result.
It should be noted that, format recognition is performed on each document file in the collected digital version of the multi-source historical document, different formats such as document image data, structured text files or PDF are uniformly converted into a uniform image+text composite format, and numbering and structure labeling are performed according to pages, so as to obtain a standardized document input data set, and each page in the standardized document input data set contains document image contents with uniform sizes and corresponding metadata frames.
After format conversion is completed, image preprocessing is performed on the document image content in the standardized document input data set, including image denoising, skew correction, contrast enhancement and edge sharpening, to improve the accuracy of subsequent character recognition and the clarity of the image structure. An optical character recognition method is then used to detect and recognize the preprocessed document image regions line by line and extract the text contained in the document images. Preliminary text region aggregation and reading-order reconstruction are performed in combination with the document image structure to obtain the initial recognition result.
S1.2, performing text error correction and layout structure recovery processing on the initial recognition result to obtain a text information set, and extracting and marking metadata information to obtain a version structured document set.
It should be noted that text error correction based on dictionary comparison and language context is performed on the character content of the initial recognition result; spelling correction and semantic matching repair of misrecognized words improve the accuracy of the text content. Layout structure recovery is then performed according to the position of each text line within the standardized historical document image pages in the initial recognition result. By combining line spacing, paragraph alignment, page typesetting cues and the font differences between titles and body text, the original chapter structure, paragraph hierarchy and title hierarchy of the historical document are reconstructed, and a text information set containing the complete document hierarchy is output.
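The dictionary-comparison part of the text error correction can be sketched with Python's standard `difflib` module. This is a minimal illustration that covers only the dictionary lookup; the language-context check mentioned above is omitted. The function name `correct_tokens`, the lexicon contents and the cutoff value `0.8` are assumptions.

```python
import difflib


def correct_tokens(tokens, lexicon, cutoff=0.8):
    """Replace each out-of-lexicon token with its closest lexicon entry, if any.

    tokens: list of recognized words.
    lexicon: list of known correct words (hypothetical dictionary).
    cutoff: minimum similarity ratio required to accept a replacement.
    """
    corrected = []
    for token in tokens:
        if token in lexicon:
            corrected.append(token)
            continue
        candidates = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
        # Keep the original token when nothing is similar enough.
        corrected.append(candidates[0] if candidates else token)
    return corrected
```

Tokens with no sufficiently similar dictionary entry are left unchanged, which avoids over-correcting rare names that are common in historical texts.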
After the text information set is obtained, according to the six types of metadata information in the document content, including keywords, paragraph labels, page marks, author information, time marks and document source identifiers, metadata information extraction operation is carried out on the text information set, six types of metadata information elements are extracted and stored in a structured mode by using an entity positioning method based on rules and position relations, semantic marks are carried out on each type of metadata information through a label system, metadata information extraction and marking are completed, and finally the version structured document set containing three types of structures of the text content, the structure information and the metadata is output.
S2, extracting knowledge entity features in each version document from the version structured document set to construct a version feature set.
S2.1, carrying out named entity recognition on each historical document in the version structured document set, extracting and normalizing five types of knowledge entities to form a normalized entity set, and constructing a subject-predicate-object triplet by combining semantic role labeling and dependency syntax analysis.
It should be noted that named entity recognition is performed on the text of each historical document in the version structured document set, using a recognition method based on part-of-speech tagging and context feature rules. Continuous phrases with semantic independence in the historical document text are scanned and tagged, and five types of knowledge entities are identified from them: person entities, event entities, place entities, institution entities and time entities. Because the identified entities appear in the original text in various forms of expression, the preliminarily identified entities are normalized: expressions that refer to the same thing are unified into a canonical expression by means such as lemmatization, synonym dictionary mapping and entity alias matching, and a normalized entity set is generated.
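The alias-matching part of entity normalization can be sketched as a dictionary lookup. This is an illustrative sketch only: the alias table, the entity category labels and the function name `normalize_entities` are hypothetical, and lemmatization and synonym dictionary mapping are not shown.

```python
def normalize_entities(mentions, alias_table):
    """Map surface mentions to canonical names and return the normalized set.

    mentions: iterable of (surface_form, category) pairs.
    alias_table: dict from lower-cased alias to canonical name (hypothetical).
    """
    normalized = set()
    for surface, category in mentions:
        cleaned = " ".join(surface.split())      # collapse stray whitespace
        canonical = alias_table.get(cleaned.lower(), cleaned)
        normalized.add((canonical, category))
    return normalized


# Illustrative alias table: a posthumous name mapped to the canonical name.
ALIASES = {"zeng wenzheng": "Zeng Guofan"}
```

Because the result is a set, mentions that normalize to the same canonical name and category collapse into a single entity.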
After the normalized entity set is constructed, semantic role labeling is used to identify the semantic function of each component in a sentence, such as agent, patient, instrument and goal. Dependency syntactic analysis is then used to parse the syntax tree of the sentence and identify the dependency relations between words, such as subject-predicate relations (e.g. "a meeting is held"), verb-object relations (e.g. "passing a resolution") and adverbial-head relations (e.g. "developing rapidly"), from which the dependency syntax path similarity is obtained. The semantic role labeling result and the dependency syntax path similarity are jointly matched to extract subject-predicate-object triples formed by entities and behavioral relations. Each triple consists of one normalized entity as the subject, one semantic action word as the predicate and another normalized entity as the object, so that the semantic relations between entities in the historical documents are comprehensively characterized.
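The final assembly of subject-predicate-object triples can be sketched as follows, assuming the semantic role labeling step has already produced one frame per predicate. The frame layout (keys `predicate`, `agent`, `patient`) and the function name `extract_triples` are hypothetical, and the joint matching with dependency syntax path similarity is not shown.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


def extract_triples(frames, entities):
    """Keep frames whose agent and patient are both normalized entities.

    frames: list of dicts with keys "predicate", "agent", "patient" (assumed).
    entities: set of canonical entity names from the normalized entity set.
    """
    triples = []
    for frame in frames:
        subject = frame.get("agent")
        obj = frame.get("patient")
        if subject in entities and obj in entities:
            triples.append(Triple(subject, frame["predicate"], obj))
    return triples
```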
S2.2, matching and associating the same five types of knowledge entities in different versions by utilizing a subject-predicate-object triplet, outputting an aligned entity relation diagram, extracting multidimensional feature vectors of the five types of knowledge entities, carrying out standardization processing, and outputting a standardized feature vector set.
It should be noted that the subject-predicate-object triples in each version are grouped by the five types of knowledge entities, and the expression and semantic context features of each type of entity in the different historical document versions are extracted. Entities of the same type in different historical document versions are compared one by one, and a matching judgment is made based on canonical expression, semantic context features, subject-predicate-object triples and dependency syntax path similarity. When the matching condition is satisfied, the two entities are identified as semantically equivalent and a correspondence is established between the versions, thereby constructing an aligned entity relationship graph consisting of entity nodes and matching edges.
For each of the five types of knowledge entities, multidimensional semantic, syntactic, structural and contextual features are extracted based on information such as semantic context, word frequency statistics, word vector representations, syntactic dependency structure and roles in subject-predicate-object triples across the different versions, and multidimensional feature vectors are constructed. To improve the consistency of subsequent calculations, all multidimensional feature vectors of the five types of knowledge entities undergo unified standardization, including value range normalization, distribution transformation and feature compression, finally yielding a standardized feature vector set with a consistent format and unified dimensions.
It should also be noted that the matching condition refers to normalized expressions that are consistent or highly similar and have similar context semantics, identical or similar predicate relationships, and dependency syntax path structural similarity.
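The matching condition can be sketched as a conjunction of simple tests. This is a rough illustration under stated assumptions: name similarity is approximated with `difflib.SequenceMatcher`, context similarity with cosine similarity over a precomputed context vector, and predicate similarity with a shared-predicate test. The thresholds `0.85` and `0.8`, the `EntityRecord` type and the omission of dependency syntax path similarity are all choices made for this sketch.

```python
import difflib
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityRecord:
    name: str              # canonical expression
    category: str          # one of the five entity types
    context: tuple         # context feature vector (assumed precomputed)
    predicates: frozenset  # predicates the entity takes part in


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_match(a, b, name_min=0.85, context_min=0.8):
    """Apply the matching condition to two entities from different versions."""
    if a.category != b.category:
        return False
    name_similarity = difflib.SequenceMatcher(None, a.name, b.name).ratio()
    return (
        name_similarity >= name_min
        and cosine(a.context, b.context) >= context_min
        and bool(a.predicates & b.predicates)
    )


def align(version_a, version_b):
    """Return the matching edges of the aligned entity relationship graph."""
    return [(a.name, b.name) for a in version_a for b in version_b if is_match(a, b)]
```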
S2.3, performing weighted average aggregation of entity features on the standardized feature vector set by document version, constructing a version-level feature matrix, establishing a mapping between the row vectors of the version-level feature matrix and the corresponding version identifiers, and outputting the version feature set.
It should be noted that, grouping the five types of knowledge entities in the standardized feature vector set according to the document version, ensuring that each group corresponds to all five types of knowledge entities in one historical document version, weighting the standardized feature vectors of all five types of knowledge entities in each group, and distributing weights according to the occurrence frequency, semantic role importance or structural position of the five types of knowledge entities in the version document, for example, the entity with higher occurrence frequency or acting as a subject role can be given higher weight, and weighting average calculation is performed on the standardized feature vectors of all five types of knowledge entities in the version, so as to obtain the version-level feature vector representing the semantic features of the whole version.
The version level feature vectors of each historical document version are sequentially arranged to form a version level feature matrix, each row of version level feature vectors is bound with the corresponding version identifier, each row in the version level feature matrix has definite version directivity, and finally a version feature set is output.
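The weighted average aggregation and the binding of matrix rows to version identifiers can be sketched as follows. The weights are taken as given inputs, since the source lists several possible weighting criteria (occurrence frequency, semantic role importance, structural position) without fixing one; the function names and the input layout are assumptions.

```python
def version_vector(entity_vectors, weights):
    """Weighted average of the standardized entity vectors of one version."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dimension = len(entity_vectors[0])
    return [
        sum(w * vec[k] for vec, w in zip(entity_vectors, weights)) / total
        for k in range(dimension)
    ]


def build_version_feature_set(versions):
    """Build the version-level feature matrix and its row mapping.

    versions: dict {version_id: (entity_vectors, weights)} (assumed layout).
    Returns (matrix, row_of), where row_of[version_id] is the row index.
    """
    ordered_ids = sorted(versions)
    matrix = [version_vector(*versions[vid]) for vid in ordered_ids]
    row_of = {vid: index for index, vid in enumerate(ordered_ids)}
    return matrix, row_of
```

For example, two entity vectors `[1.0, 0.0]` and `[0.0, 1.0]` with weights `3.0` and `1.0` aggregate to `[0.75, 0.25]`.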
And S3, inputting the version feature set into a multi-node cloud cooperative processing architecture, calculating local similarity scores among versions, and outputting a version similarity matrix.
It should be noted that the "multi-node cloud cooperative processing architecture" is existing technology in the field of distributed computing. It generally comprises one master control node (responsible for task scheduling and result aggregation) and multiple cooperative nodes (executing parallel computation), implements data interaction between nodes through distributed communication middleware (such as gRPC or MPI), and dynamically allocates computing resources using a resource manager (such as YARN or Kubernetes).
S3.1, inputting the version feature set into a multi-node cloud cooperative processing architecture, receiving the version feature set by utilizing a main control node in the multi-node cloud cooperative processing architecture, uniformly distributing the version feature set to a cooperative node cluster according to a version identifier, respectively calculating Euclidean distances of version feature vectors of each historical document, generating a local similarity score through nonlinear conversion, and adding a confidence weight label to generate a local similarity matrix.
It should be noted that, the master node receives the complete version feature set, uniformly divides each version level feature vector in the version feature set according to the version identifier, and uniformly distributes the divided version level feature vectors to the cooperative node cluster.
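The uniform distribution performed by the master control node can be sketched as a round-robin assignment over sorted version identifiers. Round-robin is an assumption; the source only requires that the division be uniform and keyed by version identifier. The node names are hypothetical, and no real middleware call is shown.

```python
def distribute(version_ids, node_names):
    """Assign version identifiers to cooperative nodes in round-robin order."""
    if not node_names:
        raise ValueError("at least one cooperative node is required")
    assignment = {node: [] for node in node_names}
    for index, version_id in enumerate(sorted(version_ids)):
        node = node_names[index % len(node_names)]
        assignment[node].append(version_id)
    return assignment
```

With this scheme the node loads differ by at most one version, regardless of the order in which the versions arrive.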
After the cooperative nodes respectively receive version-level feature vectors of different historical document versions, euclidean distance between the version feature vectors is calculated, and feature difference degrees between versions are quantized, wherein the expression is as follows:
$$d_{ij} = \sqrt{\sum_{k=1}^{n} \left(x_{ik} - y_{jk}\right)^{2}}$$;
In the formula, $d_{ij}$ is the Euclidean distance between the $i$-th reference version feature vector and the $j$-th target version feature vector, $x_{ik}$ is the feature value of the $k$-th dimension of the $i$-th reference version feature vector, $y_{jk}$ is the feature value of the $k$-th dimension of the $j$-th target version feature vector, $k$ is the index variable of the version-level feature vector dimension, $n$ is the number of dimensions of the version-level feature vector, $i$ is the index variable of the reference version feature vector, and $j$ is the index variable of the target version feature vector.
After the calculation is completed, the cooperative node converts the Euclidean distance into a local similarity score by using a nonlinear conversion function, for example a conversion based on exponential decay or a Gaussian kernel function. Taking exponential decay as the example, the expression is as follows:
$$s_{ij} = e^{-\lambda d_{ij}}$$;
In the formula, $s_{ij}$ is the local similarity score between the $i$-th reference version feature vector and the $j$-th target version feature vector, $d_{ij}$ is the Euclidean distance between the same two vectors, and $\lambda$ is the tuning parameter of the nonlinear conversion, which controls the rate of similarity decay.
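The distance and conversion steps can be illustrated with a minimal sketch. It assumes the exponential-decay form of the nonlinear conversion; the function names, the `decay` parameter and the sample vectors are hypothetical and are not taken from the embodiment.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def local_similarity(x, y, decay=1.0):
    """Convert the distance to a score in (0, 1] by exponential decay (assumed form)."""
    return math.exp(-decay * euclidean_distance(x, y))

# Hypothetical version-level feature vectors keyed by version identifier.
features = {
    "v1": [1.0, 0.0, 2.0],
    "v2": [1.0, 0.0, 2.0],
    "v3": [4.0, 4.0, 2.0],
}

# Local similarity score for every (reference, target) version pair.
local_matrix = {
    (ref, tgt): local_similarity(features[ref], features[tgt], decay=0.5)
    for ref in features
    for tgt in features
}
```

Identical vectors score 1.0, and the score falls toward 0 as the distance grows; a larger `decay` makes the fall steeper.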
Based on the quality index and the data integrity condition of the version-level feature vector, adding a corresponding confidence weight label for each local similarity score so as to reflect the reliability of the similarity score, and finally outputting a local similarity matrix containing the local similarity score and the confidence weight label by each cooperative node.
It should also be noted that the confidence weight is obtained by comprehensively calculating indexes such as the integrity of the evaluation version feature vector, the data quality, the text recognition accuracy and the like, reflects the reliability degree of the local similarity score, and is used for enhancing the credibility labeling of each element in the similarity matrix.
The quality index of the version level feature vector is obtained through integrity check of the content of the version level feature vector, and mainly evaluates whether the feature vector completely contains dimension information of five types of knowledge entities (characters, events, places, institutions and time), and simultaneously examines the distribution rationality of each dimension feature and the corresponding relation between each dimension feature and the principle semantics.
The data integrity of the version level feature vector is determined by an integrity check of the original document data. The method mainly evaluates the integrity degree of document metadata fields, the accuracy degree of text recognition and the coverage degree of five types of knowledge entity matching among different versions. These checks verify the integrity and reliability of the data source from different perspectives.
S3.2, aggregating all local similarity matrixes according to confidence coefficient weights through distributed reduction operation, generating a global version similarity matrix, reversely mapping the version identifier to a global version similarity matrix row-column index, and outputting the version similarity matrix.
It should be noted that the local similarity matrices output by the cooperative nodes are collected according to the corresponding version identifiers, and a confidence-weighted average is taken over the local similarity scores of the same version pair. The weights are determined by the confidence labels of the local similarity scores, so that similarity scores with high confidence carry greater influence in the aggregation.
And aggregating the local similarity matrix data on all the cooperative nodes into a unified global version similarity matrix through distributed reduction operation, wherein each element in the global version similarity matrix represents a global similarity score between corresponding two historical document versions, accurately mapping the version identifier to a row-column index position of the global version similarity matrix according to the mapping relation between the version identifier and the matrix index, ensuring that the version identifiers corresponding to the matrix rows and columns are unique and correct, and finally outputting the version similarity matrix containing the global similarity score and version identifier mapping information between all the historical document versions.
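The aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: each local matrix is represented as a dictionary from a version pair to a `(score, confidence)` tuple, and the helper names `aggregate` and `to_indexed_matrix` are hypothetical.

```python
from collections import defaultdict

def aggregate(local_matrices):
    """Confidence-weighted average of the local scores of each version pair."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for matrix in local_matrices:
        for pair, (score, confidence) in matrix.items():
            weighted_sum[pair] += confidence * score
            weight_total[pair] += confidence
    return {
        pair: weighted_sum[pair] / weight_total[pair]
        for pair in weighted_sum
        if weight_total[pair] > 0
    }

def to_indexed_matrix(global_scores):
    """Map version identifiers to row/column indices of a dense matrix."""
    versions = sorted({v for pair in global_scores for v in pair})
    index = {v: i for i, v in enumerate(versions)}
    size = len(versions)
    matrix = [[None] * size for _ in range(size)]
    for (a, b), score in global_scores.items():
        matrix[index[a]][index[b]] = score
    return index, matrix

# Two cooperative nodes report the same pair with different confidence.
node_a = {("v1", "v2"): (0.8, 0.75), ("v1", "v1"): (1.0, 1.0)}
node_b = {("v1", "v2"): (0.4, 0.25), ("v2", "v2"): (1.0, 1.0)}
global_scores = aggregate([node_a, node_b])
index, matrix = to_indexed_matrix(global_scores)
```

The score reported with confidence 0.75 dominates the one reported with 0.25, so the aggregated value for `("v1", "v2")` lies closer to 0.8 than to 0.4.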
S4, performing consistency evaluation on the version similarity matrix by using a distributed conflict detection mechanism, marking entity mapping pairs with confidence degrees lower than a confidence degree threshold value, and generating a conflict transaction log.
S4.1, dividing the version similarity matrix into a plurality of entity mapping pairs by using the main control node, and transmitting the entity mapping pairs to each cooperative node.
It should be noted that, the master control node receives the complete version similarity matrix, analyzes the version pair relationship corresponding to each element in the version similarity matrix into entity mapping pairs, namely five types of knowledge entity matching pairs corresponding to two historical document versions, and reasonably divides all entity mapping pairs according to the task allocation policy, so that the divided entity mapping pair sets are ensured to be uniformly distributed and are convenient for collaborative processing.
After the division is completed, the master control node transmits the different batches of entity mapping pairs to the cooperative nodes according to the division result, so that the entity mapping pairs received by each cooperative node can support the subsequent version consistency verification and semantic analysis.
It should be further noted that, the task allocation policy is an intelligent scheduling mechanism that the master node dynamically selects an optimal allocation scheme (uniform allocation or load balancing) according to a real-time load state (CPU/memory/network) and task characteristics (computational complexity/data volume/dependency relationship) of the cooperative node.
S4.2, after receiving the entity mapping pairs, each cooperative node performs version consistency verification based on an adjacent version entity consistency policy, performs semantic consistency analysis by combining entity context semantic information, and identifies semantic conflict candidate entity pairs, forming a structural conflict candidate set and a semantic conflict candidate set.
It should be noted that, for the received entity mapping pair, each cooperative node adopts an adjacent version entity consistency policy, compares adjacent relations and attribute consistency of five types of knowledge entities in different historical document versions, verifies structural consistency of the five types of knowledge entities between the versions, combines context semantic information extracted from the version structured document set by the five types of knowledge entities, performs semantic consistency analysis, evaluates similarity and potential contradiction of entity semantic content, and identifies semantic conflict candidate entity pairs.
The structural consistency check compares whether the directly associated entities of the five types of knowledge entities (such as subject-object relations in triples) remain consistent across versions, and checks whether the recorded attributes of the five types of knowledge entities (such as a person's role or an event's time) are the same in different versions. When the association relationship of a knowledge entity is found to be broken between versions (for example, a version lacks a key predicate connection) or its core attributes are inconsistent (for example, the recorded participants of the same event conflict), the entity pair is placed in the structural conflict candidate set.
The semantic consistency analysis compares the context descriptions (such as co-occurring vocabulary and modifiers) of the five types of knowledge entities in the texts of different versions and identifies logical contradictions in semantic expression (for example, entity A is the superior of entity B in one version but its peer in another). When the semantic roles of a knowledge entity in different versions conflict irreconcilably, the entity pair is placed in the semantic conflict candidate set.
It should also be noted that potential conflicts include attribute conflicts (e.g., inconsistent documentations of persona roles in different versions), relational conflicts (e.g., opposite descriptions of event causality in different versions), and timing conflicts (e.g., time documented differences of the same event in different versions).
S4.3, performing intersection processing on the structural conflict candidate set and the semantic conflict candidate set to acquire intersection entity mapping pairs, extracting the corresponding confidence of each pair, and marking a pair as a conflict entity mapping pair when its confidence is lower than the confidence threshold.
It should be noted that, all entity mapping pairs in the structure conflict candidate set and the semantic conflict candidate set are uniformly encoded, the encoding rule is "entity type+version id+entity ID" (e.g. "character_v1_e005"), a hash table is constructed based on the encoding index, the completely matched entity mapping pair in the two sets is rapidly positioned, the accuracy of the intersection result is ensured through a double verification mechanism, namely, both strict matching of the entity ID and complementation of the conflict type are required (e.g. "time attribute inconsistency" in the structure conflict is required to correspond to "time description contradiction" in the semantic conflict), and finally the intersection entity mapping pair list meeting both the structure conflict and the semantic conflict conditions is output.
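The encoding, hash lookup and double verification can be sketched as below. The key format follows the "character_v1_e005" example; the `COMPLEMENTARY` table, the conflict type labels and the sample data are hypothetical stand-ins for whatever the embodiment actually uses.

```python
def encode(entity_type, version_id, entity_id):
    """Build the key 'entity type + version id + entity ID'."""
    return f"{entity_type}_{version_id}_{entity_id}"

# Hypothetical table of complementary conflict types (structural -> semantic).
COMPLEMENTARY = {
    "time_attribute_inconsistency": "time_description_contradiction",
    "role_attribute_inconsistency": "role_description_contradiction",
}

def intersect(structural, semantic):
    """Return keys found in both sets whose conflict types are complementary.

    Both arguments map an encoded key to a conflict type label, so each
    dictionary acts as the hash table built on the encoding index.
    """
    result = []
    for key, structural_type in structural.items():
        semantic_type = semantic.get(key)  # first check: exact key match
        if semantic_type is None:
            continue
        # second check: the two conflict types must complement each other
        if COMPLEMENTARY.get(structural_type) == semantic_type:
            result.append(key)
    return sorted(result)

structural_set = {
    encode("character", "v1", "e005"): "time_attribute_inconsistency",
    encode("event", "v2", "e010"): "role_attribute_inconsistency",
    encode("place", "v1", "e020"): "time_attribute_inconsistency",
}
semantic_set = {
    encode("character", "v1", "e005"): "time_description_contradiction",
    encode("event", "v2", "e010"): "time_description_contradiction",
}
intersection = intersect(structural_set, semantic_set)
```

Here `event_v2_e010` appears in both sets but is rejected because its conflict types do not complement each other, and `place_v1_e020` is rejected because it has no semantic counterpart.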
And extracting local similarity scores and corresponding confidence weights from the version similarity matrix for each intersection entity mapping pair, combining the performances of the intersection entity mapping pairs in a conflict detection stage (such as 1-3 levels of structural conflict intensity and 0.1-1.0 level of semantic conflict severity score), dynamically adjusting and outputting the confidence degrees through a weighted formula, taking the confidence degrees as a judgment basis, comparing the confidence degrees, and marking the intersection entity mapping pairs as conflict entity mapping pairs if the confidence degrees are lower than a confidence coefficient threshold value.
It should also be noted that, by analyzing the integrity of the version feature vector, the accuracy of text recognition, the stability of similarity calculation, and other indexes, a confidence threshold is set in combination with historical data experience and actual task requirements.
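The text does not give the weighted formula used to adjust the confidence, so the following is a sketch under an assumed form: the product of the local similarity score and its confidence weight is reduced by a penalty built from the structural conflict intensity (levels 1 to 3) and the semantic conflict severity (0.1 to 1.0). The coefficients `alpha` and `beta`, the threshold value and the sample pairs are all hypothetical.

```python
def adjusted_confidence(similarity, weight, structural_level, semantic_severity,
                        alpha=0.5, beta=0.5):
    """Assumed weighted formula: stronger conflict evidence lowers confidence."""
    if structural_level not in (1, 2, 3):
        raise ValueError("structural conflict intensity must be 1, 2 or 3")
    if not 0.1 <= semantic_severity <= 1.0:
        raise ValueError("semantic conflict severity must lie in [0.1, 1.0]")
    penalty = alpha * (structural_level / 3) + beta * semantic_severity
    return max(0.0, similarity * weight * (1.0 - penalty))

def mark_conflicts(pairs, threshold):
    """Return the keys whose adjusted confidence falls below the threshold."""
    return sorted(
        key for key, args in pairs.items()
        if adjusted_confidence(*args) < threshold
    )

# key -> (similarity, confidence weight, structural level, semantic severity)
pairs = {
    "character_v1_e005": (0.9, 0.8, 3, 1.0),
    "event_v2_e010": (0.9, 1.0, 1, 0.1),
}
conflicts = mark_conflicts(pairs, threshold=0.6)
```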
S4.4, utilizing the main control node to perform deduplication merging and mapping relation integration on the conflict entity mapping, and generating a conflict transaction log.
It should be noted that, the main control node is utilized to receive the conflict entity mappings output by the plurality of cooperative nodes, the conflict entity mappings are subjected to de-duplication operation, repeated entity mapping pairs are combined, redundant information is eliminated, and then mapping relation integration is performed on the entity mapping pairs after de-duplication according to the corresponding version identifiers and entity identifiers, so that the complete corresponding relation among versions of all the conflict entity mapping pairs is ensured, and no conflict duplication exists.
And in the integration process, the attribute information and the confidence value in the mapping relation are uniformly updated, the consistency and the accuracy of the mapping relation are ensured, and finally, the conflict entity mapping set with the complete mapping relation and the deduplication and merging are organized to generate a structured conflict transaction log.
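A minimal sketch of the deduplication and merging step follows. The record fields and the rule of keeping the lowest confidence among duplicates are assumptions made for illustration; the embodiment only states that confidence values are uniformly updated.

```python
def build_conflict_log(node_outputs):
    """Merge conflict entity mapping pairs reported by several cooperative nodes."""
    merged = {}
    for records in node_outputs:
        for record in records:
            # The same pair may arrive with its versions in either order.
            key = (record["entity_id"], tuple(sorted(record["versions"])))
            kept = merged.get(key)
            if kept is None:
                merged[key] = dict(record, versions=sorted(record["versions"]))
            else:
                # Assumed merge rule: keep the most pessimistic confidence.
                kept["confidence"] = min(kept["confidence"], record["confidence"])
    return [merged[key] for key in sorted(merged)]

node_1 = [{"entity_id": "character_e005", "versions": ["v1", "v2"], "confidence": 0.42}]
node_2 = [
    {"entity_id": "character_e005", "versions": ["v2", "v1"], "confidence": 0.38},
    {"entity_id": "event_e010", "versions": ["v1", "v3"], "confidence": 0.55},
]
conflict_log = build_conflict_log([node_1, node_2])
```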
S5, inputting the conflict transaction log into a five-level contradiction hierarchical processing protocol, outputting an arbitration result, converting the arbitration result into corresponding OWL ontology update statements, attaching version tracing annotations, and outputting a historical document update instruction.
It should be noted that the five-level contradiction hierarchical processing protocol is a conflict resolution mechanism in the construction of the knowledge graph of the historical literature, and the core composition comprises five levels, namely a grammar layer (L1) for processing surface problems such as character coding, misspelling and the like, a structure layer (L2) for repairing triplet breaks or attribute contradictions, a version layer (L3) for calibrating time or content deviation caused by literature revision, a semantic layer (L4) for judging context logic opposition (such as inconsistency of expression or narrative angles), and a value layer (L5) for final examination of core disputes related to the sensitivity of historical evaluation.
The technical implementation of the five-level contradiction hierarchical processing protocol is as follows:
Conflict detection: entity mapping pairs are automatically classified into the five levels based on version identifiers, confidence and evolution characteristics (such as context semantic vectors).
Hierarchical arbitration: L1 to L3 are processed automatically by algorithms (such as Neo4j graph alignment), L4 adopts a mixed score of "semantic similarity + historical attribute weight", and L5 is arbitrated automatically by a historical value decision engine (based on an authoritative historical rule base).
Closed-loop feedback: when an arbitration result is converted into an OWL statement, a conflict level annotation (e.g. "L4 semantic conflict, confidence 0.72") is appended, and the statement is checked by a test case generator (e.g. for timing consistency verification).
The functions of the five-level contradiction hierarchical processing protocol are as follows:
Accurate disambiguation: the five-level layering resolves the full spectrum of contradictions from the grammar layer (L1) to the value layer (L5), ensuring the historical compliance and academic rigor of the ontology.
Dynamic tracing: all arbitration results carry version identifiers and processing node information, supporting full-link "entity-conflict-correction" tracking and providing an audit baseline for subsequent ontology expansion. The innovation lies in incorporating the assessment of historical evaluation sensitivity, which is specific to historical documents, into a standardized processing flow.
S5.1, resolving the conflict transaction log into a conflict evolution map, and mapping and inputting a five-level conflict hierarchical processing protocol according to five-level conflict types of grammar, structure, version, semantics and value.
It should be noted that, each conflict entity mapping pair in the conflict transaction log is taken as a basic unit, item-by-item analysis is performed according to the version identifier, the entity content, the conflict type and the confidence information, relevant contents including a conflict source, a conflict relation, the conflict entity pair and an evolution path are extracted, and an evolution relation chain of the conflict entity in a plurality of historical document versions is constructed.
And taking each group of conflict entity pairs with historical evolution characteristics as conflict evolution nodes, and connecting version evolution information and structure change, semantic change or context transfer relations on a time axis to form a conflict evolution map containing structure transition and semantic evolution processes among the entities.
And performing type mapping on the conflict entity mapping pairs based on five dimensions of grammar difference, structure change, version divergence, semantic conflict and historical value according to information marked in the conflict transaction log, and dividing the corresponding conflict evolution map into corresponding conflict type levels in a five-level conflict hierarchical processing protocol.
It should also be noted that the five dimensions are evaluated as follows:
Grammar difference: surface-level language contradictions such as spelling and word order are identified by comparing the morphology (e.g. traditional/simplified character conversion) and syntactic structure (e.g. passive/active voice differences) of the five types of knowledge entities in different versions.
Structure change: role changes (e.g. a subject becoming an object) and the addition or removal of attribute fields (e.g. differences between versions in a person's recorded roles) of the five types of knowledge entities in cross-version triples are analyzed to detect structural inconsistencies in the knowledge graph.
Version divergence: differences in the expression of the five types of knowledge entities caused by version changes (e.g. the 1951 version deleting an event description specific to the 1935 version) are located according to the time axis and revision background of the document versions.
Semantic conflict: deep semantic contradictions are judged by comparing the emotional tendency (e.g. commendatory versus derogatory modifiers) and behavioral logic (e.g. "supporting" vs "opposing") of the contexts of the five types of knowledge entities.
Historical value: the degree of damage that a conflicting entity does to the core value of the document is evaluated based on a historical evaluation system (e.g. historical authority and social influence).
S5.2, under the five-level contradiction hierarchical processing protocol, performing conflict resolution through a context-weighted arbiter based on the entity context semantic trajectory and the semantic stability index, and outputting an arbitration result list.
It should be noted that, under the five-level contradiction hierarchical processing protocol, the context semantic trajectory of each conflict entity mapping pair across the historical document versions is first extracted from the input conflict evolution map. The trajectory includes the semantic change paths, context collocation words and syntactic dependency features of the five types of knowledge entities in different historical document versions, and is combined with the version evolution sequence and the semantic transition trend of the entities in the time dimension to form a complete semantic trajectory expression. Meanwhile, the semantic stability index of each conflict entity mapping pair is calculated based on the confidence information attached in the conflict transaction log and the amplitude of semantic change. The semantic stability index measures how consistently the conflicting entities point to the same meaning across historical document versions, and the expression is as follows:
$$S(e) = \frac{1}{T-1} \sum_{t=1}^{T-1} \frac{\mathbf{c}_{t}^{e} \cdot \mathbf{c}_{t+1}^{e}}{\lVert \mathbf{c}_{t}^{e} \rVert \, \lVert \mathbf{c}_{t+1}^{e} \rVert}$$;
In the formula, $S(e)$ represents the semantic stability index of the conflict entity mapping pair $e$, $\mathbf{c}_{t}^{e}$ represents the context semantic vector of the conflict entity mapping pair extracted from the $t$-th historical document version, $\mathbf{c}_{t+1}^{e}$ represents the context semantic vector of the conflict entity mapping pair extracted from the $(t+1)$-th historical document version, $t$ represents the index variable of the historical document version, and $T$ represents the total number of historical document versions.
After the entity context semantic trajectory and the semantic stability index are obtained, arbitration is performed by the context-weighted arbiter. The arbiter scores each conflict entity mapping pair by weighting dimensions such as its frequency of occurrence in different contexts, version semantic continuity, semantic stability index and semantic evolution direction, determines from the score the retention priority and version orientation of the pair in the historical document knowledge ontology, and finally outputs an arbitration result list containing the arbitration results of all conflict entity mapping pairs.
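The arbitration can be illustrated with a sketch that assumes the semantic stability index is the mean cosine similarity between the context semantic vectors of consecutive versions, and that the arbiter combines only two of the listed dimensions (frequency and stability) with hypothetical weights. The candidate names and vectors are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def semantic_stability(vectors):
    """Mean cosine similarity between consecutive versions (assumed definition)."""
    if len(vectors) < 2:
        return 1.0
    sims = [cosine(vectors[t], vectors[t + 1]) for t in range(len(vectors) - 1)]
    return sum(sims) / len(sims)

def arbitrate(candidates, w_frequency=0.4, w_stability=0.6):
    """Score each candidate reading and return the preferred one with all scores."""
    scores = {
        name: w_frequency * c["frequency"]
        + w_stability * semantic_stability(c["vectors"])
        for name, c in candidates.items()
    }
    return max(scores, key=scores.get), scores

candidates = {
    "reading_a": {"frequency": 0.5, "vectors": [[1, 0], [1, 0], [1, 0]]},
    "reading_b": {"frequency": 0.5, "vectors": [[1, 0], [0, 1], [1, 0]]},
}
winner, scores = arbitrate(candidates)
```

With equal frequency, the reading whose context vectors stay aligned across versions is retained, because its stability term is higher.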
S5.3, converting the different arbitration types into corresponding OWL ontology update statements according to the arbitration result list.
It should be noted that each conflict entity mapping pair in the arbitration result list is classified according to the arbitration type output by the five-level contradiction hierarchical processing protocol. The corresponding entity update operations comprise five basic types: adding an entity, deleting an entity, updating an attribute, adding a relationship and deleting a relationship.
For the result of the arbitration determined as the newly added entity, a newly added description sentence semantically provided with an explicit category label and a unique identification is constructed.
For the arbitration result of the entity to be deleted, an invalidation statement with a flag state is generated to indicate that the entity is no longer used.
And generating corresponding updated expression content according to the type of the attribute for the judging result of the attribute to be modified, and labeling new attribute values and format description.
For the result of the arbitration of the new relationship, a structural description representing the relationship between the entities is generated according to the behavior or logic relationship between the two entities involved.
For the result of the arbitration of the relation to be deleted, an explanatory sentence reflecting that the original relation is canceled is constructed.
After all the conversion operations are completed, each conversion result is expressed as an ontology update statement with complete semantic structure and grammar conforming to the OWL 2 specification.
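One possible rendering of the five operation types in OWL 2 functional-style syntax is sketched below. The mapping is an assumption: marking an entity with `owl:deprecated` stands in for the "invalidation statement with a flag state", and a negative property assertion stands in for the statement that a relation is canceled. The operation dictionary layout and the identifiers are hypothetical.

```python
def to_owl_statements(op):
    """Render one arbitration result as OWL 2 functional-syntax axioms."""
    kind = op["type"]
    if kind == "add_entity":
        # Unique identifier plus an explicit category label.
        return [
            f"Declaration(NamedIndividual(:{op['entity']}))",
            f"ClassAssertion(:{op['category']} :{op['entity']})",
        ]
    if kind == "delete_entity":
        # Assumption: invalidation is expressed with the owl:deprecated flag.
        return [f'AnnotationAssertion(owl:deprecated :{op["entity"]} "true"^^xsd:boolean)']
    if kind == "update_attribute":
        return [
            f'DataPropertyAssertion(:{op["property"]} :{op["entity"]} '
            f'"{op["value"]}"^^{op["datatype"]})'
        ]
    if kind == "add_relation":
        return [f"ObjectPropertyAssertion(:{op['property']} :{op['subject']} :{op['object']})"]
    if kind == "delete_relation":
        # Assumption: cancellation is expressed as a negative assertion.
        return [
            f"NegativeObjectPropertyAssertion(:{op['property']} "
            f":{op['subject']} :{op['object']})"
        ]
    raise ValueError(f"unknown operation type: {kind}")

operations = [
    {"type": "add_entity", "entity": "e005", "category": "Person"},
    {"type": "update_attribute", "entity": "e010", "property": "eventYear",
     "value": "1935", "datatype": "xsd:gYear"},
    {"type": "add_relation", "property": "participatedIn",
     "subject": "e005", "object": "e010"},
]
statements = [s for op in operations for s in to_owl_statements(op)]
```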
S5.4, adding processing node information, version identifiers and conflict level annotations to the OWL ontology update statements, and integrating them to form a historical document update instruction.
It should be noted that, each OWL ontology update statement generated in the previous refinement step is taken as basic content, and three-dimensional auxiliary annotation information is added, wherein the processing node information is used for marking a cooperative node identifier for specifically executing the ontology update operation, the version identifier is used for recording a historical document version number corresponding to the current ontology update statement, and the conflict level annotation is used for marking a conflict level type of a conflict entity mapping pair related to the current update statement in a five-level contradiction hierarchical processing protocol.
Based on all the OWL ontology update statements with the added information, the update statements are merged and integrated according to entity structure consistency and time sequence continuity to generate a historical document update instruction with complete attribute annotations and traceable version information.
It should also be noted that, according to the coordinated node allocation table recorded during task allocation, the master control node automatically extracts the node identifier for executing the update operation in the OWL statement generation stage, and binds with the operation timestamp to obtain the processing node information.
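The annotation and integration step can be sketched as follows, assuming each update statement is wrapped in a dictionary and that "time sequence continuity" means ordering by a known list of version identifiers. The field names, node identifiers and timestamps are hypothetical.

```python
def annotate(statement, entity, node_id, timestamp, version_id, conflict_level):
    """Attach the three kinds of auxiliary annotation to one update statement."""
    return {
        "statement": statement,
        "entity": entity,
        "processing_node": {"node_id": node_id, "timestamp": timestamp},
        "version_id": version_id,
        "conflict_level": conflict_level,
    }

def build_update_instruction(annotated, version_order):
    """Group statements by entity and order each group by version sequence."""
    ordered = sorted(
        annotated,
        key=lambda u: (u["entity"], version_order.index(u["version_id"])),
    )
    grouped = {}
    for update in ordered:
        grouped.setdefault(update["entity"], []).append(update)
    return grouped

version_order = ["v1935", "v1951"]
annotated = [
    annotate("ClassAssertion(:Person :e005)", "e005", "node-2",
             "2025-01-01T00:00:02Z", "v1951", "L2"),
    annotate("Declaration(NamedIndividual(:e005))", "e005", "node-1",
             "2025-01-01T00:00:01Z", "v1935", "L2"),
    annotate("Declaration(NamedIndividual(:e010))", "e010", "node-1",
             "2025-01-01T00:00:03Z", "v1935", "L4"),
]
instruction = build_update_instruction(annotated, version_order)
```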
S6, dynamically expanding an entity relation network of the historical document ontology according to the historical document updating instruction, and outputting the historical document version ontology through a three-layer intelligent analysis framework.
It should be noted that the three-layer intelligent analysis framework is composed of an entity-level semantic check layer, a version-level correlation reasoning layer and an overall structure optimization layer.
The entity-level semantic verification layer is based on an existing ontology verification tool (such as the Pellet reasoner) and implements attribute consistency checking by extending it with a rule base specific to historical documents (such as logical constraints on the official positions held by persons).
The version-level association reasoning layer uses temporal knowledge graph technology combined with cross-version semantic trajectory analysis to mine the evolution rules of the five types of knowledge entities across versions.
The overall structure optimization layer adjusts the topological structure with a complex network optimization algorithm (such as community detection), guided by newly added indexes specific to historical documents.
S6.1, analyzing the historical document updating instruction into an incremental updating event of the entity and the relation.
It should be noted that the entity description structures, attribute update contents and inter-entity relationship expressions contained in each OWL ontology update statement are extracted from the historical document update instruction. The subject, predicate and object elements of each statement are identified, and the update operation type is distinguished: adding, modifying or deleting an entity, updating an attribute value, or adding an association between entities. The version time-sequence position and the semantic conflict strength of each update operation are determined from the processing node information, version identifier and conflict level annotation attached to the statement, and are used to guide the priority determination of the subsequent incremental update events. The update statements parsed from the same historical document update instruction are then aggregated into structural units by entity identifier, so that the attribute updates and relationship updates belonging to the same subject entity are integrated, generating a set of entity and relationship incremental update events with clear structure and clear semantics.
S6.2, prioritizing the incremental update events according to the conflict level annotations and version identifiers to generate a pending update event queue.
It should be noted that the conflict level annotation and version identifier attached to each incremental update event are read, and the events are prioritized by conflict level. Incremental update events at the same conflict level are then ordered by version time sequence according to their version identifiers, so that the chronological order of version updates is respected. The ordered incremental update events are arranged in sequence to form the pending update event queue.
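The ordering can be sketched with a heap. The text does not say which conflict level is processed first, so this sketch assumes lower levels (L1) come first; the numeric fields `conflict_level` and `version_order` are hypothetical encodings of the annotations.

```python
import heapq

def build_pending_queue(events):
    """Order events by conflict level, then by version time sequence."""
    heap = [
        # The position i breaks ties so that event dictionaries are never compared.
        (event["conflict_level"], event["version_order"], i, event)
        for i, event in enumerate(events)
    ]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[3] for _ in range(len(heap))]

events = [
    {"id": "a", "conflict_level": 4, "version_order": 1},
    {"id": "b", "conflict_level": 1, "version_order": 2},
    {"id": "c", "conflict_level": 1, "version_order": 1},
]
pending = build_pending_queue(events)
```

Reversing the level priority only requires negating `conflict_level` in the heap key; the version ordering within a level is unaffected.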
S6.3, according to the pending update event queue, performing entity and relation insertion, deletion and attribute update operations on the entity relation network of the historical document knowledge ontology through the cooperative nodes to obtain an updated entity relation network.
It should be noted that the entity and relation change information in the incremental update events is read one by one, and each cooperative node performs the corresponding operation on the entity relation network of the historical document knowledge ontology according to the operation type described in the event. For an insertion, the new entity or relation is added to the entity relation network; for a deletion, the corresponding node or edge is removed from the network; for an attribute update, the attribute content is replaced or supplemented according to the attribute type and the new attribute value, ensuring the accuracy and integrity of the attribute information.
Each operation is strictly executed according to the sequence in the update event queue, the order and version consistency of the update process are ensured, and after the sequential processing of all increment update events, an update state entity relation network containing the latest entity, relation and attribute information is generated.
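A minimal in-memory sketch of applying the queue to the entity relation network is given below. The class name, event layout and the choice to drop the edges of a deleted entity are assumptions for illustration; a real deployment would more likely operate on a graph database or an ontology store.

```python
class EntityRelationNetwork:
    """Entities are nodes with attributes; relations are (subject, predicate, object) edges."""

    def __init__(self):
        self.entities = {}
        self.relations = set()

    def apply(self, event):
        kind = event["type"]
        if kind == "add_entity":
            self.entities.setdefault(event["entity"], {})
        elif kind == "delete_entity":
            self.entities.pop(event["entity"], None)
            # Assumption: edges touching a removed node are removed as well.
            self.relations = {
                r for r in self.relations if event["entity"] not in (r[0], r[2])
            }
        elif kind == "update_attribute":
            if event["entity"] not in self.entities:
                raise KeyError(f"unknown entity: {event['entity']}")
            self.entities[event["entity"]][event["property"]] = event["value"]
        elif kind == "add_relation":
            self.relations.add((event["subject"], event["property"], event["object"]))
        elif kind == "delete_relation":
            self.relations.discard((event["subject"], event["property"], event["object"]))
        else:
            raise ValueError(f"unknown operation type: {kind}")

network = EntityRelationNetwork()
queue = [
    {"type": "add_entity", "entity": "e005"},
    {"type": "add_entity", "entity": "e010"},
    {"type": "update_attribute", "entity": "e010", "property": "eventYear", "value": "1935"},
    {"type": "add_relation", "subject": "e005", "property": "participatedIn", "object": "e010"},
    {"type": "delete_entity", "entity": "e005"},
]
for event in queue:  # strictly in queue order
    network.apply(event)
```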
S6.4, performing consistency verification on the updated entity relation network by using the entity-level semantic verification layer, performing deep semantic reasoning and version difference analysis by using the version-level association reasoning layer, performing global structure optimization by using the overall structure optimization layer, and outputting the historical document version knowledge ontology.
It should be noted that, the entity-level semantic verification layer checks each entity and the corresponding attribute in the updated entity relationship network one by one, verifies whether the attribute value of the entity accords with the predefined semantic standard through semantic rules and constraint conditions, identifies contradiction or unreasonable phenomena among the structured field contents of the five types of knowledge entities, eliminates repeated or conflicting entity information, ensures that the semantic expression of the five types of knowledge entities is complete and accurate, and thus improves the overall consistency and reliability of the knowledge expression.
The version level association reasoning layer is used for combining the evolution of entity relations of all versions in the historical document, reasoning and identifying semantic evolution paths among mapping entities, such as attribute value replacement, relation addition or deletion and the like, by constructing a semantic association chain among the versions and based on rules of semantic hierarchy, attribute value change, relation structure adjustment and the like, judging whether logic conflict or meaning deviation exists in the changes according to semantic constraint, identifying semantic difference and potential conflict among the versions, mining knowledge increment or correction content caused by version update, and supporting deep understanding and semantic consistency guarantee of historical document versions.
The overall structure optimization layer analyzes the structural characteristics of the entity relation network from a global perspective, adopts a network topology optimization method to adjust the connections among the entities, eliminates redundant relations, balances the connection degree of the entity nodes, and optimizes the hierarchical structure and path length of the knowledge ontology, thereby improving query efficiency and reasoning performance and finally generating a historical document version knowledge ontology with a reasonable structure and good performance.
It should also be noted that the semantic standards are formed by analyzing the existing entity types, attribute structures and hypernym-hyponym semantic relationships in the historical documents, and unifying them with the field definition rules, attribute value ranges and inter-entity constraint conditions in the version ontology.
The embodiment also provides a dynamic collaborative construction system for the historical document version knowledge ontology, which comprises: the document collection module, which is used for collecting digital versions of multi-source historical documents and preprocessing them to generate a version structured document set;
The entity extraction module is used for extracting knowledge entity characteristics in each version document from the version structured document set to construct a version characteristic set;
the similarity calculation module is used for inputting the version feature set into the multi-node cloud cooperative processing architecture, calculating local similarity scores among versions and outputting a version similarity matrix;
the conflict detection module is used for carrying out consistency evaluation on the version similarity matrix by utilizing a distributed conflict detection mechanism, marking entity mapping pairs with confidence coefficient lower than a confidence coefficient threshold value and generating a conflict transaction log;
The conflict processing module is used for inputting the conflict transaction log into a five-level conflict layering processing protocol, outputting an arbitration result, converting the arbitration result into a corresponding OWL body update statement, attaching a version tracing annotation and outputting a history document update instruction;
And the ontology construction module is used for dynamically expanding the entity relation network of the historical document ontology according to the historical document updating instruction and outputting the historical document version ontology through the three-layer intelligent analysis framework.
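The data flow through the similarity calculation module and the conflict detection module can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the nonlinear transformation is assumed to be exp(-d), the aggregation is assumed to be a confidence-weighted average of whole local matrices, and all function names are hypothetical. Only the use of Euclidean distance, confidence weights, the intersection of the two candidate sets, and the confidence threshold come from the text.

```python
import math
from itertools import combinations


def local_similarity(u, v):
    """Map the Euclidean distance d to a score in (0, 1] via exp(-d) (assumed mapping)."""
    return math.exp(-math.dist(u, v))


def similarity_matrix(features):
    """Build a symmetric matrix from {version_id: feature_vector}."""
    ids = sorted(features)
    n = len(ids)
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        score = local_similarity(features[ids[i]], features[ids[j]])
        matrix[i][j] = matrix[j][i] = score
    return ids, matrix


def aggregate(local_matrices, weights):
    """Confidence-weighted average of the local matrices from each node."""
    total = sum(weights)
    n = len(local_matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(local_matrices, weights)) / total
         for j in range(n)]
        for i in range(n)
    ]


def flag_conflicts(structural, semantic, confidence, threshold):
    """Intersect both candidate sets and keep pairs below the threshold."""
    return sorted(p for p in structural & semantic if confidence[p] < threshold)
```

In this sketch the sorted list of version identifiers returned by `similarity_matrix` plays the role of the mapping between version identifiers and row and column indices, and the list returned by `flag_conflicts` stands in for the entries of the conflict transaction log.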
This embodiment also provides a computer device suitable for carrying out the dynamic collaborative construction method for the historical document version knowledge ontology. The device comprises a memory and a processor, wherein the memory stores computer-executable instructions and the processor executes them to implement the method.
The computer device may be a terminal comprising a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory: the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface is used for wired or wireless communication with an external terminal; the wireless mode can be realized through Wi-Fi, an operator network, NFC (near field communication), or other technologies. The display screen may be a liquid crystal display or an electronic ink display. The input device may be a touch layer covering the display screen; keys, a trackball, or a touchpad arranged on the housing of the computer device; or an external keyboard, touchpad, or mouse.
This embodiment also provides a storage medium storing a computer program which, when executed by a processor, implements the dynamic collaborative construction method for the historical document version knowledge ontology proposed in the above embodiment. The storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
In summary, the method and system use a multi-node distributed conflict detection mechanism to identify conflicts among multi-version entities of historical documents efficiently and accurately. Combining the five-level conflict hierarchical processing protocol with a context-weighted arbiter enables intelligent arbitration and standardized dynamic updating of complex conflicts, and the attached version traceability annotations keep updates transparent. Together with unified preprocessing, multi-dimensional semantic feature extraction and weighted aggregation, parallel multi-version similarity calculation, and the three-layer intelligent analysis framework, this completes the dynamic collaborative construction and deep semantic reasoning of the historical document knowledge ontology, improves the accuracy, consistency, and maintainability of the knowledge system, and enhances the automation and reliability of multi-version document knowledge management.
It should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that the technical solution may be modified or equivalently substituted without departing from its spirit and scope, and all such modifications and substitutions are intended to be covered by the claims of the present invention.

Claims (7)

1. A method for dynamic collaborative construction of a historical document version knowledge ontology, characterized by comprising:
collecting digitized versions of multi-source historical documents and preprocessing them to generate a version structured document set;
extracting the knowledge entity features of each version document from the version structured document set to construct a version feature set;
inputting the version feature set into a multi-node cloud collaborative processing architecture, calculating local similarity scores between versions, and outputting a version similarity matrix, with the following specific steps:
inputting the version feature set into the multi-node cloud collaborative processing architecture, wherein a master node of the architecture receives the version feature set and distributes it evenly to a collaborative node cluster by version identifier, and the collaborative nodes respectively calculate the Euclidean distances between the feature vectors of the historical document versions, generate local similarity scores through a nonlinear transformation, and attach confidence weight labels to generate local similarity matrices;
aggregating all local similarity matrices by confidence weight through a distributed reduction operation to generate a global version similarity matrix, mapping the version identifiers back to the row and column indices of the global version similarity matrix, and outputting the version similarity matrix;
performing consistency evaluation on the version similarity matrix using a distributed conflict detection mechanism, marking entity mapping pairs whose confidence is below a confidence threshold, and generating a conflict transaction log, with the following specific steps:
dividing, by the master node, the version similarity matrix into a plurality of entity mapping pairs and distributing them to the collaborative nodes;
after receiving the entity mapping pairs, performing, by each collaborative node, version consistency verification based on an adjacent-version entity consistency strategy and semantic consistency analysis in combination with entity context semantic information, identifying semantic conflict candidate entity pairs, and forming a structural conflict candidate set and a semantic conflict candidate set;
intersecting the structural conflict candidate set and the semantic conflict candidate set to obtain intersection entity mapping pairs, extracting the corresponding confidence for judgment, marking a pair as a conflicting entity mapping pair when its confidence is below the confidence threshold, and performing deduplication, merging, and mapping relationship integration to generate the conflict transaction log;
inputting the conflict transaction log into a five-level conflict hierarchical processing protocol, outputting an arbitration result, converting the arbitration result into corresponding OWL ontology update statements, attaching version traceability annotations, and outputting historical document update instructions;
dynamically expanding the entity relationship network of the historical document knowledge ontology according to the historical document update instructions, and outputting the historical document version knowledge ontology through a three-layer intelligent analysis framework, with the following specific steps:
parsing the historical document update instructions into incremental update events for entities and relationships;
prioritizing the incremental update events by conflict level annotation and version identifier to generate a pending update event queue;
performing, by the collaborative nodes and according to the pending update event queue, insertion, deletion, and attribute update operations on entities and relationships in the entity relationship network of the historical document knowledge ontology to obtain an updated entity relationship network;
performing consistency verification on the updated entity relationship network using an entity-level semantic verification layer, performing deep semantic reasoning and version difference analysis through a version-level association reasoning layer, performing global structure optimization using an overall structure optimization layer, and outputting the historical document version knowledge ontology.

2. The method for dynamic collaborative construction of a historical document version knowledge ontology according to claim 1, characterized in that the preprocessing to generate the version structured document set comprises the following specific steps:
converting the digitized versions of the multi-source historical documents into a unified format to obtain a standardized document input dataset, and performing character recognition to obtain initial recognition results;
performing text error correction and layout structure restoration on the initial recognition results to obtain a text information set, and performing metadata extraction and annotation to obtain the version structured document set.

3. The method for dynamic collaborative construction of a historical document version knowledge ontology according to claim 1, characterized in that constructing the version feature set comprises the following specific steps:
performing named entity recognition on each historical document in the version structured document set, extracting and normalizing five types of knowledge entities to form a normalized entity set, and constructing subject-predicate-object triples in combination with semantic role labeling and dependency parsing;
using the subject-predicate-object triples to match and associate the same five types of knowledge entities across different versions, outputting an aligned entity relationship graph, extracting multi-dimensional feature vectors of the five types of knowledge entities, and standardizing them to output a standardized feature vector set;
aggregating the standardized feature vector set by document version through a weighted average of entity features to construct a version-level feature matrix, establishing a mapping between the row vectors of the version-level feature matrix and the corresponding version identifiers, and outputting the version feature set.

4. The method for dynamic collaborative construction of a historical document version knowledge ontology according to claim 1, characterized in that outputting the historical document update instructions comprises the following specific steps:
parsing the conflict transaction log into a conflict evolution graph, and mapping it into the five-level conflict hierarchical processing protocol according to the five conflict types of syntax, structure, version, semantics, and value;
under the five-level conflict hierarchical processing protocol, adjudicating conflicts through a context-weighted arbiter based on entity context semantic trajectories and semantic stability indicators, and outputting a list of adjudication results;
converting the different adjudication types in the list of adjudication results into corresponding OWL ontology update statements;
attaching processing node information, version identifiers, and conflict level annotations to the OWL ontology update statements, and integrating them to form the historical document update instructions.

5. A system for dynamic collaborative construction of a historical document version knowledge ontology, based on the method for dynamic collaborative construction of a historical document version knowledge ontology according to any one of claims 1 to 4, characterized by comprising:
a document collection module, used for collecting digitized versions of multi-source historical documents and preprocessing them to generate a version structured document set;
an entity extraction module, used for extracting the knowledge entity features of each version document from the version structured document set to construct a version feature set;
a similarity calculation module, used for inputting the version feature set into the multi-node cloud collaborative processing architecture, calculating local similarity scores between versions, and outputting a version similarity matrix;
a conflict detection module, used for performing consistency evaluation on the version similarity matrix using the distributed conflict detection mechanism, marking entity mapping pairs whose confidence is below the confidence threshold, and generating a conflict transaction log;
a conflict processing module, used for inputting the conflict transaction log into the five-level conflict hierarchical processing protocol, outputting an arbitration result, converting the arbitration result into corresponding OWL ontology update statements, attaching version traceability annotations, and outputting historical document update instructions;
an ontology construction module, used for dynamically expanding the entity relationship network of the historical document knowledge ontology according to the historical document update instructions and outputting the historical document version knowledge ontology through the three-layer intelligent analysis framework.

6. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method for dynamic collaborative construction of a historical document version knowledge ontology according to any one of claims 1 to 4.

7. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method for dynamic collaborative construction of a historical document version knowledge ontology according to any one of claims 1 to 4.

CN202511398957.0A 2025-09-28 2025-09-28 Dynamic collaborative construction method and system for historical document version knowledge ontology Active CN120893547B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511398957.0A CN120893547B (en) 2025-09-28 2025-09-28 Dynamic collaborative construction method and system for historical document version knowledge ontology

Publications (2)

Publication Number Publication Date
CN120893547A CN120893547A (en) 2025-11-04
CN120893547B true CN120893547B (en) 2025-12-26

Family

ID=97497292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202511398957.0A Active CN120893547B (en) 2025-09-28 2025-09-28 Dynamic collaborative construction method and system for historical document version knowledge ontology

Country Status (1)

Country Link
CN (1) CN120893547B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120354927A (en) * 2025-06-18 2025-07-22 成都大学 Method and system for constructing multi-modal knowledge graph of non-material cultural heritage based on big data

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080177782A1 (en) * 2007-01-10 2008-07-24 Pado Metaware Ab Method and system for facilitating the production of documents
CN112328810B (en) * 2020-11-11 2022-10-14 河海大学 Knowledge graph fusion method based on self-adaptive mixed ontology mapping
CN113449118B (en) * 2021-06-29 2022-09-20 华南理工大学 Standard document conflict detection method and system based on standard knowledge graph
CN116306933B (en) * 2023-02-13 2025-08-12 中山大学 An entity matching method, device and medium for multi-version knowledge graph
CN119397036A (en) * 2024-11-04 2025-02-07 四川旅游学院 Method and system for constructing intangible cultural heritage knowledge graph based on multimodality
CN120471039A (en) * 2025-05-09 2025-08-12 中山市鸿奇科技有限公司 Intelligent document update processing method and system
CN120492636B (en) * 2025-05-20 2025-12-09 邹平科惠信息咨询有限公司 AI intelligent-based document information retrieval analysis system and method

Also Published As

Publication number Publication date
CN120893547A (en) 2025-11-04

Similar Documents

Publication Publication Date Title
CN114880483A (en) A metadata knowledge graph construction method, storage medium and system
CN114003791B (en) Method and system for automatic classification of medical data elements based on depth map matching
CN113987199A (en) BIM intelligent image examination method, system and medium with standard automatic interpretation
CN120493159A (en) Multi-source heterogeneous corpus fusion method and system based on government service data
CN120687574A (en) A government document information extraction and question-answering method, device and medium
CN119829022A (en) Method for generating front-end prototype based on artificial intelligence technology
CN120387437A (en) An automated financial document parsing and question-answering system based on a large model
CN105045933A (en) Method for mapping between ship equipment maintenance and guarantee information relation data base mode and ship equipment maintenance and guarantee information body
CN120874999B (en) Knowledge base enhancement generation method and system based on mixed retrieval and fact verification
CN120893547B (en) Dynamic collaborative construction method and system for historical document version knowledge ontology
CN114841668A (en) Production strategy determination method and device, electronic equipment and storage medium
CN121092700B (en) Case-related property query method and system based on natural language processing technology
CN121117191B (en) Financial knowledge retrieval method and system based on vector database
CN120892536B (en) Greenhouse gas emission analysis method and system based on multi-mode intelligent agent
CN120086356B (en) Highway standard relationship construction, query method, equipment, media and program products
CN120295978B (en) File analysis method, device, medium and product
KR102843407B1 (en) Method and system for text classification and dataset generation by topic
CN119961496B (en) A data management method and system based on data governance and data value extraction
CN119849616A (en) Knowledge fusion driving-based emergency examination knowledge base construction method
CN121188139A (en) AI knowledge base generation method and system based on triples
CN120932840A (en) Data asset management system and method for medical system
CN121390238A (en) Knowledge graph construction method and device, storage medium and terminal
CN121092700A (en) Case-related property query method and system based on natural language processing technology
CN120930739A (en) Multistage construction method, system, equipment and medium for energy policy knowledge graph and intelligent recall policy implementation method, system, equipment and medium
CN121456140A (en) Intelligent data asset processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant