Detailed Description
The following description of the embodiments of the present invention is made clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
In the present disclosure, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Knowledge graphs are widely applied in various fields, yet existing personal knowledge base systems have the following defects:
Defect one: the content type is single, and only a single type of data is supported, for example only text;
Defect two: the search modes are limited, depending on vector similarity and keyword matching;
Defect three: the computing capability is limited, and computational queries such as counting and aggregation are difficult to execute.
The above-mentioned drawbacks can seriously affect the usability and user experience of the personal knowledge base system.
In order to solve the above problems, embodiments of the present invention provide a cross-domain knowledge graph system and a data processing method based on a multi-modal calculation, which construct a multi-modal knowledge graph by using multi-modal data, analyze and optimize a query using at least one query mode, and execute the optimized query based on the multi-modal knowledge graph to obtain an execution result, support multiple types of data, expand a search mode, and improve computing power, thereby improving the practicality and user experience of a knowledge base system.
Specifically, the cross-domain knowledge graph system based on multi-modal calculation in a super-dimensional manner uniformly represents multi-modal data through super-dimensional calculation, improves complex query efficiency through a memory-centric computing architecture, and supports advanced computational queries and cross-modal reasoning based on the enhanced multi-modal knowledge graph. The system is suitable for scenes such as personal knowledge organization, academic research, and enterprise knowledge management. The scheme is described in detail through the following embodiments.
It should be noted that, where operations such as data grabbing and data collection are involved, these operations are performed on the premise that authorization from the corresponding data provider has been obtained, that is, on the premise of legal compliance.
Referring to fig. 1, a structural block diagram of a cross-domain knowledge graph system based on a multi-modal calculation according to an embodiment of the present invention is shown, where the cross-domain knowledge graph system at least includes a multi-modal data acquisition and super-dimensional encoding module 100, a dynamic knowledge graph construction module 200, and a calculation query processing module 300.
Specifically, the multi-modal data acquisition and super-dimensional encoding module 100 is configured to acquire multi-modal data from multiple paths and convert the multi-modal data into a super-dimensional vector.
The dynamic knowledge graph construction module 200 is configured to identify a multi-modal entity based on the super-dimensional vector, and construct a multi-modal knowledge graph using the multi-modal entity.
The computing query processing module 300 is configured to parse and optimize a query of a user, where the query uses one or more preset query modes, and to execute the optimized query based on the multi-modal knowledge graph to obtain an execution result.
The plurality of preset query modes include, but are not limited to, natural language query, structured query language, graph query mode, and the like.
In some embodiments, the multi-modality data acquired by the multi-modality data acquisition and super-dimensional encoding module 100 includes, but is not limited to, data of multiple modalities such as text data, image data, audio data, and video data.
The ways of acquiring the multi-modal data by the multi-modal data acquisition and super-dimensional encoding module 100 include, but are not limited to, manual import, automatic capture, API integration, and real-time capture (i.e., supporting multi-way data acquisition), and these ways of acquiring the multi-modal data are described below.
Manual import, that is, a user can upload local files (upload multi-mode data) through a system interface, and the uploaded local files support multiple formats.
For example, uploaded text supports TXT, PDF, DOCX, EPUB and HTML formats, uploaded images support JPG, PNG, BMP, TIFF, SVG formats, uploaded audio supports MP3, WAV, FLAC, OGG formats, and uploaded video supports MP4, AVI, MOV, MKV formats.
Automatic capture, namely providing a Web grabbing tool, where a user can specify a URL or a website and the Web grabbing tool automatically extracts the content.
For example, the main content is identified and extracted by a web page text extractor, the embedded images, audio and video are identified and downloaded by a multimedia element detector, and information such as title, author, release date, etc. is collected by a metadata extractor.
API integration, namely providing a standardized API interface and supporting integration with third-party applications.
For example, the storage connector is connected with a designated cloud service, the application integration interface supports docking with a designated note application, and the browser plug-in realizes one-key storage of Web content.
Real-time capturing, namely supporting a real-time data capturing function.
For example, a screenshot tool captures screen content, a voice recorder records voice notes, and a context awareness system collects contextual information through device sensors.
The multi-modal data acquisition and super-dimensional encoding module 100 acquires multi-modal data through the multiple paths, each path for acquiring multi-modal data includes a standardized preprocessing step, ensures data quality, and extracts basic metadata (such as source, timestamp, and format information). For text data, segmentation, deduplication and format unification are automatically performed, and for multimedia data, basic feature extraction and quality assessment are performed.
The process of multi-modal data acquisition and super-dimensional encoding module 100 acquiring multi-modal data from multiple pathways may be implemented by a "data acquisition system".
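The standardized preprocessing described above can be illustrated with a minimal sketch. The helper below is a hypothetical example (the function name, the fixed segment length, and the record layout are assumptions, not part of the system specification): it segments text, removes duplicate segments by hash, and attaches basic metadata such as source, timestamp, and format.

```python
import hashlib
import time


def preprocess_text(raw_text: str, source: str, fmt: str, max_len: int = 500) -> dict:
    """Hypothetical preprocessing step: segment, deduplicate, and attach metadata."""
    # Normalize whitespace and split into fixed-size segments (a simple stand-in
    # for the segmentation strategy described above).
    text = " ".join(raw_text.split())
    segments = [text[i:i + max_len] for i in range(0, len(text), max_len)]

    # Deduplicate segments by content hash.
    seen, unique_segments = set(), []
    for seg in segments:
        digest = hashlib.sha256(seg.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique_segments.append(seg)

    # Attach basic metadata (source, timestamp, format).
    return {
        "segments": unique_segments,
        "metadata": {"source": source, "timestamp": time.time(), "format": fmt},
    }


doc = preprocess_text("An example note ...", source="manual_import", fmt="TXT")
```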
In some embodiments, the multi-modal data acquisition and super-dimensional encoding module 100 is specifically implemented to extract multi-modal features from the multi-modal data, map the multi-modal features to the same intermediate dimensions using a projection layer and assign base vectors to the multi-modal features, map the multi-modal features mapped to the intermediate dimensions to a super-dimensional space using the base vectors to obtain super-dimensional vectors, and normalize the super-dimensional vectors.
In other embodiments, the multi-modal data acquisition and super-dimensional encoding module 100 is further configured to perform vector binding, relational encoding, context integration, and metadata attachment on the normalized super-dimensional vector.
That is, after the multi-modal data acquisition and super-dimensional encoding module 100 acquires multi-modal data (i.e., data of a plurality of modalities), the multi-modal data is converted into a unified super-dimensional vector representation, the super-dimensional vector is normalized, and vector binding, relationship encoding, context integration and metadata attachment are performed on the normalized super-dimensional vector.
In the process of converting the multi-mode data into the super-dimensional vector, the multi-mode feature extraction, the super-dimensional projection and unification, the super-dimensional operation and binding, the metadata attachment and the like are involved, and the contents are respectively described below.
(1) Multi-modal feature extraction:
When multi-modal features are extracted from multi-modal data, data (text, image, audio, video, etc.) of different modalities adopts different processing modes to extract corresponding features.
Text processing, namely extracting text semantic features using an improved Transformer architecture. Specifically, the input text is first word-segmented, and then contextual relations are captured by a multi-layer self-attention mechanism, resulting in context-sensitive word embeddings. A holistic representation of the text passage is then generated by a special pooling operation. This process preserves the semantic structure, emotional tendency, and topic information of the text.
Image processing, namely extracting visual characteristics by combining CNN and Vision Transformer architecture. The image is first divided into overlapping region blocks, each of which is converted into an initial vector by position coding and linear projection. These vectors are then processed through a multi-headed self-attention mechanism and feed forward network to generate a hierarchical visual feature. Among other things, special attention is paid to high-level semantic information such as object recognition, scene understanding, and visual relationships.
Audio processing-extracting acoustic features using multi-scale spectral analysis and a network of deep audio features. The original audio signal is firstly converted into a Mel spectrogram, and then the time-frequency characteristics are extracted through a specially designed neural network. The method can identify voice content, music structures, environmental sounds and acoustic events, and provides a basis for subsequent cross-modal correlation.
Video processing, namely integrating a spatio-temporal feature extraction network to capture the dynamic and static information of the video. The video frames are first processed by a spatial feature extractor, and then inter-frame relationships and motion information are captured by a temporal modeling network. In addition, scene transitions, key events, and timing relationships can be identified, forming a comprehensive understanding of the video content.
(2) Super-dimensional projection and unification:
The super-dimensional projection and unification specifically relates to the parts of dimension matching, orthogonal basis transformation, super-dimensional mapping and super-dimensional normalization.
Dimension matching, namely, original feature vectors (multi-modal features) of all modes are firstly mapped to the same intermediate dimension (intermediate representation) through a special projection layer, so that dimension consistency is ensured.
Orthogonal basis transformation, namely distributing a group of approximately orthogonal basis vectors for each mode, and ensuring that different modes occupy different subspaces in a multidimensional space. This design allows information sources to be distinguished and tracked even after modality fusion. In particular, a random projection matrix is used to initialize the basis vectors, and then fine tuning is performed through a Gram-Schmidt orthogonalization process.
Super-dimensional mapping-the intermediate representation maps to a 10000-dimensional super-dimensional space by nonlinear transformation. The transformation keeps the key information of the original feature vector, introduces redundancy of the distributed representation, and improves the tolerance of the system to noise and errors. The transformation function adopts a sectional design, combines nonlinear units such as sigmoid, tanh and ReLU, and ensures the rich expression capability of the feature space.
Super-dimensional normalization, namely performing special normalization processing on the final super-dimensional vector to ensure that contents of different modalities and different complexities have comparable representations. The normalization process takes into account the sparsity, direction, and magnitude of the vectors so that representations of similar concepts in the super-dimensional space are also similar.
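The projection and unification steps above can be sketched as follows. This is a minimal illustration under simplified assumptions, not the claimed implementation: the intermediate dimension, the QR-based orthogonalization, the fixed random expansion matrix, and the single tanh nonlinearity are stand-ins for the components described in this section.

```python
import numpy as np

INTER_DIM, HYPER_DIM = 512, 10000
rng = np.random.default_rng(0)

# Approximately orthogonal basis vectors, one per modality (random initialization
# followed by QR, a stand-in for the Gram-Schmidt fine-tuning described above).
basis, _ = np.linalg.qr(rng.standard_normal((HYPER_DIM, 4)))
basis = basis.T                                   # (4 modalities, HYPER_DIM)

# Fixed expansion matrix: intermediate dimension -> super-dimensional space.
expand = rng.standard_normal((HYPER_DIM, INTER_DIM)) / np.sqrt(INTER_DIM)

def encode(feature: np.ndarray, proj: np.ndarray, modality: int) -> np.ndarray:
    """Dimension matching, nonlinear super-dimensional mapping, basis mixing,
    and normalization (a simplified stand-in for the pipeline above)."""
    inter = proj @ feature                        # map raw feature to INTER_DIM
    hyper = np.tanh(expand @ inter) + basis[modality]
    return hyper / np.linalg.norm(hyper)          # super-dimensional normalization

text_feat = rng.standard_normal(768)              # e.g. a raw text feature vector
proj_text = rng.standard_normal((INTER_DIM, 768)) / np.sqrt(768)
text_hv = encode(text_feat, proj_text, modality=0)
```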
(3) Super-dimensional operation and binding:
The super-dimensional operation and binding specifically relate to vector binding, relational encoding, and context integration.
Vector binding, namely a variety of super-dimensional vector operations are implemented, including binding, permutation, and bundling. The binding operation (implemented by element-wise multiplication) is used to associate two concepts, the permutation operation (implemented by vector element rearrangement) is used to represent sequences and relationships, and the bundling operation (implemented by vector addition) is used to aggregate multiple related concepts. A sketch of these operations is given after the context integration item below.
Relational encoding, namely edges in the multi-modal knowledge graph are represented by a special relational encoder. Given two entity vectors e1 and e2 and a relationship type r, a relationship vector v_r = Φ(e1, r, e2) is calculated, where Φ is a learned encoding function capable of capturing the semantics and characteristics of different types of relationships.
Context integration-the system supports context-sensitive representation, modulating the basic representation of entities and relationships by special context vectors. The context vector captures information of time, location, provenance, etc., such that the same concept has slightly different representations in different contexts.
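A minimal sketch of the binding, permutation, and bundling operations described above follows, together with a simple relation vector in the spirit of v_r = Φ(e1, r, e2). The particular composition of Φ shown here (binding each entity with a relation code, with permutation marking direction) is only an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 10000

def rand_hv() -> np.ndarray:
    v = rng.standard_normal(DIM)
    return v / np.linalg.norm(v)

def bind(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return a * b                       # element-wise multiplication: associate two concepts

def permute(a: np.ndarray, k: int = 1) -> np.ndarray:
    return np.roll(a, k)               # element rearrangement: encode order / roles

def bundle(*vs: np.ndarray) -> np.ndarray:
    s = np.sum(vs, axis=0)             # vector addition: aggregate related concepts
    return s / np.linalg.norm(s)

# Illustrative relation vector v_r = Phi(e1, r, e2): bind the two entities with a
# relation code, permuting one side so that the direction e1 -> e2 is preserved.
e1, e2, r = rand_hv(), rand_hv(), rand_hv()
v_r = bundle(bind(e1, r), bind(permute(r), e2))
```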
(4) Metadata attachment:
Each super-dimensional vector is attached with structured metadata including source, creation time, modality type, and processing parameters. The system maintains a vector lineage diagram that records the conversion and operation history of each vector, supporting tracing and interpretation. The metadata is stored in a compact binary format and indexed together with the super-dimensional vector without affecting vector operations.
The super-dimensional encoding manner given above, covering multi-modal feature extraction, super-dimensional projection and unification, super-dimensional operation and binding, and metadata attachment, can be implemented by a "super-dimensional encoding system". This encoding manner enables data of different modalities to be represented and processed in a unified high-dimensional space, laying a foundation for subsequent multi-modal knowledge graph construction and computational queries.
In some embodiments, the specific implementation manner in which the dynamic knowledge graph construction module 200 constructs the multi-modal knowledge graph using the multi-modal entities is as follows: multi-modal relationships (hereinafter simply referred to as relationships) are extracted from the multi-modal entities, the multi-modal relationships are verified and evaluated, and the multi-modal knowledge graph is constructed using the multi-modal entities (hereinafter simply referred to as entities) in combination with the multi-modal relationships that pass verification and evaluation.
In the process of constructing the multi-mode knowledge graph by the dynamic knowledge graph constructing module 200, the entity identification and management, the relation extraction, the graph construction and the like are involved, and the following descriptions are provided.
(1) Entity identification and management:
Entity identification and management are carried out based on a super-dimensional vector (super-dimensional coded data), and the entity identification and management method particularly relates to multi-mode entity detection, entity linking and standardization, entity evolution and version control.
Wherein the multi-modal entity detection part further relates to text entity recognition, visual entity recognition, audio entity recognition and cross-modal entity association of the contents.
Text entity recognition, namely, combining named entity recognition, phrase extraction and keyword extraction technologies to recognize entities such as concepts, characters, places, organizations, time and the like from the text. The recognition process takes into account context and domain knowledge, and can handle ambiguities and new concepts. For example, the term "apple" may refer to a fruit or company, the system determining its actual meaning by contextual analysis.
Visual entity recognition-applying object detection, scene classification, and visual relationship extraction techniques to images and video frames. The system not only identifies individual objects, but also captures the spatial relationships and interactions between objects. For example, in a family photo, the system can identify entities such as people, pets, furniture, and the like, as well as their relative locations and possible interactions.
Audio entity recognition-speech recognition, sound event detection and music structure analysis are applied to the audio. The system can identify the speaker and subject matter from the conversation, and the sound source and event from the environmental recordings. In particular, the system implements speaker separation and character recognition functions for multi-person conversations.
Cross-modal entity association-the system can associate the same entities represented in different modalities, such as associating characters in video with their voices and names, which uses a cross-modal attention mechanism based on spatial-temporal alignment and semantic matching between modalities.
The entity link and standardization part also relates to the content of internal knowledge base matching, external knowledge base link, entity standardization and entity classification.
Internal knowledge base matching, namely a newly detected entity is matched with known entities in the internal knowledge base of the system. The matching process uses similarity calculations on the super-dimensional vectors, taking into account entity names, attributes, and context information; a minimal matching sketch is given after the entity classification item below.
External knowledge base linking-when the internal match is uncertain, the system attempts to link an entity to the external knowledge base, and the linking process uses a method of combining semantic matching and rule matching of entity descriptions.
Entity normalization, namely, the system performs normalization processing on entity representation, solves the problems of aliases, abbreviations, variants and the like, and the normalization process utilizes a synonym dictionary and an entity analysis algorithm.
Entity classification, namely the identified entities are classified according to a predefined ontology structure to form a layered entity type system. The classification process combines rule reasoning and machine learning methods and can handle multi-type attribution.
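The internal knowledge base matching described above can be illustrated with the following minimal sketch. The weighting of vector similarity and name overlap, the knowledge base record format, and the acceptance threshold are illustrative assumptions, not values specified by this disclosure.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def name_overlap(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def match_entity(new_vec, new_name, knowledge_base, w_vec=0.7, w_name=0.3, threshold=0.8):
    """Return the best-matching known entity, or None if nothing exceeds the threshold.
    knowledge_base is a list of (entity_id, vector, name) tuples (illustrative format)."""
    best_id, best_score = None, 0.0
    for entity_id, vec, name in knowledge_base:
        score = w_vec * cosine(new_vec, vec) + w_name * name_overlap(new_name, name)
        if score > best_score:
            best_id, best_score = entity_id, score
    return (best_id, best_score) if best_score >= threshold else None
```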
The entity evolution and version control part also relates to the contents of time perception representation, difference detection, conflict resolution and change propagation.
The time-aware representation is that the entity representation contains a time dimension that is capable of tracking attributes and relationships of the entity over time. The system maintains a version history of the entity, recording the content and reasons of each change.
And detecting the difference, namely automatically detecting the difference between the new information and the representation of the known entity by the system, and determining an updating strategy according to the type of the difference. The difference detection is based on semantic comparison and structural comparison, and can identify incremental information, conflict information and outdated information.
Conflict resolution, namely when there is a conflict between new and old information, the system applies a conflict resolution policy. The policy takes into account factors such as information source reliability, recency, and contextual relevance, and requests user confirmation when necessary.
Change propagation, namely after an entity is updated, the system evaluates the influence of the change on related entities and relationships and, if necessary, triggers linked updates. The propagation mechanism ensures the overall consistency of the multi-modal knowledge graph.
(2) Relation extraction and graph construction:
Based on the identified multi-modal entity, multi-modal relation extraction and multi-modal knowledge graph construction are carried out, and the multi-modal relation extraction, relation verification and scoring, knowledge graph construction and maintenance, knowledge storage and indexing are specifically involved.
The multi-modal relation extraction part also relates to text relation extraction, visual relation extraction, time sequence relation extraction and cross-modal relation integration.
Text relation extraction, namely extracting various relations among entities from the text by using dependency syntactic analysis, semantic role labeling and relation classification technology by the system. For example, the "invention" relationship is extracted from the sentence "A invents B", connecting the two entities "A" and "B".
Visual relationship extraction, namely the system analyzes the spatial layout, visual characteristics, and scene context of objects in an image and identifies relationships among visual entities. These include spatial relationships (e.g., "above", "beside"), functional relationships (e.g., "use", "operate"), and social relationships (e.g., "handshake", "hug").
And extracting time sequence relation, namely analyzing time sequence modes in video and audio by the system, and extracting the sequence, cause and effect and inclusion relation of the event. For example, a dialogue mode of "question-answer" is identified from a conference recording, or a sequence of actions of "prepare-execute-complete" is identified from an operation video.
Cross-modal relation integration, namely relationships extracted from different modalities are integrated by the system to form a consistent relationship representation. When conflicts occur, the system resolves them according to modality reliability and information integrity.
The relationship verification and scoring part further relates to logical consistency checking, evidence evaluation, relationship confidence calculation, and user feedback integration.
Logical consistency check-the system checks whether the newly extracted relationships are consistent with the known knowledge logic. This includes symmetry, transitivity, and mutual exclusivity checks of the relationship, as well as compliance checks with domain constraints.
Evidence assessment the system assesses the strength of evidence supporting the relationship, taking into account evidence sources, quantity and quality. Each relationship carries a link of evidence and the user can view the original support material.
Relationship confidence calculation, namely based on the reliability of the extraction method, the evidence strength, and the logical consistency, the system assigns a confidence score to each relationship. The confidence score affects the weight of the relationship in queries and reasoning; a scoring sketch is given after the user feedback integration item below.
User feedback integration, namely the system collects and integrates user feedback on the correctness of relationships and dynamically adjusts the relationship scoring and verification models; this human-computer collaboration mechanism significantly improves knowledge quality.
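A minimal sketch of the relationship confidence calculation follows. The particular weights given to extraction reliability, evidence strength, logical consistency, and user feedback are illustrative assumptions rather than the system's actual model.

```python
def relation_confidence(extractor_reliability: float,
                        evidence_scores: list[float],
                        consistency_ok: bool,
                        user_feedback: float = 0.0) -> float:
    """Combine the factors named above into a single confidence score in [0, 1].
    evidence_scores are per-evidence quality values in [0, 1]; user_feedback is a
    correction in [-1, 1] derived from user confirmations or rejections."""
    evidence = sum(evidence_scores) / len(evidence_scores) if evidence_scores else 0.0
    consistency = 1.0 if consistency_ok else 0.3
    score = 0.4 * extractor_reliability + 0.4 * evidence + 0.2 * consistency
    score += 0.1 * user_feedback          # dynamic adjustment from feedback
    return max(0.0, min(1.0, score))

conf = relation_confidence(0.85, [0.9, 0.7], consistency_ok=True, user_feedback=0.5)
```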
The knowledge graph construction and maintenance part also relates to the contents of graph structure initialization, incremental graph updating, graph quality optimization and multi-view management.
Graph structure initialization, namely the system initializes the multi-modal knowledge graph structure based on the extracted multi-modal entities and multi-modal relationships. Nodes in the graph correspond to the super-dimensional representations of the multi-modal entities, edges correspond to the multi-modal relationships between them, and both carry metadata and confidence information.
Incremental graph update, namely, as new data is added, the system executes the incremental graph update to efficiently integrate new multi-modal entities and multi-modal relationships. The update process uses change logs and difference calculations, minimizing computational overhead.
Graph quality optimization, namely the system periodically executes a graph quality optimization process, including redundant relationship elimination, relationship inference completion, and isolated node processing. The optimization is based on graph structure analysis and pattern matching, improving the connectivity and consistency of the graph.
Multi-view management, namely the system supports multi-view representations of the multi-modal knowledge graph and provides customized views for different application scenarios. For example, a temporal view emphasizes event timing, a topic view focuses on conceptual relationships, and a social view highlights person interactions.
The knowledge storage and indexing part further relates to distributed storage architecture, multi-level index structure, multidimensional vector index, change log and recovery.
Distributed storage architecture, namely the multi-modal knowledge graph adopts a distributed storage architecture, supporting horizontal expansion and high availability. Core entities and relationships are stored in a main graph database, and frequently accessed subgraphs are cached in memory.
Multistage index structure the system builds a multistage index structure supporting efficient entity lookup, relationship traversal and pattern matching. The indexes include an entity index, a relationship type index, an attribute index, and a graph schema index.
Super-dimensional vector index, namely the super-dimensional representations of entities and relationships are managed by a special vector index engine, supporting approximate nearest neighbor search and semantic similarity queries. The index adopts a hierarchical structure, balancing query efficiency and index maintenance cost; a minimal two-level search sketch is given after the change log item below.
Change log and recovery, namely the system maintains a detailed change log recording all modification operations on the multi-modal knowledge graph. The log supports point-in-time recovery, change rollback, and operation auditing, enhancing the reliability and traceability of the system.
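The hierarchical super-dimensional vector index can be sketched as a simple two-level search: coarse routing by centroid, then exact re-ranking within the selected buckets. This is only an illustrative stand-in for the special vector index engine mentioned above; the bucket count and probe count are arbitrary example values.

```python
import numpy as np

class TwoLevelIndex:
    """Toy hierarchical index: randomly chosen centroids for coarse routing,
    then exact cosine re-ranking inside the probed buckets."""
    def __init__(self, vectors: np.ndarray, num_buckets: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
        self.centroids = self.vectors[rng.choice(len(vectors), num_buckets, replace=False)]
        assign = (self.vectors @ self.centroids.T).argmax(axis=1)
        self.buckets = {b: np.where(assign == b)[0] for b in range(num_buckets)}

    def search(self, query: np.ndarray, top_k: int = 5, probes: int = 2):
        q = query / np.linalg.norm(query)
        nearest = np.argsort(-(q @ self.centroids.T))[:probes]   # coarse level
        candidates = np.concatenate([self.buckets[b] for b in nearest])
        scores = self.vectors[candidates] @ q                     # exact re-ranking
        order = np.argsort(-scores)[:top_k]
        return [(int(candidates[i]), float(scores[i])) for i in order]

index = TwoLevelIndex(np.random.default_rng(2).standard_normal((1000, 256)))
hits = index.search(np.random.default_rng(3).standard_normal(256))
```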
Through the dynamic construction and maintenance mechanism provided by the multi-modal relation extraction, relation verification and scoring, knowledge graph construction and maintenance, knowledge storage and indexing, multi-modal data can be organized into a structured knowledge graph, and a foundation is provided for subsequent calculation inquiry and reasoning.
In some embodiments, referring to another block diagram of a cross-domain knowledge-graph system based on a multi-modal calculation in a super-dimensional manner, as shown in FIG. 2, the cross-domain knowledge-graph system further includes a memory-centric computing engine module 400.
Specifically, the memory-centric computing engine module 400 is configured to construct a memory-centric architecture for storing at least the multi-modal knowledge graph, and to configure a computing scheduling and optimization mechanism.
It should be noted that the memory-centric computing engine module 400 relates to memory-centric architecture design, computing scheduling, and optimization, and these contents are described below.
(1) Memory-centric architecture design:
The limitations of the traditional computing architecture are broken by integrating computing capability directly into the storage layer; this specifically relates to the processor-memory fusion design, data layout and partitioning, and in-memory computation primitives.
The processor-memory fusion design part relates to storage computing units, near-storage processing, a memory mapping mechanism, and a processing pipeline.
Storage computing units, namely the system embeds special computing units in the storage array, and each unit can directly execute basic operations on the stored data. These computing units employ heterogeneous designs, including vector processors, graph processing cores, and scalar processing units, each optimized for different types of operations.
Near-storage processing, namely the system implements a near-data processing mechanism that decomposes query operations and pushes them to where the data resides. For example, operations such as entity filtering, relationship matching, and local aggregation are performed directly at the storage layer, and only the results are returned to the upper processing unit.
Memory mapping mechanism, namely the system adopts a special memory mapping mechanism to map the logical structure of the multi-modal knowledge graph to the physical storage layout. Related multi-modal entities and frequently co-accessed multi-modal relationships are placed in physically close locations, reducing data movement overhead.
Processing pipeline the system designs a multi-stage processing pipeline to decompose complex operations into basic steps that can be executed in parallel in the storage layer. The pipeline comprises data access, filtering, transformation, aggregation, sequencing and other stages, and each stage is optimized for a memory center architecture.
The data layout and partition part also relates to the graph structure partition, vector quantization storage, hot spot data management and dynamic redistribution.
Graph structure partitioning, namely the multi-modal knowledge graph is partitioned according to its topological structure and access patterns. The system uses a graph partitioning algorithm to allocate closely connected subgraphs to the same memory region, minimizing cross-partition access.
Vector quantization storage, namely, a special quantization storage scheme is adopted for the super-dimensional vector, so that the storage space is reduced on the premise of maintaining the vector semantics. The quantization scheme customizes compression policies for different types of entities and relationships, balancing storage efficiency and semantic fidelity.
Hotspot data management, namely the system continuously monitors data access patterns and identifies hotspot entities and relationships. Hotspot data is automatically copied to a higher-speed storage layer, such as non-volatile memory or on-chip cache, to speed up frequent queries.
Dynamic redistribution the system periodically executes data redistribution operation according to the access statistics and the query mode, and adjusts the data layout to adapt to the changing workload. The redistribution operation is performed in an incremental manner when the system load is low, minimizing the impact on normal operation.
The memory computation primitive part also relates to vector operation, graph operation, aggregation function and filtering condition.
Vector operation the system implements a set of efficient vector computation primitives including similarity computation, vector synthesis and decomposition operations. These primitives are optimized for the characteristics of the super-dimensional vector, supporting massive parallel computation.
The system provides specialized graph processing primitives including node selection, edge traversal, path exploration, and pattern matching. These primitives directly operate on the graph structure in memory, avoiding the data movement overhead of conventional graph algorithms.
Aggregation function the system implements a series of efficient aggregate computation primitives including counting, summing, averaging, maximum/minimum, packet statistics, etc. The aggregation operation supports distributed execution and partial result merging, and is suitable for large-scale data processing requirements.
Filtering conditions, namely supporting complex in-memory filtering operations by the system, wherein the filtering operations comprise attribute matching, range constraint, mode matching and semantic similarity filtering. The filtering conditions can be pushed down to the storage layer, significantly reducing the amount of data that needs to be processed.
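The filter pushdown and partition-local aggregation primitives above can be illustrated with a minimal sketch. The partition layout, record fields, and the choice of COUNT/SUM as the local partial are assumptions made only for this example; in the architecture described above, the filter and the local aggregation would run inside the storage layer and only partial results would be returned upward.

```python
from collections import defaultdict

# Each partition is a list of records held by one storage unit (toy layout).
partitions = [
    [{"type": "photo", "year": 2020, "size": 3}, {"type": "note", "year": 2021, "size": 1}],
    [{"type": "photo", "year": 2020, "size": 5}, {"type": "photo", "year": 2019, "size": 2}],
]

def local_aggregate(partition, predicate, group_key, value_key):
    """Filter pushdown + partition-local aggregation: returns {group: [count, sum]}."""
    partial = defaultdict(lambda: [0, 0])
    for rec in partition:
        if predicate(rec):
            count_sum = partial[rec[group_key]]
            count_sum[0] += 1
            count_sum[1] += rec[value_key]
    return partial

def merge(partials):
    """Upper-layer merge of the partial results returned by each storage unit."""
    total = defaultdict(lambda: [0, 0])
    for p in partials:
        for key, (cnt, s) in p.items():
            total[key][0] += cnt
            total[key][1] += s
    return dict(total)

partials = [local_aggregate(p, lambda r: r["year"] == 2020, "type", "size") for p in partitions]
print(merge(partials))   # {'photo': [2, 8]} for the toy data above
```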
(2) Computing scheduling and optimization:
In order to fully utilize the memory center architecture, a special calculation scheduling and optimizing mechanism is realized, and the method particularly relates to the parts of query decomposition and planning, resource management and load balancing and execution optimizing technology.
The query decomposition and planning part also relates to operation decomposition, data positioning, parallelism planning and execution sequence optimization.
Operation decomposition-the system decomposes the complex query into a basic sequence of operations, each operation corresponding to a memory computation primitive or combination thereof. The decomposition process generates an optimized execution plan taking into account the operational dependencies and the data flow.
Data positioning, namely analyzing data distribution required by query by the system, and determining the optimal execution position and strategy. For cross-partition operations, the system evaluates the overhead of data movement and remote execution, selecting the solution with the lowest total cost.
And (3) parallelism planning, namely determining the parallelism of each operation by the system according to the operation characteristics and the available resources. Resource-intensive operations (e.g., complex aggregations) allocate more parallel units, while simple operations (e.g., basic filtering) are performed in a pipelined fashion.
Execution order optimization, namely the system applies query rewrite rules and adjusts the execution order of operations to maximize the filtering effect and the reuse of intermediate results. For example, a highly selective filtering condition is executed first, reducing the amount of data to be processed later.
The resource management and load balancing part further relates to resource allocation, load monitoring, task migration and batch optimization.
Resource allocation, namely the system dynamically allocates computing and storage resources, ensuring that key queries obtain sufficient resources while maintaining the overall throughput of the system. The resource allocation policy takes into account query priority, complexity, and expected execution time.
Task migration, namely when local overload is detected, the system triggers a task migration mechanism to transfer part of computing tasks to a processing unit with lighter load. The migration process takes into account data dependence and migration overhead, avoiding unnecessary data movement.
Batch optimization-the system identifies similar operations that can be batched, and performs them in combination to increase resource utilization. For example, multiple similar entity queries may be combined into one batch retrieval operation, sharing data access and intermediate results.
The execution optimization technique part relates to algorithm selection, approximation calculation, result caching, and pre-computation and materialized views.
Algorithm selection, namely the system maintains multiple algorithm implementations for each operation and selects the most suitable one according to data characteristics and resource conditions. For example, for dense subgraph traversal the system selects an optimized graph traversal algorithm, while for sparse graph regions index-driven point queries are used.
Approximation calculation for queries that allow approximation results, the system applies approximation calculation techniques to accelerate the process by sampling, sketch algorithms, or quantization methods. The approximation degree and the calculation cost can be controlled by parameters, so that different precision requirements can be met.
Result caching, namely the system caches frequently queried results and intermediate states, reducing repeated computation. Cache management considers data update frequency, query popularity, and result size, and an intelligent replacement policy maximizes cache benefit.
Pre-computing and materialized views the system analyzes the query pattern to create pre-computed results or materialized views for common aggregate operations and complex relationships. The pre-calculated content is automatically updated or disabled according to the data change, and query acceleration and maintenance overheads are balanced.
By the above mentioned memory centric computing architecture, the system is able to efficiently perform complex computing queries, particularly aggregation, statistics, and counting type operations, overcoming limitations of conventional knowledge base systems in this regard.
In some embodiments, the specific implementation manner in which the computing query processing module 300 parses and optimizes the user's query and executes the optimized query based on the multi-modal knowledge graph to obtain an execution result is as follows: the user's query is parsed to obtain a parsing result; an execution plan is generated according to the parsing result, and optimization operations are performed on the query, including a rewriting operation, an index matching operation, and a parallel execution strategy generation operation; the optimized query is then executed based on the execution plan and the multi-modal knowledge graph to obtain the execution result.
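The parse, plan, optimize, and execute flow just described can be sketched as follows. The class and function names, the toy selectivity heuristic, and the in-memory graph layout are hypothetical; the sketch only illustrates the ordering of the steps, not the claimed implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ParsedQuery:
    intent: str                                    # e.g. "count", "aggregate", "retrieve"
    entity_type: str
    filters: list = field(default_factory=list)    # (attribute, op, value) triples

def parse(query_text: str) -> ParsedQuery:
    # Stand-in for the natural-language / structured-query parser.
    return ParsedQuery(intent="count", entity_type="photo",
                       filters=[("year", "==", 2020), ("scene", "==", "mountain")])

def plan_and_optimize(parsed: ParsedQuery) -> list:
    # Rewrite: order filters by an assumed selectivity (most selective first),
    # then add an index-scan step and the final aggregation step.
    ordered = sorted(parsed.filters, key=lambda f: 0 if f[0] == "scene" else 1)
    steps = [("index_scan", parsed.entity_type)]
    steps += [("filter", f) for f in ordered]
    steps.append(("aggregate", parsed.intent))
    return steps

def execute(steps: list, graph) -> int:
    rows = graph[steps[0][1]]                      # toy "index scan"
    for op, arg in steps[1:]:
        if op == "filter":
            attr, _, value = arg
            rows = [r for r in rows if r.get(attr) == value]
    return len(rows)                               # "count" aggregation

toy_graph = {"photo": [{"year": 2020, "scene": "mountain"}, {"year": 2019, "scene": "sea"}]}
print(execute(plan_and_optimize(parse("how many photos ...")), toy_graph))   # 1
```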
In other embodiments, the computing query processing module 300 is further configured to aggregate, convert, and validate execution results and output the aggregated, converted, and validated execution results.
It should be noted that, the computing query processing module 300 relates to the contents of query language and interface, query understanding and optimization, execution and result processing, and the like, and these contents are described below respectively.
(1) Query language and interface:
Providing multiple query interfaces, supporting complex computing and statistical requirements, and particularly relates to multi-modal query languages, computing expression capabilities and multi-modal query support.
The multi-mode query language part relates to natural language query, structured query language, graph query mode and mixed query mechanism.
Natural language query, namely the system supports expressing query intent directly in natural language. The query parser analyzes sentence structure and semantics, identifying entities, relationships, conditions, and computational requirements. For example, the user may directly ask "how many photos that I took in 2020 contain a mountain scene" or "which three people do I most commonly meet at a cafe".
Structured query language, namely the system provides an SQL-like structured query language, supporting the precise expression of complex query logic. The language supports clauses such as SELECT, FROM, WHERE, GROUP BY, and HAVING, as well as various aggregation functions and computational expressions; it is suitable for users with a programming background or who need to control query behavior precisely. An example query is given after the hybrid query mechanism item below.
Graph query mode, namely the system supports a query mode based on graph patterns; a user can construct a query template through a graphical interface, specifying node types, relationship conditions, and calculation requirements. This visual query mode intuitively displays the associations between knowledge items and is suitable for exploring a complex relationship network.
Hybrid query mechanism-the system allows for the mixed use of different query patterns in a single query. For example, the user may describe the base requirements in natural language and then add precise computational requirements through structured grammar or adjust the relationship conditions through a graphical interface.
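An illustrative structured query in the SQL-like style mentioned above is shown below, as it might be submitted through a hypothetical client API. The `KnowledgeBaseClient` name, the endpoint, and the person/meeting schema are assumptions made only for this example and are not part of the disclosure.

```python
# Hypothetical SQL-like query following the clauses listed above: count meetings
# per person at a cafe and return the three most frequent contacts.
query = """
SELECT person.name, COUNT(meeting) AS meet_count
FROM person JOIN meeting ON meeting.participant = person
WHERE meeting.location = 'cafe'
GROUP BY person.name
HAVING meet_count > 1
ORDER BY meet_count DESC
LIMIT 3
"""

# client = KnowledgeBaseClient("http://localhost:8080")   # assumed client and endpoint
# for row in client.execute(query):
#     print(row["name"], row["meet_count"])
```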
The computing expression capability part relates to basic aggregation operations, advanced statistical functions, time dimension calculation, and space dimension calculation.
Basic aggregation operation the system supports standard aggregation operation, including COUNT, SUM, AVG, MIN, MAX, MEDIAN. These operations may be applied to entity attributes, relational properties, or computational expressions.
Advanced statistics functionality the system provides a series of advanced statistics functions including distribution analysis, correlation computation, trend detection and anomaly identification. These functions are based on underlying aggregation, providing more sophisticated statistical insight.
And the time dimension calculation is that the system particularly optimizes the time-related calculation capability and supports time sequence analysis, periodic detection, change rate calculation and history comparison. The user may conveniently query for e.g. "increasing trend of monthly reading" or "weekday versus weekend activity patterns".
And the space dimension calculation is that the system integrates a geospatial calculation function and supports distance calculation, regional statistics, path analysis and position clustering. These functions enable the user to explore issues such as "distribution range of frequently accessed sites" or "best meeting point between two sites".
The multi-modal query support part also relates to cross-modal condition expression, similarity query, modal conversion query and multi-modal result presentation.
Cross-modality condition expression query language supports setting conditions and associations between different modalities. The user may combine cross-modal conditions such as "video containing this person" (visual conditions) and "dialog discussing financial topics" (text conditions).
Similarity query, namely the system supports similarity queries based on the super-dimensional vectors; users can find semantically similar items through example entities or content segments. This ability to "search by image", "search by text", or "search by view" breaks the boundaries between modalities.
Modal transformation queries the system is able to handle queries requiring modal transformation, such as "transform my lecture content into a list of points" or "find all pictures mentioned in this article". These queries involve information extraction and transformation between modalities.
Multi-modal result presentation, namely the system selects the most suitable presentation mode, possibly a text abstract, a chart visualization, an image collection, or a multimedia combination, according to the query nature and the result type. The presentation style takes into account information integrity, intuitiveness, and user preferences.
(2) Query understanding and optimization:
The system adopts advanced query processing technology to ensure efficient and accurate query execution, and particularly relates to query understanding and analysis, query planning and rewriting and optimization technology for calculating query.
The query understanding and analyzing part further relates to the contents of intention recognition, entity and relation analysis, condition decomposition and calculation requirement analysis.
Intent recognition the system first recognizes the basic intent of the query, such as retrieval, counting, aggregation, comparison, or prediction. The intent recognition is based on semantic analysis and query pattern matching, taking into account query language characteristics and contextual information.
And resolving the entity and the relation, namely resolving the multi-modal entity and the multi-modal relation mentioned in the query by the system, and mapping the natural language expression to specific nodes and edges in the multi-modal knowledge graph. The parsing process handles ambiguities, aliases, and ambiguous expressions, inferring implicit entities from context if necessary.
Conditional decomposition-complex query conditions are decomposed into basic logical expressions forming a condition tree structure. The system analyzes the logical relationships (AND, OR, NOT) and priorities among the conditions, providing a basis for subsequent execution planning.
The computing requirement analysis is that the system identifies computing requirements in the query, including indexes to be computed, aggregate dimensions, filtering conditions, and ordering requirements. The computational demand analysis takes into account explicit instructions and implicit expectations to ensure that the query results conform to the user's intent.
The query planning and the rewriting part also relate to the contents of execution plan generation, query rewriting, index matching and parallel strategies.
Execution plan generation the system generates an optimal execution plan for each query, describing the order of operations, algorithm selection, and resource allocation. The plan generation uses a cost-based optimization model that takes into account data characteristics, operational complexity, and system state.
Query rewrite-the system applies a series of query rewrite rules to transform the original query to increase execution efficiency. The rewriting technology comprises predicate push, common sub-expression extraction, connection reordering, aggregation advance and the like, so that data access and calculation paths are optimized.
Index matching, namely analyzing query conditions by the system, and selecting the most suitable index structure to accelerate execution. Index selection takes into account coverage, selectivity, and access costs, possibly using multiple indices in combination to optimize the query path.
Parallel policy the system formulates a parallel execution policy for the query, determines which operations can be parallelized and how to distribute the parallel tasks. The policy balances parallelism and coordination overhead based on dependencies between operations, data distribution, and available resources.
The optimization technology of the calculation query involves the contents of incremental calculation, hierarchical aggregation, approximate query and calculation result caching.
Incremental computation for repeatedly executed calculation inquiry, the system adopts an incremental calculation method to only process the data changed compared with the previous inquiry. The method is particularly suitable for periodic statistics and real-time dashboards, and the calculation cost is remarkably reduced.
Hierarchical aggregation, namely the system implements a hierarchical aggregation strategy that decomposes a large-scale aggregation operation into multiple layers: the bottom-level processing units perform local aggregation, the middle layer merges partial results, and the top layer generates the final aggregate value. This layered method improves parallelism and scalability; a minimal sketch is given after the computation result caching item below.
Approximation queries, namely for large-scale computational queries the system supports approximation modes, which generate approximate results quickly through sampling, probabilistic data structures (e.g., HyperLogLog, Count-Min Sketch), or distributed estimation algorithms. The user can control the balance of accuracy and response time through parameters.
Computation result caching, namely the system maintains an intelligent cache of computational query results and manages cache invalidation according to data dependencies and update frequency. The caching strategy takes into account query repetition patterns, result size, and computational complexity, maximizing cache benefit.
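A minimal sketch of the hierarchical aggregation strategy referenced above follows: bottom-level units compute local partials, a middle layer merges them, and the top layer produces the final value. The three-level split, the (sum, count) partial, and the averaging example are illustrative assumptions; the same pattern applies to COUNT, SUM, and similar aggregates.

```python
def local_partial(values):
    """Bottom layer: local aggregation returns (sum, count) instead of a final value."""
    return (sum(values), len(values))

def merge_partials(partials):
    """Middle layer: combine partial (sum, count) pairs from several units."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return (total, count)

def finalize(partial):
    """Top layer: compute the final aggregate (here, an average)."""
    total, count = partial
    return total / count if count else 0.0

# Toy data split across three "storage units".
units = [[3, 5, 7], [2, 4], [10]]
middle = merge_partials([local_partial(u) for u in units])
print(finalize(middle))   # 31 / 6 = 5.1666...
```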
(3) Execution and result processing:
After the query planning is completed, the system efficiently executes the query and processes the results, and particularly relates to the steps of query execution control, result processing and conversion, and result presentation and interpretation.
Wherein, the part of query execution control also relates to the contents of execution coordination, resource control, execution feedback and staged processing.
Execution coordination, namely the query execution coordinator schedules each operation step according to the execution plan and manages the workflow and intermediate results. The coordinator monitors execution progress, handles abnormal conditions, and adjusts the execution strategy if necessary.
Resource control the system implements fine-grained resource control to ensure that query execution does not exceed allocated resource limitations. The control mechanism comprises a memory limit, a processing time quota and a concurrent operation quantity limit, so that a single query is prevented from occupying excessive resources.
The execution feedback the system provides real-time execution feedback including percentage completion, intermediate result statistics, and predicted time remaining. For long running queries, the user may decide whether to continue waiting, adjusting the query, or obtaining partial results based on feedback.
Staged processing, namely for large-scale queries the system adopts a staged processing method, preferentially generating a summary or sample result and then completing the detailed processing in the background. This progressive execution mode improves the interaction experience, and the user can quickly obtain preliminary insight.
The result processing and conversion part also relates to the contents of result aggregation, post-processing conversion, multi-mode result generation and result verification.
Result aggregation, namely the system merges partial results from different processing units to form a complete query result set. The aggregation process handles data duplication, consistency, and complementarity, ensuring the accuracy of the results.
Post-processing conversion, namely, the system carries out post-processing on the original results according to the query requirement, and the post-processing comprises sequencing, grouping, formatting and derivative calculation. The post-processing step enhances the usability and interpretability of the results.
Multi-modal result generation, namely for queries involving multi-modal data, the system generates result representations that integrate different modalities, such as a combined text-and-image summary, a set of images with audio annotations, or a sequence of multimedia events on a timeline.
Result verification, namely the system executes a result verification flow, checking whether the results meet the query constraints and data consistency requirements. The validation process may include range checking, sum validation, and sample validation, ensuring the quality of the results.
The result presentation and interpretation part relates to context-aware presentation, interpretive enhancement, interactive exploration, and result output and sharing.
Context-aware presentation, the system selects the most appropriate presentation based on the query context and the result characteristics. The quantized data is visualized by a graph, the relationship network is shown by an interactive graph, and the time sequence data is expressed by a time line or a dynamic graph.
Interpretive enhancement, namely the system provides explanatory information for the calculation results, including the calculation basis, data sources, and confidence evaluation. The interpretation function helps the user understand the generation process and reliability of the results, enhancing transparency and trust.
Interactive exploration-the system supports interactive exploration of results, and users can go deep into looking at detailed data, adjust view parameters or extend queries. The interactive functions include drill down, screening, reorganization and comparison, enabling the user to analyze the results from multiple angles.
Results output and sharing the system provides a variety of results export and sharing options including structured data export, report generation, and collaborative sharing. Export formats support CSV, JSON, PDF, etc. standard formats and system-specific semantic enhancement formats.
Through the computing query processing module 300, the system is able to handle a variety of complex computing query requirements, from simple counting statistics to complex multidimensional analysis, providing deep knowledge insight to users.
In some embodiments, referring to FIG. 3, a further structural block diagram of a cross-domain knowledge-graph system based on a multi-modal calculation is shown, the cross-domain knowledge-graph system further comprising a cross-modal reasoning and correlation module 500.
Specifically, the cross-modal reasoning and association module 500 is configured to identify an implicit mode of the multi-modal knowledge graph and generate association knowledge, perform reasoning according to the implicit mode and the association knowledge to obtain a reasoning result, and enhance and evolve the multi-modal knowledge graph by using the reasoning result.
In a specific implementation, the cross-modal reasoning and correlation module 500 performs reasoning according to the implicit mode and the correlation knowledge to obtain a reasoning result, namely, performs reasoning according to the implicit mode and the correlation knowledge by combining a preset reasoning mechanism and a multi-hop reasoning strategy to obtain the reasoning result.
It should be noted that, the cross-modal reasoning and association module 500 relates to pattern recognition and knowledge discovery, multi-hop reasoning and knowledge inference, knowledge enhancement and evolution, and the following descriptions are respectively provided.
(1) Pattern recognition and knowledge discovery:
The implicit patterns and associated knowledge in the multi-modal knowledge graph are mined; this specifically relates to cross-modal pattern mining, associated knowledge generation, and knowledge verification and scoring.
Wherein, the cross-modal pattern mining part also relates to co-occurrence pattern analysis, semantic consistency analysis, structure pattern recognition and space-time pattern mining.
And (3) co-occurrence mode analysis, namely detecting co-occurrence modes in different mode data by the system, such as frequent combination of a specific text theme and image elements or corresponding relation between an audio event and a video scene. The analysis adopts a multi-level association rule mining algorithm, and the time adjacency and the semantic relativity are considered.
Semantic consistency analysis, namely evaluating semantic consistency of different modal expressions by the system, and identifying complementary or interpreted contents. For example, the system can associate a textual description with a corresponding chart visualization, or associate presentation content with a presentation slide.
Structural pattern recognition the system recognizes feature subgraphs and topology patterns in the structure of the multimodal knowledge graph, such as star structures (central entity and multiple related entities), path patterns (specific relationship chains) or community structures (tightly connected groups of entities).
Space-time pattern mining, in which the system analyzes the temporal and spatial distribution of entities and events, and identifies periodic patterns, movement trajectories, hot spot areas, and space-time aggregations. These patterns reveal rules of user behavior, interest changes, and environmental interactions.
The generation of the associated knowledge also relates to knowledge completion, entity enrichment, label propagation and relationship reasoning.
Knowledge completion, namely the system generates missing entity relationships based on known patterns. The completion process uses pattern matching and reasoning rules to fill information gaps in the multi-modal knowledge graph. For example, upon identifying the "paper-author-organization" pattern, the system can infer a missing "author-organization" relationship; a minimal inference sketch is given after the relationship reasoning item below.
Entity enrichment-the system adds supplemental attributes and context to the multimodal entity through cross-modal association. For example, visual features extracted from a photograph may enrich the description of a persona entity, or speech content extracted from a meeting recording may supplement the context of a project entity.
Label propagation: the system propagates labels and classification information through the graph structure. When part of the multi-modal entities are labeled, the system propagates the labels to unlabeled entities based on entity similarity and relationship patterns, expanding classification coverage.
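A minimal sketch of an iterative propagation pass is given below; the majority-vote update and the fixed iteration count are assumptions for illustration, not the propagation rule of this embodiment:

```python
from collections import Counter

def propagate_labels(adjacency, seed_labels, iterations=5):
    """adjacency: {entity: [neighbor entities]}; seed_labels: {entity: label}."""
    labels = dict(seed_labels)
    for _ in range(iterations):
        updated = dict(labels)
        for entity, neighbors in adjacency.items():
            if entity in seed_labels:
                continue  # manually confirmed labels stay fixed
            votes = Counter(labels[n] for n in neighbors if n in labels)
            if votes:
                updated[entity] = votes.most_common(1)[0][0]
        labels = updated
    return labels
```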
Relationship reasoning: the system applies a series of reasoning rules to derive implicit relationships from known relationships. The reasoning rules cover transitive, symmetric, inverse, and composite relations, enhancing the connectivity and reasoning capability of the multi-modal knowledge graph.
The knowledge verification and scoring part further relates to consistency verification, multi-source verification, confidence calculation and man-machine cooperation verification.
Consistency check: the system checks whether newly generated knowledge remains consistent with existing knowledge. The verification process applies domain constraints and logic rules to identify and resolve potential conflicts or contradictions.
Multi-source verification: the system looks for multiple sources that support the newly generated knowledge. The sources may be data of different modalities, observations at different times, or conclusions of different inference paths. Multi-source verification significantly improves the reliability of inferred knowledge.
Confidence calculation: the system assigns a confidence score to each item of inferred knowledge, reflecting its reliability and certainty. The calculation takes into account the reasoning path length, rule reliability, supporting evidence strength, and consistency level.
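One possible way to combine these factors is sketched below; the per-hop decay, the weights, and the multiplicative form are assumptions for illustration, not the formula used by the system:

```python
def inference_confidence(path_length, rule_reliability, evidence_strength, consistency):
    """All inputs except path_length are assumed to lie in [0, 1]."""
    length_decay = 0.9 ** max(path_length - 1, 0)   # longer reasoning paths are less certain
    score = length_decay * rule_reliability * (0.5 * evidence_strength + 0.5 * consistency)
    return min(max(score, 0.0), 1.0)

# Example: a 3-hop inference with reliable rules and moderate evidence scores about 0.51.
print(inference_confidence(path_length=3, rule_reliability=0.9, evidence_strength=0.6, consistency=0.8))
```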
Human-machine collaborative verification: the system provides a human-machine collaborative verification mechanism that requests user confirmation under specific conditions (such as high-impact but low-confidence inferences). User feedback not only validates specific inferences but also helps the system improve its inference rules and confidence models.
(2) Multi-hop reasoning and knowledge inference:
the system realizes complex multi-step reasoning capability, particularly involving reasoning mechanism design, multi-hop reasoning strategies, and reasoning result integration.
The reasoning mechanism design involves path-based reasoning, rule-based reasoning, case-based reasoning, and uncertainty reasoning.
Path-based reasoning: the system supports reasoning along relationship paths in the multi-modal knowledge graph. Given an initial entity and a target entity type, the system explores possible relationship paths to find indirect relationships between connected entities. For example, starting from a person, the country in which that person's work is located can be found via a "create-belong-locate" path.
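Following such a relation path can be sketched over a simple triple index; the graph representation and the relation names echoing the example above are illustrative assumptions:

```python
def follow_path(graph, start, relation_path):
    """graph: {(head, relation): set(tails)}; returns the entities reachable via the path."""
    frontier = {start}
    for relation in relation_path:
        frontier = {tail for head in frontier for tail in graph.get((head, relation), set())}
        if not frontier:
            break  # the path cannot be continued from here
    return frontier

# e.g. follow_path(graph, "person:author_A", ["created", "belongs_to", "located_in"])
```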
Rule-based reasoning: the system integrates a reasoning rule base that includes domain-specific rules and general logic rules. Rules take the "if-then" form, defining a mapping from preconditions to conclusions. The rule engine efficiently matches and applies these rules, extending the range of knowledge.
Case-based reasoning: the system accumulates typical reasoning cases for analogical reasoning. When a new problem is encountered, the system searches for historical cases with a similar structure and adapts their reasoning process to solve the current problem. This approach is particularly suited to handling special cases and exceptions.
Uncertainty reasoning: the system uses probabilistic graphical models to represent and process uncertain knowledge. The reasoning process calculates conditional probabilities and confidence intervals, generating possible conclusions and their probability distributions. Uncertainty reasoning applies to situations where information is incomplete or contradictory.
The multi-hop reasoning strategy also relates to breadth-first exploration, heuristic depth exploration, bidirectional reasoning and parallel path exploration.
Breadth-first exploration: for an open reasoning task, the system explores the knowledge space using a breadth-first strategy. This approach takes a wide variety of possibilities into account and is suitable for finding diverse indirect associations.
Heuristic depth exploration: for a reasoning task with a definite target, the system adopts heuristic depth-first search to preferentially explore paths that are more likely to reach the target. The heuristic function considers path relevance, entity importance, and relationship strength.
Bidirectional reasoning: the system searches forward from the starting entity and backward from the target simultaneously, meeting in the middle. This method significantly reduces the search space and accelerates long-path reasoning.
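The search-space reduction can be illustrated with a minimal bidirectional expansion over an undirected neighbor map; this is a sketch under assumed inputs, not the exact strategy of the system:

```python
def bidirectional_meet(neighbors, start, target, max_hops=6):
    """neighbors: {entity: set(adjacent entities)}; True if start and target connect within max_hops."""
    forward, backward = {start}, {target}
    for _ in range(max_hops):
        if forward & backward:
            return True
        # expand the smaller frontier first to keep the two searches balanced
        if len(forward) <= len(backward):
            forward |= {n for e in forward for n in neighbors.get(e, set())}
        else:
            backward |= {n for e in backward for n in neighbors.get(e, set())}
    return bool(forward & backward)
```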
Parallel path exploration, namely, the system explores a plurality of reasoning paths in parallel and evaluates different reasoning directions simultaneously under the condition of resource permission. The parallel strategy improves the reasoning efficiency and is particularly suitable for complex reasoning tasks with intensive computation.
The reasoning result integration part further involves result aggregation, conflict resolution, result explanation generation, and feedback learning.
Result aggregation: the system integrates results from different reasoning paths to form a comprehensive conclusion. The aggregation process takes path reliability, result consistency, and evidence strength into consideration, balancing the contributions of different reasoning sources.
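A hedged sketch of weighting and merging the conclusions reached along different paths is given below; the scoring scheme (reliability times evidence, then normalization) is an assumption for illustration:

```python
from collections import defaultdict

def aggregate_results(path_results):
    """path_results: list of (conclusion, path_reliability, evidence_strength) triples."""
    scores = defaultdict(float)
    for conclusion, reliability, evidence in path_results:
        scores[conclusion] += reliability * evidence   # each path contributes weighted support
    total = sum(scores.values()) or 1.0
    # return conclusions with their normalized support, strongest first
    return sorted(((c, s / total) for c, s in scores.items()), key=lambda pair: -pair[1])
```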
Conflict resolution: when different inference paths produce conflicting results, the system applies a conflict resolution policy. Policies include evidence weight comparison, confidence assessment, and domain priority rules, in some cases preserving multiple possible interpretations together with the conditions under which each applies.
Result explanation generation: the system generates a natural-language explanation for the reasoning result, describing the reasoning process, key steps, and supporting evidence. The level of detail is adjusted to the user's expertise, enhancing the understandability and credibility of the result.
Feedback learning, namely, the system records the reasoning process and the result use condition, learns which reasoning modes are more effective and which rules are more reliable. The feedback learning mechanism continuously optimizes the reasoning strategy and rule weight, and improves the reasoning capacity of the system.
(3) Knowledge enhancement and evolution:
The system continuously enhances and evolves the multi-modal knowledge graph based on the reasoning result, particularly involving a knowledge enhancement mechanism, knowledge evolution management, and personalized knowledge adaptation.
The knowledge enhancement mechanism further involves reasoning result integration, knowledge network expansion, modality complementarity enhancement, and context enrichment.
Reasoning result integration: verified reasoning results are formally integrated into the multi-modal knowledge graph to form new knowledge units. The integration process retains the reasoning provenance and confidence information, so that the newly added knowledge is traceable and assessable.
Knowledge network expansion: the system actively expands the knowledge network and establishes richer entity associations. The direction of expansion is guided by user interests and usage patterns, giving preference to high-value, frequently accessed knowledge areas.
Modality complementarity enhancement-the system uses complementarity of different modalities to enhance the representation of one modality with information of another modality. For example, the text description may enhance image understanding and the speech content may enrich the video context.
Context enrichment: the system gathers and integrates contextual information about entities, including temporal context, spatial location, related events, and social environment. This enrichment makes the knowledge representation more comprehensive and supports more accurate retrieval and reasoning.
The knowledge evolution management part also relates to timeliness processing, progressive refining, version control and backtracking, and knowledge forgetting mechanisms.
Timeliness processing: the system manages the timeliness of knowledge, marking and updating outdated information. The processing strategy depends on the knowledge type: factual knowledge is fully updated, whereas opinion knowledge retains its historical versions while new viewpoints are added.
Progressive refinement: the system refines the knowledge representation through continuous learning. The refinement process integrates new observations and user feedback, adjusting entity attributes, relationship weights, and knowledge organization so that the knowledge structure becomes more accurate and relevant.
Version control and backtracking, namely, the system realizes complete knowledge version control and records all substantial changes. Version control supports knowledge state backtracking, change comparison and selective recovery, and controllability and reversibility of knowledge evolution are guaranteed.
Knowledge forgetting mechanism: the system simulates the forgetting process of human memory, gradually reducing the weight of rarely accessed, low-value knowledge. The forgetting mechanism prevents performance degradation and information noise caused by excessive accumulation of knowledge, keeping the system efficient.
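A small sketch of an access-based decay is shown below; the half-life, the threshold, and the exponential form are assumptions used only to illustrate the forgetting idea:

```python
import math
import time

def decayed_weight(base_weight, last_access_ts, half_life_days=90.0, now=None):
    """Weight halves every half_life_days since the last access."""
    now = time.time() if now is None else now
    elapsed_days = (now - last_access_ts) / 86400.0
    return base_weight * math.exp(-math.log(2) * elapsed_days / half_life_days)

def should_archive(base_weight, last_access_ts, threshold=0.05):
    """Candidate for archiving once the decayed weight falls below the threshold."""
    return decayed_weight(base_weight, last_access_ts) < threshold
```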
Personalized knowledge adaptation in turn involves user interest modeling, usage pattern adaptation, knowledge presentation customization, and collaborative knowledge construction.
User interest modeling, namely, the system establishes a user interest model and tracks topics, entities and relations focused by the user. The interest model influences the direction and priority of knowledge evolution, and ensures that system resources support the knowledge field of interest of the user preferentially.
Usage pattern adaptation-the system analyzes the user's knowledge usage patterns, including query type, browsing path, and access frequency. Based on the use mode, the system optimizes knowledge organization and index structure, and improves the efficiency of common operation.
Knowledge presentation customization: the system customizes the knowledge presentation mode according to user preference and skill level. Customization includes detail-level adjustment, use of technical terminology, visualization style, and interaction mode, enhancing the personalization of the user experience.
Collaborative knowledge construction, namely supporting multi-user collaborative construction and perfecting knowledge by the system. The collaboration mechanism includes knowledge sharing, contribution assessment, and conflict resolution, enabling personal knowledge to be selectively integrated into the collective intelligence with privacy preservation.
Through the cross-modal reasoning and association mechanism, the system can continuously discover hidden knowledge patterns and infer new relationships and insights, so that the knowledge base is not merely an information store but also a generator and evolver of knowledge.
The above describes the cross-domain knowledge graph system based on super-dimensional multi-modal calculation. In the overall operation flow of the system, data lifecycle management mainly comprises a data intake stage, a knowledge construction stage, a calculation and query stage, and a knowledge enhancement stage.
The data intake stage comprises: a user uploads or creates multi-modal content through the adaptive interface, or the system collects data through an automatic acquisition mechanism; the data is preliminarily preprocessed, including format standardization, quality assessment, and basic metadata extraction; and the multi-modal data flows in parallel into the dedicated encoders to generate preliminary feature representations.
The knowledge construction stage comprises: converting the feature representations into a unified super-dimensional vector format, preserving modal characteristics while achieving computational compatibility; extracting multi-modal entities from the modal data by the entity recognition system and performing linking and standardization; analyzing relationships among the entities by the relationship extraction system and constructing the edge connections of the multi-modal knowledge graph; and dynamically updating the multi-modal knowledge graph, integrating new entities and relationships, and maintaining version history and change records.
The calculation and query stage comprises: the user submits a query request through the interface; the system parses the query intent and parameters; the query planning module generates an optimized execution plan, taking data distribution and computing resources into account; the memory center calculation engine executes the plan, performing calculation operations where the data resides; and the result processing module integrates, converts, and formats the calculation results to generate user-friendly output.
The knowledge enhancement stage comprises: the cross-modal reasoning module analyzes knowledge patterns to discover potential associations and hidden rules; the multi-hop reasoning system executes complex reasoning tasks to generate new knowledge insights; the knowledge verification system evaluates the reliability of the reasoning results, performing multi-source verification and consistency checks; and the verified knowledge is integrated back into the multi-modal knowledge graph, starting a new round of the knowledge evolution cycle.
The complete flow of the cross-domain knowledge-graph system for processing typical tasks is shown by the following examples:
1. Multi-modal content import and processing: a user uploads a set of conference materials, including PDF documents, presentation slides, conference recordings, and live photos, through the system interface.
The system then performs the following processing: a document processor extracts the PDF text content, identifying titles, chapters, and key terms; a slide analyzer extracts the presentation content, identifying titles, key points, and charts; a speech recognition system transcribes the conference recordings, marking speakers and topic paragraphs; and an image analyzer processes the photographs, identifying participants, presented content, and scenes.
Wherein each processor generates a modality specific feature representation that is converted to a unified, super-dimensional vector format. The system identifies common entities (people, items, concepts) and relationships (speech, presentation, discussion) across the materials, building a graph structure representing the knowledge of the meeting. The structure is integrated with the prior knowledge graph of the user, and the connection between the newly added content and the historical knowledge is established.
2. Complex computational query processing: the user asks, "Over the past year, which people spoke most in conferences related to project X, and which technical issues did they focus on?"
The system processes this as follows: a query analyzer identifies the query intent (statistical analysis), the target entities (participants of project-X-related conferences), and the calculation requirements (participation frequency statistics and topic analysis); an execution planner generates an optimal plan covering related-conference identification, participant screening, utterance statistics, and topic extraction; the calculation engine executes the plan in memory, using the graph structure to rapidly locate relevant nodes and perform in-place calculation; a statistics processor generates ranked results, showing the participants with the highest participation and their discussion topics; and the result presentation module generates a visual report including a participant ranking chart, a topic heat map, and key discussion summaries.
3. Cross-modal knowledge discovery and enhancement:
the system analyzes the user's knowledge graph in the background to discover potential patterns and relationships: a pattern identifier finds frequent relations between discussions of specific projects and specific positions; a time analyzer identifies key stages and turning points in the project's progress; a multi-hop reasoning engine discovers implicit cooperative relationships among different team members; and a topic evolution tracker analyzes trends in the technical discussion topics.
The system integrates these findings into knowledge insights and, after verification, adds them to the multi-modal knowledge graph. The next time the user accesses related content, the system proactively presents these enhancements, for example: "Did you notice a significant change in the discussion pattern of this project around time X?"
Viewed as a whole, the overall architecture of the cross-domain knowledge graph system based on super-dimensional multi-modal calculation is shown in fig. 4. The cross-domain knowledge graph system is composed of multi-modal data acquisition and super-dimensional coding, dynamic knowledge graph construction, cross-modal reasoning and association, calculation query processing, the memory center calculation engine (memory center calculation architecture), and the like, and provides an adaptive user interface, wherein the super-dimensional knowledge graph in fig. 4 is the constructed multi-modal knowledge graph.
The specific data processing flow of the multi-mode data acquisition and the super-dimensional coding is shown in fig. 5, multi-mode data (text data, image data, audio data and the like) are acquired, and feature extraction, super-dimensional projection and unification, and super-dimensional operation and binding are performed on the multi-mode data, so that the multi-mode data is converted into unified super-dimensional vector representation.
The specific form of the memory center computing architecture is shown in fig. 6, where fig. 6 includes a conventional von Neumann architecture, the memory center computing architecture, and a detailed memory center computing engine structure, and the detailed memory center computing engine structure includes contents such as a physical partition, a relational partition, a physical processing unit, a relational processing unit, and a computation scheduling and optimizing layer.
The process flow of the calculation query processing is shown in fig. 7, and specifically comprises the following steps of user query, query analysis and understanding, query planning and optimization, and memory center execution and result processing.
The process flow of cross-modal reasoning and association is shown in fig. 8, wherein the cross-modal mode mining, multi-hop reasoning, knowledge verification and enhancement and other processes are involved.
It should be noted that the specific implementation principles of the foregoing embodiments have been described in detail in connection with fig. 4 to fig. 8 and will not be repeated here.
In general, the cross-domain knowledge graph system based on the multi-modal calculation in the embodiment of the invention realizes a unified representation of multi-modal data (text, image, audio, video, and the like) through super-dimensional calculation, adopts the memory center calculation architecture to execute calculation operations directly where the data is stored, supports complex statistics and aggregation queries, and discovers implicit associations and new knowledge through cross-modal reasoning; all modules are connected with each other through standardized interfaces to form a complete data and calculation flow. The method breaks through the limitations of the traditional personal knowledge base in terms of single content type, limited retrieval mode, and insufficient computing capacity, provides a comprehensive, efficient, and intelligent knowledge management solution, and is suitable for various scenarios such as personal knowledge organization, academic research, and enterprise knowledge management.
Corresponding to the above-mentioned cross-domain knowledge graph system based on the multi-modal calculation in the embodiment of the present invention, referring to fig. 9, the embodiment of the present invention further provides a flowchart of a data processing method, where the data processing method is applicable to the cross-domain knowledge graph system based on the multi-modal calculation in the above-mentioned embodiment, and the data processing method includes:
step S901, acquiring multi-modal data from multiple paths and converting the multi-modal data into a super-dimensional vector.
In the specific implementation process of step S901, multi-modal features are extracted from multi-modal data, the multi-modal features are mapped to the same intermediate dimension by using a projection layer, base vectors are allocated to the multi-modal features, the multi-modal features mapped to the intermediate dimension are mapped to a super-dimensional space by using the base vectors to obtain super-dimensional vectors, and the super-dimensional vectors are normalized.
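A hedged NumPy sketch of this encoding path is given below; the dimensions, the random projection matrices, and the element-wise binding with a per-modality base vector are assumptions chosen for illustration, not the exact encoder of this embodiment:

```python
import numpy as np

rng = np.random.default_rng(0)
INTERMEDIATE_DIM, HYPER_DIM = 512, 10000
feature_dims = {"text": 768, "image": 1024, "audio": 256}           # assumed per-modality feature sizes

# projection layers mapping each modality to the same intermediate dimension
projections = {m: rng.standard_normal((INTERMEDIATE_DIM, d)) / np.sqrt(d)
               for m, d in feature_dims.items()}
# shared lift into the super-dimensional space, plus one base vector per modality
lift = rng.standard_normal((HYPER_DIM, INTERMEDIATE_DIM)) / np.sqrt(INTERMEDIATE_DIM)
base_vectors = {m: rng.choice([-1.0, 1.0], size=HYPER_DIM) for m in feature_dims}

def encode(modality, feature):
    intermediate = projections[modality] @ feature        # map to the shared intermediate dimension
    hyper = lift @ intermediate                           # project into the super-dimensional space
    bound = hyper * base_vectors[modality]                # bind with the modality's base vector
    return bound / (np.linalg.norm(bound) + 1e-12)        # normalize

text_vector = encode("text", rng.standard_normal(768))
```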
In other embodiments, vector binding, relational encoding, context integration, and metadata attachment are performed on the normalized super-dimensional vector.
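The binding and context-integration operations can be illustrated with the element-wise multiplication and superposition operators common in hyperdimensional computing; treating these as the operators of this embodiment is an assumption:

```python
import numpy as np

def bind(a, b):
    """Binding associates two super-dimensional vectors (e.g. a relation with an entity)."""
    return a * b

def bundle(vectors):
    """Bundling superimposes several vectors into a single context representation."""
    s = np.sum(vectors, axis=0)
    return s / (np.linalg.norm(s) + 1e-12)

# e.g. a relation-qualified entity folded into its surrounding context:
# context = bundle([bind(relation_vec, entity_vec), scene_vec, metadata_vec])
```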
Step S902, identifying a multi-modal entity based on the super-dimensional vector, and constructing a multi-modal knowledge graph by utilizing the multi-modal entity.
In the specific implementation process of step S902, a multi-modal relationship is extracted from the multi-modal entity, the multi-modal relationship is verified and evaluated, and a multi-modal knowledge graph is constructed by using the multi-modal entity and combining the multi-modal relationship through verification and evaluation.
Step S903, analyzing and optimizing the user's query, and executing the optimized query based on the multi-modal knowledge graph to obtain an execution result, wherein the query uses one or more preset query modes.
In the specific implementation process of step S903, the user's query is parsed to obtain a parsed result, an execution plan is generated according to the parsed result, and the query is optimized, wherein the optimization operations comprise a rewriting operation, an index matching operation, and a parallel execution strategy generation operation; the optimized query is then executed based on the execution plan and the multi-modal knowledge graph to obtain an execution result.
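A minimal sketch of the parse-plan-optimize flow is shown below; the class, the field names, and the assumed index catalogue are illustrative placeholders rather than the actual planner of step S903:

```python
from dataclasses import dataclass, field

KNOWN_INDEXES = {"entity_type", "time_range"}   # assumed index catalogue

@dataclass
class QueryPlan:
    intent: str                                 # e.g. "statistics", "retrieval", "reasoning"
    targets: list                               # entity/relation patterns to match
    rewrites: list = field(default_factory=list)
    indexes: list = field(default_factory=list)
    parallel_degree: int = 1

def plan_query(parsed):
    """parsed: {"intent": ..., "targets": [...], "filters": [...]} from the query analyzer."""
    plan = QueryPlan(intent=parsed["intent"], targets=list(parsed["targets"]))
    plan.rewrites = [f for f in parsed.get("filters", []) if f]       # rewrite: push filters down
    plan.indexes = [t for t in plan.targets if t in KNOWN_INDEXES]    # reuse matching indexes
    plan.parallel_degree = max(1, min(len(plan.targets), 4))          # bounded parallel execution
    return plan
```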
Preferably, in other embodiments, the execution results are aggregated, converted, and validated, and the aggregated, converted, and validated execution results are output.
Preferably, in other embodiments, a memory hub architecture for storing at least the multimodal knowledge-graph is constructed and a computational scheduling and optimization mechanism is configured.
Preferably, in other embodiments, implicit patterns of the multi-modal knowledge graph are identified and associated knowledge is generated, reasoning is performed according to the implicit patterns and the associated knowledge to obtain a reasoning result, and the multi-modal knowledge graph is enhanced and evolved by using the reasoning result.
Specifically, the reasoning result is obtained by reasoning over the implicit patterns and the associated knowledge in combination with a preset reasoning mechanism and a multi-hop reasoning strategy.
It should be noted that the execution principles of step S901 to step S903 have been described in detail in the above embodiments of the cross-domain knowledge graph system based on super-dimensional multi-modal calculation and will not be repeated here.
In summary, the embodiment of the invention provides a cross-domain knowledge graph system and a data processing method based on super-dimensional multi-mode calculation, which are used for constructing a multi-mode knowledge graph by utilizing multi-mode data, analyzing and optimizing the query using at least one query mode, executing the optimized query based on the multi-mode knowledge graph to obtain an execution result, supporting various types of data, expanding a retrieval mode and improving the calculation capability, and further improving the practicability and the user experience of a knowledge base system.
In this specification, each embodiment is described in a progressive manner, identical and similar parts of the embodiments refer to each other, and each embodiment mainly describes its differences from the other embodiments. In particular, for a system embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the corresponding description of the method embodiment. The system embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.