
CN119396919B - Automatic metadata management method, system, equipment and storage medium - Google Patents


Info

Publication number
CN119396919B
CN119396919B (application CN202510014520.6A)
Authority
CN
China
Prior art keywords
data
metadata
unit
meta
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202510014520.6A
Other languages
Chinese (zh)
Other versions
CN119396919A (en)
Inventor
袁存发
张海东
郑豹
张强
何招亮
江家杰
岳希沛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Luculent Smart Technologies Co ltd
Original Assignee
Luculent Smart Technologies Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Luculent Smart Technologies Co ltd
Priority to CN202510014520.6A
Publication of CN119396919A
Application granted
Publication of CN119396919B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/219Managing data history or versioning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • G06F16/2448Query languages for particular applications; for extensibility, e.g. user defined types
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26Visual data mining; Browsing structured data
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/288Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


The present invention discloses an automated metadata management method, system, device, and storage medium in the technical fields of data governance and metadata management. The system adapts to different data sources, automatically connects to databases through the data-source configuration, and automatically collects metadata matching a metamodel into a metadata repository according to the metamodel and the collection-task configuration. A metamodel library is defined to automatically collect the current metadata of each metamodel from the database and to verify whether the collected metadata needs to be stored or upgraded to a new version. Data is collected by associating tables and fields in the metadata repository; after filter conditions are configured, data-development components automatically load the data into a data warehouse for development, and lineage relationships are built automatically. The invention reduces manual intervention, improves the efficiency of metadata management, ensures the consistency and accuracy of metadata, and establishes lineage relationships between metadata tables that clearly show how data flows and depends across tables, facilitating data management and tracing.

Description

Automatic metadata management method, system, equipment and storage medium
Technical Field
The invention belongs to the technical fields of data governance and metadata management, and particularly relates to an automated metadata management method, system, device, and storage medium.
Background
In modern power systems, the stable operation of turbine generator sets is critical to ensuring the continuity and reliability of the power supply. However, during long-term operation the units inevitably encounter equipment failures, maintenance requirements, or outages caused by external disturbances. Under power-industry regulations, generator-set shutdowns fall into two types: planned and unplanned. Planned outages are pre-scheduled shutdowns, typically for equipment maintenance or system upgrades, intended to ensure the long-term safe and stable operation of the unit. Unplanned outages are unintended shutdowns caused by sudden failures or other unexpected events; they not only affect the stability of the power supply but may also threaten the overall safety of the power system.
Traditional outage monitoring and management relies mainly on manual inspection and data recording. Although this approach meets basic management requirements to some extent, as power systems grow more complex and data volumes increase dramatically, manual monitoring faces serious efficiency and accuracy problems.
Large amounts of data must be collected and processed by hand, which is time-consuming and labor-intensive and easily leads to data omissions or errors, delaying the timely reporting and handling of outage events.
Because reliance on manual form-filling means that information about a unit failure cannot be uploaded promptly, scheduling and power balancing can be adversely affected, increasing the operating pressure on the grid.
Manual operation is also prone to omitted or false reports, so supervisory departments cannot accurately grasp the real operating condition of the units, which affects decision-making and follow-up measures.
In view of the above problems, the invention provides a model-driven supervision method that aims to overcome the defects of the traditional approach by constructing an intelligent model and automatically processing real-time data, realizing accurate monitoring of generator-set operating states and automatic classification and reporting of shutdown events.
Disclosure of Invention
The present invention has been made in view of the above-described problems.
Therefore, the technical problem addressed by the invention is that existing generator-set shutdown monitoring methods suffer from low manual-monitoring efficiency, lagging information feedback, and insufficient data accuracy; the invention addresses the optimization of real-time monitoring and automatic classification of operating states.
The technical scheme is an automated metadata management method comprising: performing data-source adaptation and connecting to a database; collecting metadata matching a metamodel; defining a metamodel library and collecting the current metadata of each metamodel from the database; checking whether the collected metadata needs to be stored or upgraded to a new version; automatically collecting data into a data warehouse for data development; obtaining user-specified index information to construct lineage relationships; and predicting dependency relationships to locate abnormal data.
As a preferred embodiment of the automated metadata management method according to the present invention, collecting metadata matching a metamodel includes:
metadata collection comprises data-source definition, collection-task configuration, collection-task execution, and metadata warehousing;
the collection-task configuration comprises the data source, metamodel classification, collection strategy, automatic warehousing, collection frequency, and collection content;
after a collection task is released, a timed task is generated; the collection task starts automatically when the designated time is reached and collects and stores metadata according to the configured metamodel;
for the collection content, when a new-task operation is clicked, a configured metamodel is selected from the metamodel library; after the metamodel is added, the collection task collects the metadata of that metamodel.
As a preferred scheme of the automated metadata management method, verifying whether the collected metadata needs to be stored and version-upgraded comprises the following steps:
when the collection task starts, querying the metadata collected by previous runs of the task and obtaining the metadata objects;
storing the ID of every metadata object as a key in a metadata map;
the collection task obtains, according to the metamodel, the metadata corresponding to the metamodel from the DatabaseMetaData object;
the metadata generates a metadata object whose ID is defined as the key, and the map is queried for the key of the current metadata object;
when the map does not contain the key, the current metadata object does not exist in the metadata repository and a data insertion is performed.
As a preferred scheme of the automated metadata management method, verifying whether the collected metadata needs to be stored and version-upgraded further comprises:
when the map contains the key, judging whether the current metadata object equals the metadata object stored in the map;
if they are equal, no processing is needed;
if they are not equal, the metadata object in the repository is recorded as a historical version, the current metadata object is stored as the latest version, and the version number is increased by 1.
As a preferred embodiment of the automated metadata management method according to the present invention, the data development includes:
collecting data through the tables and fields associated in the metadata repository;
storing all data of the tables corresponding to the metadata into a data warehouse;
the data warehouse comprises ODS, DWD, DWS, and ADS layers;
determining the metadata collected in the system according to the data source;
the provided output component automatically writes the collected results into the ODS layer of the designated data warehouse;
cleaning and converting the data, then automatically writing it into the DWD layer through the output component;
performing preliminary aggregation and calculation according to the user data and writing the results into the DWS layer of the designated data warehouse;
performing higher-dimensional aggregation and computation again on the DWS-layer data and writing the results into the ADS layer;
automatically generating charts through a chart component and presenting them on a web page.
As a preferred embodiment of the automated metadata management method according to the present invention, constructing the lineage relationship includes:
during data development in the data warehouse, establishing lineage relationships between metadata tables and fields; Spring AOP is used to intercept executed SQL statements, the intercepted statements are pushed to a Kafka message queue, and the data lineage is identified automatically;
screening the intercepted SQL to obtain query statements, parsing each SQL query statement, extracting table names, column names, operation types, and conditions, and generating lexical tokens; each parsed token is converted into a vector:

v_w = embedding(w)

where v_w is the vector representation of the token w; the vectors form the input sequence;
constructing an LSTM model, which comprises a forgetting layer, an input layer, an LSTM layer and an output layer;
known dependency relationships are extracted from historical SQL queries and log data as training data; the model output is computed and the error is measured with the cross-entropy loss:

loss = -Σ_i y_i log(P_i)

gradients are computed by the back-propagation algorithm, and the model parameters are updated with the Adam optimizer to minimize the loss;
where y_i is the true label of the i-th metadata item, P_i is the predicted probability for the i-th item, η is the learning rate, and i is the index variable;
converting new SQL queries or log data into vector sequences and feeding them into the trained LSTM model;
using the value of the LSTM output gate as input, predicting the dependency relationships between data entities and constructing a data lineage graph;
storing the constructed data lineage, comprising tables, fields, and operation types, into a Neo4j graph database according to the lineage graph.
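The lineage step above can be sketched in Java. In the patent the SQL is intercepted via Spring AOP and pushed to Kafka; the sketch below shows only the parsing stage that turns one INSERT...SELECT statement into table-level lineage edges, using a deliberately simplified regex rather than a full SQL parser. The class and method names are illustrative, not from the patent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified lineage extraction: find the target table of an INSERT and all
// source tables named after FROM/JOIN, and emit "source -> target" edges.
public class LineageExtractor {
    private static final Pattern TARGET = Pattern.compile("(?i)insert\\s+into\\s+(\\w+)");
    private static final Pattern SOURCE = Pattern.compile("(?i)(?:from|join)\\s+(\\w+)");

    // Returns the lineage edges for a single SQL statement; non-INSERT
    // statements are screened out and yield no edges.
    public static List<String> extractEdges(String sql) {
        List<String> edges = new ArrayList<>();
        Matcher t = TARGET.matcher(sql);
        if (!t.find()) return edges;
        String target = t.group(1);
        Matcher s = SOURCE.matcher(sql);
        while (s.find()) {
            edges.add(s.group(1) + " -> " + target);
        }
        return edges;
    }
}
```

Edges produced this way map directly onto nodes and relationships in a graph store such as Neo4j.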
As a preferred embodiment of the automated metadata management method according to the present invention, predicting dependency relationships includes:
judging dependency through a three-stage dependency model;
in the first stage of the model, the similarity S between data entities is computed:

S_ij = ζ · exp(−‖d_i − d_j‖^p / ε)

where x_i and x_j denote the i-th and j-th data entities, ε is a scaling factor of the similarity controlling how strongly distance affects it, ‖d_i − d_j‖ is the Euclidean distance between the feature vectors of data objects d_i and d_j, p is the power parameter applied to the distance, ζ is the weight of the feature contribution determining the influence of the metadata on the similarity, U is the total number of data entities, and i and j are index variables;
after the similarity S is obtained, the value of the LSTM output gate is used as input for the second-stage temporal dynamic modeling D_ij(t); the dynamic dependencies D_ij(t) are then integrated into the third-stage global dependency matrix E:

E = (1/F) · Σ_{i,j} I(S_ij ≥ S_y) · D_ij(t)

where E represents the dependency between all data entities, F is a normalization factor ensuring the matrix values lie in [0, 1], I is an indicator function returning 1 when its condition holds and 0 otherwise, and S_y is the similarity threshold that determines which entities are considered related;
when E > 0.75 a dependency exists, and when E ≤ 0.75 there is no dependency.
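The first and third stages above can be sketched as follows. The exponential-decay kernel over Euclidean distance is an assumption consistent with the symbols described (ε, p, and a score in (0, 1]); it is not the patent's exact formula, and the 0.75 threshold is taken directly from the text.

```java
// Hedged sketch of the dependency decision: a similarity score in (0, 1]
// from an exponential kernel, and a threshold test at 0.75.
public class DependencyModel {
    // S = exp(-(distance^p) / epsilon): identical vectors give S = 1,
    // and S decays monotonically as the Euclidean distance grows.
    public static double similarity(double[] a, double[] b, double p, double epsilon) {
        double sq = 0.0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            sq += diff * diff;
        }
        double dist = Math.sqrt(sq);
        return Math.exp(-Math.pow(dist, p) / epsilon);
    }

    // A dependency exists only when the normalized score exceeds 0.75.
    public static boolean dependent(double e) {
        return e > 0.75;
    }
}
```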
Another object of the present invention is to provide an automated metadata management system that realizes automated collection, management, development, and monitoring of metadata, improving the efficiency and quality of data governance and supporting data-driven business decisions and process optimization. The system addresses data-source diversity adaptation, collection automation, metamodel management, data-development and processing efficiency, data-lineage tracking, and abnormal-data handling; these solutions reduce the complexity and cost of metadata management, raise the automation level and efficiency of data management, support improvements in data quality and data security, and provide an accurate and reliable data basis for data analysis and business decisions.
The technical scheme is an automated metadata management system comprising a data collection and adaptation module, a metamodel management module, a data development and processing module, and a lineage management module. The data collection and adaptation module adapts to and collects metadata from different data sources: it connects to databases through the configured data sources and automatically collects matching metadata into the metadata repository according to the metamodel and collection-task configuration. The metamodel management module defines and manages the metamodel library, automatically collects metadata, and verifies whether it needs to be stored or upgraded. The data development and processing module develops the collected data: it extracts, converts, and loads data into the data warehouse and performs data enhancement and implicit-feature recognition. The lineage management module constructs the data lineage and predicts dependency relationships.
As a preferred embodiment of the automated metadata management system according to the present invention, the system comprises:
the data collection and adaptation module, comprising a data-source adaptation unit, a collection-task management unit, and a metadata collection unit; the data-source adaptation unit provides connection information to the collection-task management unit, which calls the metadata collection unit to collect data;
the metamodel management module, comprising a metamodel-library management unit, a metadata verification unit, and a metadata version-control unit; the metadata collection unit passes collected metadata to the verification unit, which interacts with the metamodel-library management unit, and the version-control unit manages versions according to the verification result;
the data development and processing module, comprising a data extraction unit, a data conversion unit, a data loading unit, a data enhancement unit, and a feature recognition unit; extracted data is passed to the conversion unit, the converted data is loaded into the data warehouse by the loading unit, and the enhancement and feature-recognition units provide support during processing;
the lineage management module, comprising a lineage construction unit and a dependency prediction unit; the construction unit generates the data lineage through an analysis tool, and the prediction unit uses this information to predict dependencies.
A computer device comprises a memory and a processor, the memory storing a computer program; when the processor executes the computer program, it implements the steps of the automated metadata management method described above.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the automated metadata management method described above.
The beneficial effects are that manual intervention is reduced and metadata-management efficiency is improved through the automated metadata collection and management method. By defining a unified metamodel and automatically establishing relationships between metadata, the consistency and accuracy of the metadata are ensured. By automatically analyzing SQL scripts, lineage relationships between metadata tables are established, so the flow and dependencies of data between different tables can be displayed clearly, facilitating data management and tracing. The system adapts to different types of data sources and application scenarios and has strong adaptability and extensibility.
Drawings
For a clearer description of the technical solutions of embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art, wherein:
FIG. 1 is a general flow chart of an automated metadata management method based on model matching according to one embodiment of the present invention.
Fig. 2 is a metadata intelligent management flowchart of an automated metadata management method based on model matching according to an embodiment of the present invention.
Fig. 3 is a metadata checksum lifting flow chart of an automated metadata management method based on model matching according to an embodiment of the present invention.
FIG. 4 is a block diagram of a system scenario for an automated metadata management system based on model matching according to one embodiment of the present invention.
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Embodiment 1, referring to fig. 1-3, is a first embodiment of the present invention, which provides an automated metadata management method, including:
Step S1: performing data-source adaptation, connecting to the database, and collecting metadata matching the metamodel.
As shown in fig. 2, further metadata collection includes data source definition, collection task configuration, collection task execution, and metadata information warehousing.
The collection-task configuration comprises the data source, metamodel classification, collection strategy, automatic warehousing, collection frequency, and collection content. After a collection task is released, a timed task is generated; the task starts automatically once the designated time is reached and collects and stores metadata according to the configured metamodel. For the collection content, when a new-task operation is clicked, a configured metamodel is selected from the metamodel library; after the metamodel is added, the collection task collects the metadata of that metamodel.
The meta-model is classified as a relational database and the acquisition strategy includes full or incremental updates.
Defining the metamodel library comprises the attributes of the metamodels, the combination relationships of the metamodels, and the dependency relationships of the metamodels.
The metamodel includes tables, fields, primary keys, unique constraints, foreign keys, indexes, views, libraries, check constraints, function parameters, functions, stored-procedure parameters, stored procedures, triggers, tablespaces, and table partitions.
The attributes of the meta-model comprise table space, table type and annotation according to the corresponding characteristics of the meta-model type.
The combination relation of the metamodels comprises constructing a sub-metamodel of the current metamodel.
The dependency of the metamodel includes an associated metamodel that is affected by the current metamodel.
For example, if attributes such as the field length and field type of a table change, the field length and field type of any view created from that table also change; that is, the dependency relationship of the table's metamodel includes the view.
The data-source definition mainly contains the database name, type, address, user name, and password. Using the DriverManager, Connection, and DatabaseMetaData classes provided by Java's native JDBC, the platform can extract all metadata of the tables and fields in the database.
The DriverManager class provides the getConnection method for obtaining a database connection: passing in the database address, user name, and password from the configuration returns a database connection object. The getMetaData method provided by the Connection class then returns a DatabaseMetaData object, which contains all metadata of the database's tables, fields, views, and so on.
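A minimal Java sketch of the connection-and-extraction flow just described, using only the standard JDBC calls named above (DriverManager.getConnection, Connection.getMetaData, DatabaseMetaData.getTables). The buildUrl helper is a hypothetical convenience for assembling the connection string from the data-source definition, and collectTableNames requires a live database plus its driver on the classpath to actually run.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class MetadataCollector {
    // Assemble a JDBC URL from the data-source definition (type + address + name).
    public static String buildUrl(String dbType, String address, String dbName) {
        return "jdbc:" + dbType + "://" + address + "/" + dbName;
    }

    // Collect all table names visible through DatabaseMetaData.
    public static List<String> collectTableNames(String url, String user, String password)
            throws SQLException {
        List<String> tables = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            DatabaseMetaData meta = conn.getMetaData();
            try (ResultSet rs = meta.getTables(null, null, "%", new String[]{"TABLE"})) {
                while (rs.next()) {
                    tables.add(rs.getString("TABLE_NAME"));
                }
            }
        }
        return tables;
    }
}
```

The same DatabaseMetaData object also exposes getColumns, getPrimaryKeys, and similar methods for the other metamodel elements listed earlier.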
Step S2: defining the metamodel library, collecting the current metadata of each metamodel from the database, and checking whether the collected metadata needs to be stored and version-upgraded.
As shown in fig. 3, further, when the collection task starts, the metadata collected by previous runs of the task is queried to obtain the metadata objects A; the ID of each metadata object is stored as a key in a Java Map, and after all metadata records are stored, the metadata map is obtained:

map = {ID_i : A_i | A_i ∈ A}

where A is the set of metadata objects queried before the current collection task, map is the resulting keyed set, A_i is the i-th metadata object, ID_i is the ID of the i-th metadata object, and i is the index variable.
The collection task obtains, according to the metamodel, the metadata information B corresponding to the metamodel from the DatabaseMetaData object, generates a metadata object B_j, takes the ID of B_j as the key ID_j, and queries whether the map contains ID_j.
When the map does not contain ID_j, the metadata B_j does not exist in the metadata repository, and an insert operation is performed.
When the map contains ID_j, whether B_j equals the stored object map(ID_j) is judged: when B_j = map(ID_j), no processing is needed; when B_j ≠ map(ID_j), map(ID_j) is recorded in the repository as a historical version, the newly collected B_j is stored as the latest version, and the version number is increased by 1.
An equality function δ(s_1, s_2) is defined over two metadata objects s_1 and s_2, returning 1 if the objects are equal and 0 if they are not:

val = δ(map(ID_j), B_j)

where val is the comparison result of map(ID_j) and B_j (1 means equal, 0 means unequal), B_j is the j-th item of the metadata information B collected according to the metamodel, ID_j is the ID of B_j, map(ID_j) is the stored metadata record with the same ID as B_j, and j is the index variable.
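The check-and-upgrade logic above reduces to a small amount of Map-based code. The sketch below uses an illustrative MetaRecord type (not from the patent) whose content field stands in for the full metadata comparison performed by δ: absent ID means insert as version 1, equal content means no processing, changed content means store as a new version with the number increased by 1.

```java
import java.util.Map;

public class VersionControl {
    // Illustrative stand-in for a collected metadata object.
    public static final class MetaRecord {
        public final String id;
        public final String content;
        public final int version;
        public MetaRecord(String id, String content, int version) {
            this.id = id; this.content = content; this.version = version;
        }
    }

    // Decide what the latest stored version of the object with this ID should be.
    public static MetaRecord reconcile(Map<String, MetaRecord> store, String id, String content) {
        MetaRecord existing = store.get(id);
        if (existing == null) {
            return new MetaRecord(id, content, 1);             // not in map: insert as v1
        }
        if (existing.content.equals(content)) {
            return existing;                                   // equal: no processing
        }
        return new MetaRecord(id, content, existing.version + 1); // changed: bump version
    }
}
```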
Step S3: automatically collecting data into the data warehouse for data development.
Further, data is automatically extracted according to the data-source configuration and the metadata information; all data of the tables corresponding to the metadata is stored in the data warehouse, which is divided into ODS, DWD, DWS, and ADS layers.
According to the metadata collected in the system for the data source, the provided output component automatically writes the collected results into the ODS layer of the designated data warehouse; the results are cleaned and converted and automatically written into the DWD layer through the output component; the data-governance module performs preliminary aggregation and calculation according to the user data and writes the results into the DWS layer of the designated data warehouse; higher-dimensional aggregation and computation are performed again on the DWS-layer data and written into the ADS layer; and various charts are automatically generated and displayed on a web page through the chart component provided by the system.
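The four-layer flow just described can be sketched as a toy in-memory pipeline: raw rows land in ODS, cleaning produces DWD, a first aggregation produces DWS, and a final higher-level aggregation produces ADS. The record shapes (plain strings and counts) are illustrative stand-ins for real warehouse rows.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WarehouseLayers {
    // ODS -> DWD: cleaning/conversion - trim values and drop blank rows.
    public static List<String> toDwd(List<String> ods) {
        return ods.stream()
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // DWD -> DWS: preliminary aggregation - count occurrences per value.
    public static Map<String, Long> toDws(List<String> dwd) {
        return dwd.stream().collect(Collectors.groupingBy(s -> s, Collectors.counting()));
    }

    // DWS -> ADS: higher-level aggregation over the DWS result - total row count.
    public static long toAds(Map<String, Long> dws) {
        return dws.values().stream().mapToLong(Long::longValue).sum();
    }
}
```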
It should be noted that, during data development in the data warehouse, lineage relationships between metadata tables and fields are established: Spring AOP intercepts executed SQL statements, the intercepted statements are pushed to a Kafka message queue, and the data lineage is identified automatically. The intercepted SQL is screened to obtain query statements; each statement is parsed to extract table names, column names, operation types, and conditions, generating lexical tokens, and each parsed token is converted into a vector:

v_w = embedding(w)

where v_w is the vector representation of the token w; the vectors form the input sequence.
And constructing an LSTM model, wherein the LSTM model comprises a forgetting layer, an input layer, an LSTM layer and an output layer.
The forgetting layer:
g_t = Sig(G_f · [h_{t-1}, z_t] + b_f)
wherein g_t is the activation value of the forgetting gate, G_f is the weight matrix of the forgetting gate, h_{t-1} is the hidden state at the previous moment, z_t is the input vector at the current moment, and b_f is the bias vector of the forgetting gate.
The input layer:
in_t = Sig(W_i · [h_{t-1}, z_t] + b_i)
c̃_t = tanh(W_C · [h_{t-1}, z_t] + b_c)
where in_t is the activation value of the input gate, c̃_t is the candidate cell state, W_i and W_C are the weight matrices of the input gate and the candidate cell state, and b_i and b_c are the bias vectors of the input gate and the candidate cell state.
Cell state update:
C_t = g_t * C_{t-1} + in_t * c̃_t
Wherein, C_t is the cell state at the current moment and C_{t-1} is the cell state at the previous moment.
The output layer:
o_t = Sig(W_o · [h_{t-1}, z_t] + b_o)
h_t = o_t * tanh(C_t)
Where o_t is the activation value of the output gate, h_t is the hidden state at the current moment, W_o is the weight matrix of the output gate, and b_o is the bias vector of the output gate.
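The gate equations above can be sketched as a single NumPy forward step. The weight-dictionary keys below are hypothetical names for G_f, W_i, W_C, W_o and their biases; weights would come from training in practice.

```python
import numpy as np

# One forward step of the LSTM cell described above: forget gate g_t,
# input gate in_t, candidate state c~_t, cell state C_t, output gate o_t,
# hidden state h_t.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(z_t, h_prev, C_prev, W):
    x = np.concatenate([h_prev, z_t])        # [h_{t-1}, z_t]
    g_t = sigmoid(W["Gf"] @ x + W["bf"])     # forget gate
    in_t = sigmoid(W["Wi"] @ x + W["bi"])    # input gate
    c_hat = np.tanh(W["Wc"] @ x + W["bc"])   # candidate cell state
    C_t = g_t * C_prev + in_t * c_hat        # cell state update
    o_t = sigmoid(W["Wo"] @ x + W["bo"])     # output gate
    h_t = o_t * np.tanh(C_t)                 # hidden state
    return h_t, C_t
```

Because h_t = o_t · tanh(C_t) with o_t in (0, 1), each component of the hidden state stays within (-1, 1), which is why the text can later use the output-gate value directly as an input score.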
According to the metadata collected in the system from the determined data source, the metadata is automatically associated with the table data by selecting the collected table names and field names, and the collected metadata set map = {ID_i} is specified to enhance the metadata information:
map'_j = α_j · (a_j − μ_j) / σ_j + β_j
Wherein map' is the enhanced metadata set, a_j is any metadata object, α_j is the scaling parameter of the j-th metadata, β_j is the translation parameter of the j-th metadata, μ_j is the mean of the j-th metadata, and σ_j is the standard deviation of the j-th metadata.
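The enhancement step can be illustrated as follows. Note the functional form (standardize with the mean and standard deviation, then rescale by α and shift by β) is inferred from the parameter descriptions above, since the patent's original equation image is not reproduced in this text; treat it as an assumption.

```python
# Hedged sketch of metadata enhancement: alpha * (a - mu) / sigma + beta,
# computed over a list of numeric metadata values.

def enhance(values, alpha, beta):
    n = len(values)
    mu = sum(values) / n                                    # mean mu_j
    sigma = (sum((v - mu) ** 2 for v in values) / n) ** 0.5 or 1.0
    return [alpha * (v - mu) / sigma + beta for v in values]
```

With alpha = 1 and beta = 0 this reduces to plain standardization: the output has zero mean and unit spread, which is the usual goal of such an enhancement step.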
The implicit features of the data are identified through a dynamic data classification algorithm, sparse coding is performed on the implicit features using the enhanced map'_ij, and a logistic regression model is trained with the sparse representations to obtain the predicted output P_i for each sample:
P_i = Sig(W_Z^T z_i), with z_i = argmin_z ‖map'_i − W_Z z‖₂² + λ_Z ‖z‖₁
Wherein y_i is the true label of the i-th metadata, P_i is the predicted probability output of the i-th metadata, Sig is the Sigmoid function commonly used in logistic regression, whose output lies in (0, 1), W_Z^T is the transpose of the dictionary matrix W_Z, which helps prevent the model from over-fitting, n is the total size of the metadata set, i and j are variable indexes, W_Z is the dictionary matrix containing all basis vectors, and λ_Z is a regularization parameter controlling the sparsity of Z.
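As a hedged stand-in for the sparse-coding plus logistic-regression step, the sketch below trains a logistic model directly on feature vectors that are assumed to already be the sparse codes z_i. Plain gradient descent replaces the full dictionary-learning procedure, and all names are illustrative.

```python
import numpy as np

# Train logistic regression on sparse-coded features Z, producing
# prediction probabilities P_i = Sig(w^T z_i).

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_logreg(Z, y, lr=0.5, epochs=200):
    w = np.zeros(Z.shape[1])
    for _ in range(epochs):
        P = sigmoid(Z @ w)                 # predicted probabilities P_i
        w -= lr * Z.T @ (P - y) / len(y)   # cross-entropy gradient step
    return w
```

The gradient step is the standard derivative of the cross-entropy loss used later in the text, so the same loss ties the prediction and training stages together.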
Based on the predicted output of the model, the filtering conditions are configured, and the output component provided by the system automatically writes the collection results into the ODS layer of the specified data warehouse.
A data input component is provided among the components; it is configured with filtering conditions based on regular expressions, specified values, and custom functions. The data to be cleaned is obtained according to the configured filtering conditions for cleaning and conversion, and the converted and cleaned data map_clean is automatically written into the DWD layer through the output component.
The data governance module performs preliminary aggregation and calculation on the user data and writes the results into the DWS layer of the specified data warehouse through the output component; according to the user's decision requirements, it performs higher-level aggregation and calculation on the DWS-layer data again and finally writes the results into the ADS layer, after which various charts are automatically generated through the chart component provided by the system and displayed on a web page.
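The layered flow described above can be shown as a toy pipeline. The row contents, the regular-expression filter, and the aggregation keys are all invented for illustration; only the ODS → DWD → DWS → ADS shape comes from the text.

```python
import re

# Toy sketch of the warehouse layers: ODS holds raw collected rows,
# DWD holds cleaned/converted rows, DWS holds a preliminary aggregation,
# ADS holds the higher-level aggregation ready for charting.

ods = [  # raw collected rows, as written by the output component
    {"dept": "a", "amount": "10"},
    {"dept": "a", "amount": "x"},   # dirty row, dropped during cleaning
    {"dept": "b", "amount": "5"},
]

# DWD: clean and convert (keep rows whose amount matches a numeric pattern)
dwd = [{"dept": r["dept"], "amount": int(r["amount"])}
       for r in ods if re.fullmatch(r"\d+", r["amount"])]

# DWS: preliminary aggregation per department
dws = {}
for r in dwd:
    dws[r["dept"]] = dws.get(r["dept"], 0) + r["amount"]

# ADS: higher-level aggregation over all departments
ads = {"total": sum(dws.values())}
```

Each stage consumes only the previous layer, mirroring how the output component moves data downward through the warehouse.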
It should be noted that, during data development in the data warehouse, a lineage relationship between metadata tables and fields is established: Spring AOP is used to intercept SQL execution statements, the intercepted SQL statements are pushed to a Kafka message queue, and the data lineage relationship is automatically identified. The intercepted SQL is screened to obtain query statements; each SQL query statement is parsed to extract table names, column names, operation types and conditions, generating lexical units, and the parsed lexical units are converted into vectors expressed as:
v_w = embedding(w)
wherein v_w is the vector representation of the word w; an input sequence is constructed;
constructing an LSTM model, which comprises a forgetting layer, an input layer, an LSTM layer and an output layer;
Known dependency relationships are extracted from historical SQL queries and log data as training data; the output is calculated and the error loss is computed using a cross-entropy loss function:
loss = -∑ y_i log(P_i)
The gradient is calculated by the back-propagation algorithm:
∇_θ loss = ∂loss/∂θ
Model parameters are updated using the Adam optimizer to minimize the loss function:
θ ← θ − η · m̂ / (√v̂ + ε)
where θ denotes the model parameters and m̂ and v̂ are the bias-corrected first- and second-moment estimates of the gradient.
Wherein P_i is the predicted probability output of the i-th metadata and η is the learning rate. The new SQL query or log data is converted into a vector sequence and input into the trained LSTM model; the value of the LSTM model's output gate is taken as the input value to predict the dependency relationships between data entities, a data lineage graph is constructed, and the constructed data lineage relationships are stored in a Neo4j graph database.
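The loss and optimizer steps above can be written out concretely. This is a minimal sketch of the standard cross-entropy loss and a single Adam update with learning rate η; the hyperparameter defaults are the conventional Adam values, not values stated in the patent.

```python
import numpy as np

# Cross-entropy loss over predicted probabilities P_i, and one Adam step.

def cross_entropy(y, P):
    return -np.sum(y * np.log(P))            # loss = -sum y_i log P_i

def adam_step(theta, grad, m, v, t, eta=0.01,
              b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad             # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2        # second-moment estimate
    m_hat = m / (1 - b1 ** t)                # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

On the first step (t = 1) the bias corrections cancel the moment decay exactly, so a unit gradient moves the parameter by almost exactly the learning rate.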
And S4, when it is judged that a problem has occurred in data development, acquiring the index information specified by the user, automatically constructing the lineage relationship, and locating and processing the abnormal data.
Furthermore, the dependency relationship is judged through a three-stage dependency model. In the first stage of the input model, the similarity S between data is calculated:
Where x_i and x_j represent the i-th and j-th metadata entities, ε is the scaling factor of the similarity, controlling the degree to which distance influences the similarity, ‖d_i − d_j‖ represents the Euclidean distance between the feature vectors of data objects d_i and d_j, p is the power parameter in the distance calculation, and ζ is the weight of the feature contribution, determining the influence of the metadata on the similarity calculation.
After the similarity S is obtained, the value of the LSTM model output gate is taken as the input value to carry out the second-stage temporal dynamic modeling:
Wherein D_ij(t) represents the dynamic dependency relationship between x_i and x_j at time t, ρ is the time decay coefficient, t is the current time, and t_step is the reference time point, generally the start time of the analysis.
The third-stage integration is performed on the dynamic dependency relationship D_ij(t) to generate the global dependency matrix E:
Wherein E represents the dependency relationships among all data entities, F is a normalization factor ensuring that the matrix values lie in the range [0, 1], I is an indicator function whose value is 1 when the condition is satisfied and 0 otherwise, and S_y is a similarity threshold determining which entities are considered related.
When E > 0.75, a dependency exists; when E ≤ 0.75, no dependency exists.
A data lineage graph E_t is constructed:
E_t = (x, b_i), b_i = {(x_i, x_j) | E_ij > 0}
Wherein E_t represents the relationships between entities, x represents the data entities in the graph, b_i represents the dependency relationships between the data entities, and E_ij is an element of the global dependency matrix, representing the dependency between x_i and x_j.
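The three stages can be sketched end to end. Note the stage-1 similarity form below (a weighted inverse-distance using ε, p, and ζ) is an assumption, since the patent's exact equation is not reproduced in this text; the exponential time decay and the 0.75 edge threshold follow the surrounding description.

```python
import math

# Hedged sketch of the three-stage dependency model and the resulting
# lineage edges.

def similarity(di, dj, epsilon=1.0, p=2, zeta=1.0):
    # Stage 1 (assumed form): weighted inverse-distance similarity.
    dist = sum((a - b) ** 2 for a, b in zip(di, dj)) ** 0.5
    return zeta / (1.0 + epsilon * dist ** p)

def dynamic_dependency(S, t, t_step, rho=0.1):
    # Stage 2: exponential decay of similarity over elapsed time.
    return S * math.exp(-rho * (t - t_step))

def lineage_edges(E, threshold=0.75):
    # Stage 3: keep entity pairs whose integrated dependency exceeds 0.75.
    return [(i, j) for i, row in enumerate(E)
            for j, e in enumerate(row) if i != j and e > threshold]
```

The surviving edges correspond to the b_i pairs of the lineage graph E_t; in the described system they would then be written to the Neo4j graph database.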
Embodiment 2: Referring to FIG. 4, an embodiment of the present invention provides an automated metadata management system, which includes a data acquisition and adaptation module 100, a meta model management module 200, a data development and processing module 300, and a lineage relationship management module 400.
The data acquisition and adaptation module 100 is used for adapting and acquiring metadata from different data sources; by configuring the data sources to connect to the database, the matched metadata is automatically collected into the metadata base according to the meta model and the acquisition task configuration.
Specifically, the module comprises a data source adaptation unit, an acquisition task management unit and a metadata acquisition unit;
the data source adapting unit provides connection information for the acquisition task management unit, and the acquisition task management unit calls the metadata acquisition unit to acquire data.
The meta model management module 200 is used for defining and managing a meta model library, automatically collecting metadata information, and checking to determine whether warehousing or a version upgrade is required.
Specifically, the module comprises a meta model library management unit, a metadata verification unit and a metadata version control unit;
the metadata acquisition unit transmits the acquired metadata to the metadata verification unit, the metadata verification unit interacts with the meta model library management unit, and the metadata version control unit performs version management according to the verification result.
The data development and processing module 300 is used for performing development processing on the collected data, including data extraction, conversion, loading into a data warehouse, and performing data enhancement and implicit feature recognition.
Specifically, the module comprises a data extraction unit, a data conversion unit, a data loading unit, a data enhancement unit and a feature recognition unit;
The data extraction unit extracts data and transmits the data to the data conversion unit, the data processed by the data conversion unit is loaded to the data warehouse by the data loading unit, and the data enhancement unit and the feature recognition unit provide support in the data processing process.
The lineage relationship management module 400 is used for constructing data lineage relationships and predicting dependency relationships.
Specifically, it comprises a lineage relationship construction unit and a dependency relationship prediction unit;
the lineage relationship construction unit generates the data lineage relationships through an analysis tool, and the dependency relationship prediction unit uses this information to predict the dependency relationships.
Embodiment 3: an embodiment of the invention, which differs from the first two embodiments in the following:
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media capable of storing program code.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium include an electrical connection (an electronic device) having one or more wires, a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program can be electronically captured via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one of, or a combination of, the following techniques known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with appropriate combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.

Claims (9)

1. An automated metadata management method, characterized by comprising,
Performing data source adaptation and connecting with a database, and collecting metadata matched with a meta model;
Defining a meta model library, and collecting current meta data information of a meta model in the database;
checking whether the collected metadata needs to be warehoused and version-upgraded;
automatically acquiring data from a data warehouse to develop the data;
acquiring index information specified by a user to construct a lineage relationship, and predicting a dependency relationship;
the construction of the lineage relationship comprises,
in the process of data development in a data warehouse, establishing a lineage relationship between metadata tables and fields, intercepting SQL execution statements using Spring AOP, pushing the intercepted SQL statements to a Kafka message queue, and automatically identifying the data lineage relationship;
screening the intercepted SQL to obtain query statements, parsing the SQL query statements, extracting table names, column names, operation types and conditions, generating lexical units, and converting the parsed lexical units into vectors expressed as:
v_w = embedding(w)
wherein v_w is the vector representation of the word w; constructing an input sequence;
constructing an LSTM model, which comprises a forgetting layer, an input layer, an LSTM layer and an output layer;
extracting known dependency relationships from historical SQL queries and log data as training data, calculating the output, and calculating the error loss using a cross-entropy loss function:
loss = -∑ y_i log(P_i)
calculating the gradient by a back-propagation algorithm:
∇_θ loss = ∂loss/∂θ
updating the model parameters using an Adam optimizer to minimize the loss function:
θ ← θ − η · m̂ / (√v̂ + ε)
wherein θ denotes the model parameters and m̂ and v̂ are the bias-corrected first- and second-moment estimates of the gradient;
wherein y_i is the true label of the i-th metadata, P_i is the predicted probability output of the i-th metadata, η is the learning rate, and i is a variable index;
Converting the new SQL query or log data into a vector sequence, and inputting the vector sequence into a trained LSTM model;
predicting the dependency relationships between data entities by taking the value of the LSTM model output gate as the input value, and constructing a data lineage graph;
storing the constructed data lineage relationship into a Neo4j graph database according to the data lineage graph, wherein the data lineage relationship comprises tables, fields and operation types;
the predicting of the dependency relationship comprises,
Judging the dependency relationship through a three-stage dependency model;
in the first stage of the input model, similarity S between data is calculated:
wherein x_i and x_j represent the i-th and j-th data entities, ε is the scaling factor of the similarity, controlling the degree to which distance influences the similarity, ‖d_i − d_j‖ represents the Euclidean distance between the feature vectors of data objects d_i and d_j, p is the power parameter in the distance calculation, ζ is the weight of the feature contribution, determining the influence of the metadata on the similarity calculation, U is the total number of data entities, and i and j are variable indexes;
after the similarity S is obtained, taking the value of the LSTM model output gate as the input value to perform the second-stage temporal dynamic modeling D_ij(t), integrating the dynamic dependency relationship D_ij(t), and generating the third-stage global dependency matrix E:
wherein E represents the dependency relationships among all data entities, F is a normalization factor ensuring that the matrix values lie in the range [0, 1], I is an indicator function whose value is 1 when the condition is satisfied and 0 otherwise, and S_y is a similarity threshold determining which entities are considered related;
when E > 0.75, a dependency exists, and when E ≤ 0.75, no dependency exists.
2. An automated metadata management method according to claim 1 wherein said collecting metadata matching a meta-model comprises,
The metadata acquisition comprises data source definition, acquisition task configuration, acquisition task execution and metadata information warehousing;
the acquisition task configuration comprises data sources, meta-model classification, acquisition strategies, automatic warehousing, acquisition frequency and acquisition content;
generating a timing task for the acquisition task after the acquisition task is released; when the designated time is reached, the acquisition task starts automatically and collects and warehouses the metadata according to the configured meta model;
and for the acquisition content, when a new operation is clicked, the configured meta model is selected from the meta model library, and after the meta model is added, the acquisition task acquires the metadata information of the configured meta model.
3. An automated metadata management method according to claim 2 wherein said verifying whether the collected metadata requires warehousing and version upgrading comprises,
when the acquisition task starts, querying the metadata collected previously by the acquisition task and obtaining the metadata objects;
Storing ids of all metadata objects as keywords into a metadata set map;
the metadata acquisition task acquires the metadata information corresponding to the meta model from the DatabaseMetaData object according to the meta model;
Metadata information generates a metadata object, defines the id of the metadata object as a keyword, and inquires whether the metadata set map contains the keyword of the current metadata object;
When the map does not contain the key words, the current metadata object does not exist in the metadata database, and data insertion is performed.
4. An automated metadata management method according to claim 3 wherein said verifying whether the collected metadata requires warehousing and version upgrading further comprises,
When the map contains the key words, judging whether the current metadata object is equal to the metadata object in the metadata set map or not;
If the current metadata object is equal to the metadata object in the metadata set map, no processing is needed;
if the current metadata object is not equal to the metadata object in the metadata set map, recording the metadata object in the metadata database as a historical version, and warehousing the current metadata object as the latest version, wherein the version number is increased by 1.
5. An automated metadata management method according to claim 4, wherein said data development comprises,
Data acquisition is carried out through tables and fields in the associated metadata base;
storing all data in the metadata corresponding table into a data warehouse;
the data warehouse comprises ODS, DWD, DWS and ADS layers;
Determining metadata collected in the system according to the data source;
automatically writing the acquired result into an ODS layer of a specified data warehouse by the provided output component;
Cleaning and converting, and automatically writing into the DWD layer through an output assembly;
performing preliminary aggregation and calculation according to the user data and writing the data into a DWS layer of a designated data warehouse;
carrying out higher-level aggregation and calculation on the data of the DWS layer again and writing the data into the ADS layer;
a chart is automatically generated by a chart component and presented on a web page.
6. An automated metadata management system, characterized by comprising,
the data acquisition and adaptation module (100) is used for adapting and acquiring metadata from different data sources, connecting to the database by configuring the data sources, and automatically collecting the matched metadata into the metadata base according to the meta model and the acquisition task configuration;
the meta model management module (200) is used for defining and managing a meta model library, automatically collecting meta data information and checking to determine whether warehousing or version up is needed;
the data development and processing module (300) is used for carrying out development processing on the acquired data, including data extraction, conversion, loading into a data warehouse and carrying out data enhancement and implicit characteristic recognition;
the lineage relationship management module (400) is used for constructing data lineage relationships and predicting dependency relationships, specifically comprising:
in the process of data development in a data warehouse, establishing a lineage relationship between metadata tables and fields, intercepting SQL execution statements using Spring AOP, pushing the intercepted SQL statements to a Kafka message queue, and automatically identifying the data lineage relationship;
screening the intercepted SQL to obtain query statements, parsing the SQL query statements, extracting table names, column names, operation types and conditions, generating lexical units, and converting the parsed lexical units into vectors expressed as:
v_w = embedding(w)
wherein v_w is the vector representation of the word w; constructing an input sequence;
constructing an LSTM model, which comprises a forgetting layer, an input layer, an LSTM layer and an output layer;
extracting known dependency relationships from historical SQL queries and log data as training data, calculating the output, and calculating the error loss using a cross-entropy loss function:
loss = -∑ y_i log(P_i)
calculating the gradient by a back-propagation algorithm:
∇_θ loss = ∂loss/∂θ
updating the model parameters using an Adam optimizer to minimize the loss function:
θ ← θ − η · m̂ / (√v̂ + ε)
wherein θ denotes the model parameters and m̂ and v̂ are the bias-corrected first- and second-moment estimates of the gradient;
wherein y_i is the true label of the i-th metadata, P_i is the predicted probability output of the i-th metadata, η is the learning rate, and i is a variable index;
Converting the new SQL query or log data into a vector sequence, and inputting the vector sequence into a trained LSTM model;
predicting the dependency relationships between data entities by taking the value of the LSTM model output gate as the input value, and constructing a data lineage graph;
storing the constructed data lineage relationship into a Neo4j graph database according to the data lineage graph, wherein the data lineage relationship comprises tables, fields and operation types;
the predicting of the dependency relationship comprises,
Judging the dependency relationship through a three-stage dependency model;
in the first stage of the input model, similarity S between data is calculated:
wherein x_i and x_j represent the i-th and j-th data entities, ε is the scaling factor of the similarity, controlling the degree to which distance influences the similarity, ‖d_i − d_j‖ represents the Euclidean distance between the feature vectors of data objects d_i and d_j, p is the power parameter in the distance calculation, ζ is the weight of the feature contribution, determining the influence of the metadata on the similarity calculation, U is the total number of data entities, and i and j are variable indexes;
after the similarity S is obtained, taking the value of the LSTM model output gate as the input value to perform the second-stage temporal dynamic modeling D_ij(t), integrating the dynamic dependency relationship D_ij(t), and generating the third-stage global dependency matrix E:
wherein E represents the dependency relationships among all data entities, F is a normalization factor ensuring that the matrix values lie in the range [0, 1], I is an indicator function whose value is 1 when the condition is satisfied and 0 otherwise, and S_y is a similarity threshold determining which entities are considered related;
when E > 0.75, a dependency exists, and when E ≤ 0.75, no dependency exists.
7. The automated metadata management system of claim 6 wherein the data acquisition and adaptation module comprises a data source adaptation unit, an acquisition task management unit, and a metadata acquisition unit, wherein the data source adaptation unit provides connection information to the acquisition task management unit, and the acquisition task management unit invokes the metadata acquisition unit to perform data acquisition;
The meta model management module comprises a meta model library management unit, a meta data verification unit and a meta data version control unit, wherein the meta data acquisition unit transmits acquired meta data to the meta data verification unit, the meta data verification unit interacts with the meta model library management unit, and the meta data version control unit carries out version management according to a verification result;
The data development and processing module comprises a data extraction unit, a data conversion unit, a data loading unit, a data enhancement unit and a feature recognition unit, wherein the data extraction unit extracts data and then transmits the data to the data conversion unit, the data processed by the data conversion unit is loaded to a data warehouse by the data loading unit, and the data enhancement unit and the feature recognition unit provide support in the data processing process;
The lineage relationship management module comprises a lineage relationship construction unit and a dependency relationship prediction unit, wherein the lineage relationship construction unit generates the data lineage relationships through an analysis tool, and the dependency relationship prediction unit predicts the dependency relationships using this information.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the automated metadata management method of any of claims 1 to 5 when executing the computer program.
9. A computer-readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the automated metadata management method according to any of claims 1 to 5.
CN202510014520.6A 2025-01-06 2025-01-06 Automatic metadata management method, system, equipment and storage medium Active CN119396919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510014520.6A CN119396919B (en) 2025-01-06 2025-01-06 Automatic metadata management method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN119396919A CN119396919A (en) 2025-02-07
CN119396919B true CN119396919B (en) 2025-04-11

Family

ID=94426862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510014520.6A Active CN119396919B (en) 2025-01-06 2025-01-06 Automatic metadata management method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN119396919B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925054A (en) * 2022-05-24 2022-08-19 浪潮软件科技有限公司 Meta-model-based metadata management system and method
CN116680354A (en) * 2023-06-07 2023-09-01 合肥国轩高科动力能源有限公司 Lithium battery manufacturing metadata management method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11908573B1 (en) * 2020-02-18 2024-02-20 C/Hca, Inc. Predictive resource management
CN115344635A (en) * 2022-08-18 2022-11-15 苏州启数道数据科技有限公司 Data object-based data asset platform construction method
CN118170685B (en) * 2024-05-09 2024-07-30 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) An automated testing platform and method for an adaptive operating system environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant