CN114168573B - A data quality governance method based on programmable components - Google Patents
- Publication number
- CN114168573B (application number CN202010949136.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- component
- sub
- evaluation
- analysis
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06395—Quality analysis or management
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Theoretical Computer Science (AREA)
- Strategic Management (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Educational Administration (AREA)
- Marketing (AREA)
- Tourism & Hospitality (AREA)
- Databases & Information Systems (AREA)
- General Business, Economics & Management (AREA)
- Quality & Reliability (AREA)
- Health & Medical Sciences (AREA)
- Development Economics (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Game Theory and Decision Science (AREA)
- Operations Research (AREA)
- Water Supply & Treatment (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Stored Programmes (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to a data quality governance method based on orchestratable components. The method uses a data acquisition component, a data evaluation component, a data modification component, a data analysis component, a flow improvement component, a data storage component and a data destruction component, which are orchestrated according to the data quality governance scope and the quality governance goal formulated by a data governance definition component. Because each part of data quality governance is split into a loosely coupled component, only the components required by the formulated scope and goal are orchestrated and used for a given governance task. The components cooperate with one another, which avoids wasted process steps and time and improves governance efficiency. In addition, data are stored in a decentralized manner, which improves data extraction efficiency and avoids a data extraction performance bottleneck when massive data are processed.
Description
Technical Field
The invention relates to the field of data management, and in particular to a data quality governance method based on orchestratable components.
Background
At present, the industry mainly relies on centralized data quality management systems for data quality optimization. A traditional centralized system provides capabilities such as standardized management of check rules, scheduled execution, and unified management of data quality reports, which improves the efficiency and management level of data quality checks.
Such centralized systems have two limitations. First, the quality management flow is fixed and cannot be freely orchestrated for different data quality governance scopes and goals, which wastes process steps and governance time. Second, centralized storage in a traditional database easily becomes a performance bottleneck when massive data are processed.
Disclosure of Invention
The invention aims to overcome these problems in the prior art by providing a data quality governance method based on orchestratable components.
To achieve the above technical purpose and effect, the invention adopts the following technical solution:
A data quality governance method based on orchestratable components, comprising:
A data governance definition component for formulating a data quality governance scope and a quality governance goal,
A data acquisition component for acquiring data,
A data evaluation component for evaluating the data,
A data modification component for modifying the anomalous data,
A data analysis component for analyzing the data,
A flow improvement component for improving the data quality governance flow,
A data storage component for storing data in a decentralized manner,
A data destruction component for destroying data;
The data acquisition component, data evaluation component, data modification component, data analysis component, flow improvement component, data storage component and data destruction component are orchestrated according to the data quality governance scope and the quality governance goal formulated by the data governance definition component.
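The orchestration idea above can be illustrated with a minimal Python sketch. It assumes that every component is a function that takes a list of records and returns a list of records, and that the data governance definition component boils down to an ordered list of component names. `GovernanceDefinition`, `Orchestrator` and the toy components are hypothetical names; the patent does not specify an interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical record type: one row of governed data, e.g. one meter reading.
Record = dict
# Hypothetical component signature: records in, records out.
Component = Callable[[List[Record]], List[Record]]


@dataclass
class GovernanceDefinition:
    """Illustrative stand-in for the data governance definition component."""

    scope: str  # data quality governance scope
    goal: str  # quality governance goal
    steps: List[str] = field(default_factory=list)  # component names, in run order


class Orchestrator:
    """Runs only the registered components that a definition names, in order."""

    def __init__(self) -> None:
        self._registry: Dict[str, Component] = {}

    def register(self, name: str, component: Component) -> None:
        self._registry[name] = component

    def run(self, definition: GovernanceDefinition, records: List[Record]) -> List[Record]:
        for name in definition.steps:
            if name not in self._registry:
                raise KeyError(f"no component registered as {name!r}")
            records = self._registry[name](records)
        return records


if __name__ == "__main__":
    orchestrator = Orchestrator()
    # Toy components: a fixed acquisition source and a filter standing in for modification.
    orchestrator.register("acquire", lambda _: [{"id": 1, "kwh": 3.2}, {"id": 2, "kwh": None}])
    orchestrator.register("drop_missing", lambda rs: [r for r in rs if r["kwh"] is not None])
    plan = GovernanceDefinition("residential kWh", "integrity", ["acquire", "drop_missing"])
    print(orchestrator.run(plan, []))  # [{'id': 1, 'kwh': 3.2}]
```

Because a definition lists only the components a task needs, a narrow goal such as an integrity check runs two steps instead of the full chain, which is the efficiency argument the invention makes.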
The data acquisition component comprises a plurality of data acquisition sub-components, each of which acquires data from a different data source.
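A minimal sketch of per-source acquisition sub-components, assuming two hypothetical source formats (a CSV export and a JSON Lines feed); the patent does not name concrete sources or formats. Each sub-component parses one payload, and the hypothetical `acquire_all` helper tags every record with its source name so that later components can trace anomalies back to their origin.

```python
import csv
import io
import json
from typing import Callable, Dict, List, Tuple

Record = dict
# A sub-component turns one source's raw payload into records.
SubComponent = Callable[[str], List[Record]]


def acquire_csv(text: str) -> List[Record]:
    """Sub-component for CSV exports; csv yields every value as a string."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def acquire_jsonl(text: str) -> List[Record]:
    """Sub-component for JSON Lines feeds (one JSON object per non-blank line)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def acquire_all(sources: Dict[str, Tuple[SubComponent, str]]) -> List[Record]:
    """Run each source's sub-component and tag every record with its source name."""
    records: List[Record] = []
    for source_name, (sub_component, payload) in sources.items():
        for record in sub_component(payload):
            record["_source"] = source_name
            records.append(record)
    return records


if __name__ == "__main__":
    rows = acquire_all({
        "billing": (acquire_csv, "meter,kwh\nA,3.5\nB,4.0\n"),  # hypothetical source names
        "scada": (acquire_jsonl, '{"meter": "C", "kwh": 2.5}\n'),
    })
    print(len(rows))  # 3
```

Type coercion (the CSV `kwh` values arrive as strings) is left to downstream components in this sketch.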
The data evaluation component comprises a uniqueness evaluation sub-component, an integrity evaluation sub-component, an accuracy evaluation sub-component, a consistency evaluation sub-component, a relevance evaluation sub-component and a timeliness evaluation sub-component.
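The patent names six evaluation dimensions but gives no formulas. The sketch below assumes common ratio-style definitions for four of them (uniqueness, integrity, accuracy, timeliness), each returning a score between 0 and 1. Consistency and relevance are left out because they need cross-table domain rules that the text does not describe. Function names and the caller-supplied validity rule are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Sequence

Record = dict


def uniqueness(records: Sequence[Record], key: str) -> float:
    """Distinct key values divided by record count (1.0 means no duplicate keys)."""
    if not records:
        return 1.0
    return len({r.get(key) for r in records}) / len(records)


def integrity(records: Sequence[Record], required: List[str]) -> float:
    """Share of required field slots that are present and not None."""
    slots = len(records) * len(required)
    if slots == 0:
        return 1.0
    filled = sum(1 for r in records for f in required if r.get(f) is not None)
    return filled / slots


def accuracy(records: Sequence[Record], is_valid: Callable[[Record], bool]) -> float:
    """Share of records that pass a caller-supplied validity rule."""
    if not records:
        return 1.0
    return sum(1 for r in records if is_valid(r)) / len(records)


def timeliness(records: Sequence[Record], ts_field: str, now: datetime, max_age: timedelta) -> float:
    """Share of records whose timestamp is no older than max_age."""
    if not records:
        return 1.0
    fresh = sum(
        1 for r in records
        if r.get(ts_field) is not None and now - r[ts_field] <= max_age
    )
    return fresh / len(records)
```

An evaluation sub-component would typically compare each score with a target threshold set in the governance goal; those thresholds are not given in the patent.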
The data modification component comprises a cross-verification data modification sub-component, which corrects erroneous and missing data, and a similarity-comparison data removal sub-component, which removes redundant data.
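One way to realize the two modification sub-components, under stated assumptions: cross-verification is modeled as checking each reading against a second, trusted reference source keyed by meter ID, filling missing values and replacing values that differ by more than a tolerance. Similarity comparison uses the standard-library `difflib.SequenceMatcher` ratio with an assumed threshold of 0.9. The trusted reference, the tolerance and the threshold are hypothetical choices, not details from the patent.

```python
from difflib import SequenceMatcher
from typing import Dict, List

Record = dict


def cross_verify(primary: List[Record], reference: Dict[str, float],
                 key: str, value_field: str, tolerance: float) -> List[Record]:
    """Fill or correct value_field from an assumed-trusted reference source."""
    out: List[Record] = []
    for r in primary:
        fixed = dict(r)
        ref = reference.get(r[key])
        if ref is not None:
            value = r.get(value_field)
            if value is None or abs(value - ref) > tolerance:
                fixed[value_field] = ref
                fixed["corrected"] = True  # keep a trace for the analysis step
        out.append(fixed)
    return out


def remove_similar(records: List[Record], fields: List[str], threshold: float = 0.9) -> List[Record]:
    """Drop records whose text is at least `threshold` similar to an already-kept record."""
    kept: List[Record] = []
    kept_text: List[str] = []
    for r in records:
        text = "|".join(str(r.get(f, "")) for f in fields)
        if any(SequenceMatcher(None, text, t).ratio() >= threshold for t in kept_text):
            continue  # treated as a redundant near-duplicate
        kept.append(r)
        kept_text.append(text)
    return kept
```

The pairwise comparison grows quadratically with the number of records, which is acceptable for a sketch; large datasets would normally group candidate duplicates first.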
The data analysis component comprises a regression analysis sub-component, a factor analysis sub-component, a fishbone diagram analysis sub-component, a Pareto analysis sub-component and a matrix data analysis sub-component.
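Of the listed analysis methods, Pareto analysis is the easiest to show compactly. A minimal sketch, assuming anomaly causes have already been labeled upstream (the labels used with it are hypothetical) and using the conventional 80% cutoff to pick the "vital few" causes:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def pareto(causes: Iterable[str], cutoff: float = 0.8) -> List[Tuple[str, int, float]]:
    """Return (cause, count, cumulative share) rows for the 'vital few' causes.

    Causes are ranked by descending count, and rows are included until the
    cumulative share first reaches `cutoff`.
    """
    counts = Counter(causes)
    total = sum(counts.values())
    rows: List[Tuple[str, int, float]] = []
    running = 0
    for cause, n in counts.most_common():
        running += n
        share = running / total
        rows.append((cause, n, share))
        if share >= cutoff:
            break
    return rows


if __name__ == "__main__":
    labels = ["missing"] * 6 + ["duplicate"] * 3 + ["out_of_range"]
    for cause, n, share in pareto(labels):
        print(f"{cause:<12} {n:>3} {share:6.1%}")
```

The resulting ranking tells the flow improvement component which anomaly causes to address first.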
The flow improvement component comprises a flow feedback sub-component and a flow reconstruction sub-component.
The data storage component comprises a number of independent data storage sub-components.
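The patent does not say how data are spread across the independent storage sub-components. A minimal sketch, assuming simple hash partitioning by record key over in-memory stand-in nodes (`StorageNode` and `ShardedStore` are hypothetical names). A stable SHA-256 digest is used instead of Python's built-in `hash`, which is salted per process for strings.

```python
import hashlib
from typing import Dict, List, Optional


class StorageNode:
    """Hypothetical independent storage sub-component (in-memory for illustration)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, dict] = {}


class ShardedStore:
    """Routes each key to exactly one node by a stable hash of the key."""

    def __init__(self, nodes: List[StorageNode]) -> None:
        if not nodes:
            raise ValueError("at least one storage node is required")
        self.nodes = nodes

    def _node_for(self, key: str) -> StorageNode:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key: str, record: dict) -> None:
        self._node_for(key).data[key] = record

    def get(self, key: str) -> Optional[dict]:
        return self._node_for(key).data.get(key)

    def delete(self, key: str) -> bool:
        return self._node_for(key).data.pop(key, None) is not None
```

Modulo partitioning reshuffles most keys when a node is added; consistent hashing is the usual alternative if the number of storage sub-components changes over time.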
The advantages of the invention are as follows. Each part of data quality governance is split into a loosely coupled component; when data quality governance is carried out, only the components required by the governance scope and goal formulated in the data governance definition component are orchestrated and used, so that the components cooperate with one another, avoiding wasted process steps and time and improving governance efficiency. Data are stored in a decentralized manner, which improves data extraction efficiency and avoids a data extraction performance bottleneck when massive data are processed.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a block diagram of a data quality management method in accordance with the present invention.
Detailed Description
The invention will be described in detail below with reference to the drawings in combination with embodiments.
As shown in FIG. 1, a data quality governance method based on orchestratable components comprises:
A data governance definition component for formulating a data quality governance scope and a quality governance goal,
A data acquisition component for acquiring data,
A data evaluation component for evaluating the data,
A data modification component for modifying the anomalous data,
A data analysis component for analyzing the data,
A flow improvement component for improving the data quality governance flow,
A data storage component for storing data in a decentralized manner,
A data destruction component for destroying data;
The data acquisition component, data evaluation component, data modification component, data analysis component, flow improvement component, data storage component and data destruction component are orchestrated according to the data quality governance scope and the quality governance goal formulated by the data governance definition component.
The data acquisition component comprises a plurality of data acquisition sub-components, each of which acquires data from a different data source.
The data evaluation component comprises a uniqueness evaluation sub-component, an integrity evaluation sub-component, an accuracy evaluation sub-component, a consistency evaluation sub-component, a relevance evaluation sub-component and a timeliness evaluation sub-component.
The data modification component comprises a cross-verification data modification sub-component for correcting erroneous data and missing data, and a similarity-comparison data removal sub-component for removing redundant data.
The data analysis component comprises a regression analysis sub-component, a factor analysis sub-component, a fishbone diagram analysis sub-component, a Pareto analysis sub-component and a matrix data analysis sub-component.
The flow improvement component comprises a flow feedback sub-component and a flow reconstruction sub-component.
The data storage component includes several independent data storage sub-components.
In the first embodiment, the data quality governance scope is residential electricity consumption (in kilowatt-hours), and the quality governance goal is an integrity assessment of the data. The data governance definition component orchestrates the data acquisition sub-component and the integrity evaluation sub-component. The data acquisition sub-component acquires residents' electricity consumption to build a residential electricity consumption database, the integrity evaluation sub-component assesses the integrity of the data in that database, and a data integrity assessment report is produced once the assessment is complete.
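The integrity assessment in this embodiment could, for example, check that every household has a reading for every day of the reporting period. A minimal sketch under that assumption; daily granularity, the household IDs and the report layout are illustrative and not taken from the patent.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def integrity_report(readings: Dict[str, Dict[date, Optional[float]]],
                     start: date, end: date) -> Dict[str, dict]:
    """Per-household completeness of daily kWh readings over [start, end] inclusive."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    report: Dict[str, dict] = {}
    for household, series in readings.items():
        # A day counts as missing if it has no entry or an explicit None.
        missing = [d for d in days if series.get(d) is None]
        report[household] = {
            "expected": len(days),
            "missing_days": missing,
            "completeness": 1 - len(missing) / len(days),
        }
    return report


if __name__ == "__main__":
    data = {
        "H001": {date(2020, 9, 1): 5.1, date(2020, 9, 2): 4.8, date(2020, 9, 3): 6.0},
        "H002": {date(2020, 9, 1): 3.3, date(2020, 9, 3): None},
    }
    for hh, row in integrity_report(data, date(2020, 9, 1), date(2020, 9, 3)).items():
        print(hh, f"{row['completeness']:.0%}", row["missing_days"])
```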
In the second embodiment, the data quality governance scope is the supply area of a given transformer, and the quality governance goal is an analysis of electricity consumption. The data governance definition component orchestrates the data acquisition sub-component, the data evaluation component, the data modification component and the data analysis component. The data acquisition sub-component acquires the electricity consumption data in the transformer's supply area to build an area electricity consumption database; the data evaluation component then evaluates the data in this database, the data modification component corrects the anomalous data, and the data analysis component analyzes the corrected database to produce an analysis report.
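The evaluate, modify and analyze chain of this embodiment might look like the following sketch. It assumes one reading per user in the transformer area, treats missing, negative or implausibly large readings as anomalies, and replaces them with the median of the valid readings. The threshold, the median imputation and the report fields are hypothetical choices, not details from the patent.

```python
from statistics import median
from typing import Dict, List, Optional


def evaluate(area: Dict[str, Optional[float]], max_kwh: float) -> List[str]:
    """Return IDs of users whose reading is missing, negative or above max_kwh."""
    return [uid for uid, kwh in area.items() if kwh is None or kwh < 0 or kwh > max_kwh]


def modify(area: Dict[str, Optional[float]], anomalies: List[str]) -> Dict[str, float]:
    """Replace anomalous readings with the median of the remaining valid readings."""
    bad = set(anomalies)
    valid = [kwh for uid, kwh in area.items() if uid not in bad]
    if not valid:
        raise ValueError("no valid readings to impute from")
    fill = median(valid)
    return {uid: (fill if uid in bad else kwh) for uid, kwh in area.items()}


def analyze(area: Dict[str, float]) -> dict:
    """Summarize consumption in the area for the analysis report."""
    total = sum(area.values())
    return {
        "users": len(area),
        "total_kwh": total,
        "mean_kwh": total / len(area),
        "top_user": max(area, key=lambda uid: area[uid]),
    }


if __name__ == "__main__":
    readings = {"U1": 10.0, "U2": None, "U3": 14.0, "U4": 900.0, "U5": 12.0}
    flagged = evaluate(readings, max_kwh=100.0)
    print(flagged, analyze(modify(readings, flagged)))
```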
In the third embodiment, the data quality governance scope is the power grid dispatching area of a given city, and the quality governance goal is optimization of power grid dispatching quality. The data governance definition component orchestrates all seven components: data acquisition, data evaluation, data modification, data analysis, flow improvement, data storage and data destruction. The data acquisition component acquires grid dispatching data in the city's dispatching area to build a municipal power grid dispatching database. The data evaluation component evaluates this data along the six dimensions (uniqueness, integrity, accuracy, consistency, relevance and timeliness), which both measures the data on each dimension and identifies anomalous data. The data modification component corrects the anomalous data in the dispatching database, and the data analysis component analyzes the corrected database to produce an analysis report. The flow improvement component refines the scheduling and processing sequence, the cooperation mode between components, and the processing results. Data to be retained are stored by the data storage component, and data to be destroyed are destroyed by the data destruction component to prevent theft.
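The storage and destruction step at the end of this embodiment needs a rule for deciding which data are retained. The patent does not give one; the sketch below assumes a fixed retention period and keeps an audit list of destroyed keys (`split_for_storage` and `destroy` are hypothetical names). Removing a Python object only drops references; real destruction to prevent theft would rely on storage-level measures such as secure erasure or deleting encryption keys, which are outside this sketch.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Record = dict


def split_for_storage(records: List[Record], ts_field: str, now: datetime,
                      retention: timedelta) -> Tuple[List[Record], List[Record]]:
    """Partition records into (to_store, to_destroy) by an assumed retention period."""
    to_store: List[Record] = []
    to_destroy: List[Record] = []
    for r in records:
        (to_store if now - r[ts_field] <= retention else to_destroy).append(r)
    return to_store, to_destroy


def destroy(store: Dict[str, Record], to_destroy: List[Record], key: str) -> List[str]:
    """Remove expired records from a key-value store; return destroyed keys for audit."""
    destroyed: List[str] = []
    for r in to_destroy:
        if store.pop(r[key], None) is not None:
            destroyed.append(r[key])
    return destroyed
```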
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010949136.2A (CN114168573B) | 2020-09-10 | 2020-09-10 | A data quality governance method based on programmable components |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010949136.2A (CN114168573B) | 2020-09-10 | 2020-09-10 | A data quality governance method based on programmable components |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114168573A | 2022-03-11 |
CN114168573B | 2025-02-25 |
Family
ID=80475735
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010949136.2A (CN114168573B, Active) | A data quality governance method based on programmable components | 2020-09-10 | 2020-09-10 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114168573B |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102708149A * | 2012-04-01 | 2012-10-03 | 河海大学 | Data quality management method and system |
CN110704502A * | 2019-11-20 | 2020-01-17 | 中电万维信息技术有限责任公司 | Componentized data quality checking method |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130117202A1 * | 2011-11-03 | 2013-05-09 | Microsoft Corporation | Knowledge-based data quality solution |
CN103401945B * | 2013-08-14 | 2016-08-10 | 青岛大学 | A kind of service combination dynamic reconstruction method |
US9760428B1 * | 2013-12-19 | 2017-09-12 | Amdocs Software Systems Limited | System, method, and computer program for performing preventative maintenance in a network function virtualization (NFV) based communication network |
CN104134121A * | 2014-07-30 | 2014-11-05 | 国家电网公司 | Method for achieving visualization of power grid information system business data |
US10459881B2 * | 2015-02-27 | 2019-10-29 | Podium Data, Inc. | Data management platform using metadata repository |
CN108268997A * | 2017-11-23 | 2018-07-10 | 国网陕西省电力公司经济技术研究院 | A kind of electricity grid substation quality of data wire examination method |
EP3575980A3 * | 2018-05-29 | 2020-03-04 | Accenture Global Solutions Limited | Intelligent data quality |
Also Published As
Publication number | Publication date |
---|---|
CN114168573A | 2022-03-11 |
Similar Documents
Publication | Title |
---|---|
CN113935497B | Intelligent operation and maintenance fault processing method, device, equipment and storage medium thereof |
US10102039B2 | Converting a hybrid flow |
Bosu et al. | Data quality in empirical software engineering: a targeted review |
CN113597664B | Method, electronic device, storage medium and system for determining cause of failure |
Liu et al. | Efficient distributed query processing in large RFID-enabled supply chains |
CN107679089B | A cleaning method, device and system for power sensing data |
Lyu et al. | Going green and profitable: The impact of smart manufacturing on Chinese enterprises |
CN107168868B | A Software Change Defect Prediction Method Based on Sampling and Ensemble Learning |
Gupta et al. | Simulation modeling and analysis of a complex system of a thermal power plant |
CN107153406A | A whole-process quality control method for products |
CN114168573B | A data quality governance method based on programmable components |
Mursitama et al. | The role of absorptive capacity, technological capability, and firm performance in Indonesia’s high-tech industry |
CN104764455B | A kind of data in navigation electronic map processing method and processing device |
CN104732436A | A third-party tax-related information collection and analysis tool |
CN106130929B | The service message automatic processing method and system of internet insurance field based on graph-theoretical algorithm |
CN108229827A | A kind of power equipment quality problems modeling and analysis methods |
CN112926269B | Method and system for data grouping and cleaning of power plant edge nodes |
CN101196742A | A production process quality management system |
Raj et al. | On the Impact of ML use cases on Industrial Data Pipelines |
CN119028601A | A clinical trial protocol deviation identification method and management system |
CN107609016A | Electricity transaction data accuracy method of calibration based on expression parsing |
CN110909068A | Emergency diesel generator set big data acquisition processing method and system and storage medium |
CN113779178A | Data storage method and device based on knowledge graph |
Nihayah et al. | Does the clean development mechanism exist in developing countries after an international agreement? |
Tahrat et al. | Abstracting temporal aboxes in TDL-Lite |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |