
CN114168573B - A data quality governance method based on programmable components - Google Patents

Info

Publication number
CN114168573B
Authority
CN
China
Prior art keywords
data
component
sub
evaluation
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010949136.2A
Other languages
Chinese (zh)
Other versions
CN114168573A (en)
Inventor
吴钟飞
陈凤超
黎鸣
梅傲琪
何毅鹏
赵俊炜
李祺威
周立德
饶欢
张锐
徐睿烽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongguan Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Dongguan Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongguan Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority to CN202010949136.2A
Publication of CN114168573A
Application granted
Publication of CN114168573B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Databases & Information Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Stored Programmes (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a data quality governance method based on orchestratable components. The method comprises a data acquisition component, a data evaluation component, a data modification component, a data analysis component, a flow improvement component, a data storage component and a data destruction component, which are orchestrated according to the data quality governance scope and quality governance goal formulated by a data governance definition component. Because each part of data quality governance is split into a loosely coupled component, only the components required for the formulated scope and goal are orchestrated and used. The components cooperate with one another, which avoids wasted steps and time and improves data quality governance efficiency. Data is stored in a decentralized manner, which improves data extraction efficiency and avoids a data extraction performance bottleneck when processing massive data.

Description

A data quality governance method based on orchestratable (programmable) components
Technical Field
The invention relates to the field of data management, and in particular to a data quality governance method based on orchestratable components.
Background
Currently, in the field of data quality optimization, the industry mainly relies on centralized data quality management systems. A traditional centralized system provides standardized management of check rules, scheduled execution at regular times, and unified management of data quality reports, which improves the efficiency and management level of data quality checking.
The centralized data quality management system has two limitations. First, its quality management flow is fixed: the flow cannot be freely orchestrated for different data quality governance scopes and quality governance goals, which wastes steps and governance time. Second, it stores data centrally in a traditional database, which easily becomes a performance bottleneck when massive data is processed.
Disclosure of Invention
The invention aims to overcome these problems in the prior art by providing a data quality governance method based on orchestratable components.
To achieve this technical purpose and effect, the invention adopts the following technical solution:
A data quality governance method based on orchestratable components, comprising:
A data governance definition component for formulating a data quality governance scope and a quality governance goal,
A data acquisition component for acquiring data,
A data evaluation component for evaluating the data,
A data modification component for modifying the anomalous data,
A data analysis component for analyzing the data,
A flow improvement component for improving the data quality governance flow,
A data storage component for storing data in a decentralized manner,
A data destruction component for destroying data;
The data acquisition component, the data evaluation component, the data modification component, the data analysis component, the flow improvement component, the data storage component and the data destruction component are orchestrated according to the data quality governance scope and the quality governance goal formulated by the data governance definition component.
The data acquisition component comprises a plurality of data acquisition sub-components, each of which acquires data from a different data source.
The data evaluation component comprises a uniqueness evaluation sub-component, an integrity evaluation sub-component, an accuracy evaluation sub-component, a consistency evaluation sub-component, a relevance evaluation sub-component and a timeliness evaluation sub-component.
The data modification component comprises a cross-validation data correction sub-component for correcting erroneous data and missing data, and a similarity-comparison data removal sub-component for removing redundant data.
The data analysis component comprises a regression analysis sub-component, a factor analysis sub-component, a fishbone diagram analysis sub-component, a pareto analysis sub-component and a matrix data analysis sub-component.
The flow improvement component comprises a flow feedback sub-component and a flow reconstruction sub-component.
The data storage component comprises several independent data storage sub-components.
The advantage of the invention is that each part of data quality governance is split into a loosely coupled component. When data quality governance is carried out, only the required components are orchestrated and used, according to the data quality governance scope and quality governance goal formulated in the data governance definition component. The components cooperate with one another, which avoids wasted steps and time and improves data quality governance efficiency. Data is stored in a decentralized manner, which improves data extraction efficiency and avoids a data extraction performance bottleneck when processing massive data.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a block diagram of the data quality governance method in accordance with the present invention.
Detailed Description
The invention will be described in detail below with reference to the drawings in combination with embodiments.
As shown in FIG. 1, a data quality governance method based on orchestratable components includes:
A data governance definition component for formulating a data quality governance scope and a quality governance goal,
A data acquisition component for acquiring data,
A data evaluation component for evaluating the data,
A data modification component for modifying the anomalous data,
A data analysis component for analyzing the data,
A flow improvement component for improving the data quality governance flow,
A data storage component for storing data in a decentralized manner,
A data destruction component for destroying data;
The data acquisition component, the data evaluation component, the data modification component, the data analysis component, the flow improvement component, the data storage component and the data destruction component are orchestrated according to the data quality governance scope and the quality governance goal formulated by the data governance definition component.
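The orchestration described above can be illustrated with a minimal Python sketch. It is not taken from the patent: `GovernanceDefinition`, `REGISTRY`, `orchestrate`, the record layout and the placeholder policies inside each function are all hypothetical names and assumptions chosen only to show how a definition could select and order loosely coupled components.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Component = Callable[[List[Record]], List[Record]]


@dataclass
class GovernanceDefinition:
    """Hypothetical stand-in for the data governance definition component."""
    scope: str
    goal: str
    component_names: List[str] = field(default_factory=list)


def acquire(_: List[Record]) -> List[Record]:
    # Stand-in acquisition: a real sub-component would read from a data source.
    return [{"meter": "A", "reading": 12.5}, {"meter": "B", "reading": None}]


def evaluate(records: List[Record]) -> List[Record]:
    # Flag records whose reading is missing (assumed abnormality rule).
    return [{**r, "abnormal": r["reading"] is None} for r in records]


def modify(records: List[Record]) -> List[Record]:
    # Placeholder correction policy: replace flagged readings with 0.0.
    return [
        {**r, "reading": 0.0 if r.get("abnormal") else r["reading"]}
        for r in records
    ]


# Registry of available components; only those named in a definition are run.
REGISTRY: Dict[str, Component] = {
    "acquire": acquire,
    "evaluate": evaluate,
    "modify": modify,
}


def orchestrate(definition: GovernanceDefinition,
                registry: Dict[str, Component]) -> List[Record]:
    """Run the components named in the definition, in the order given."""
    records: List[Record] = []
    for name in definition.component_names:
        records = registry[name](records)
    return records
```

A definition that lists only `acquire` and `evaluate` skips the modification step entirely, which is the kind of saving in steps and time that the text attributes to orchestration.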
The data acquisition component comprises a plurality of data acquisition sub-components, each of which acquires data from a different data source.
The data evaluation component comprises a uniqueness evaluation sub-component, an integrity evaluation sub-component, an accuracy evaluation sub-component, a consistency evaluation sub-component, a relevance evaluation sub-component and a timeliness evaluation sub-component.
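The patent names the six evaluation dimensions but does not define how each is computed. The following Python sketch shows one plausible way three of them (uniqueness, integrity and timeliness) could be scored as ratios between 0 and 1. The function names, field names and scoring formulas are assumptions made for illustration, not the patented sub-components.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def uniqueness_score(records: List[Record], key: str) -> float:
    """Share of records that carry a distinct value for `key`."""
    if not records:
        return 1.0
    distinct = {r.get(key) for r in records}
    return len(distinct) / len(records)


def integrity_score(records: List[Record], required: Sequence[str]) -> float:
    """Share of required fields that are present and not None."""
    total = len(records) * len(required)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for f in required if r.get(f) is not None)
    return filled / total


def timeliness_score(records: List[Record], ts_field: str,
                     now: datetime, max_age: timedelta) -> float:
    """Share of records whose timestamp is no older than `max_age`."""
    if not records:
        return 1.0
    fresh = sum(
        1 for r in records
        if isinstance(r.get(ts_field), datetime) and now - r[ts_field] <= max_age
    )
    return fresh / len(records)
```

Returning a ratio per dimension keeps each evaluator independent, so a definition can enable only the dimensions that a given governance goal needs.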
The data modification component comprises a cross-validation data correction sub-component for correcting erroneous data and missing data, and a similarity-comparison data removal sub-component for removing redundant data.
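The patent does not spell out either method. As a hedged sketch, the code below interprets cross-validation as checking a field against a second, trusted source keyed by the same identifier, and interprets similarity comparison as dropping records whose text closely matches one already kept. Both interpretations, the function names and the 0.9 threshold are assumptions. The string comparison uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, List

Record = Dict[str, Any]


def cross_validate(primary: List[Record], reference: List[Record],
                   key: str, field: str) -> List[Record]:
    """Correct missing or conflicting `field` values using a reference source."""
    ref = {r[key]: r[field] for r in reference if r.get(field) is not None}
    fixed = []
    for r in primary:
        ref_value = ref.get(r[key])
        if ref_value is not None and r.get(field) != ref_value:
            r = {**r, field: ref_value}  # copy, so the input list is untouched
        fixed.append(r)
    return fixed


def remove_similar(records: List[Record], field: str,
                   threshold: float = 0.9) -> List[Record]:
    """Keep a record only if `field` is not too similar to any kept record."""
    kept: List[Record] = []
    for r in records:
        text = str(r[field])
        if all(
            SequenceMatcher(None, text, str(k[field])).ratio() < threshold
            for k in kept
        ):
            kept.append(r)
    return kept
```

The pairwise comparison in `remove_similar` is quadratic in the number of records, so a production system handling massive data would likely need blocking or indexing first; the sketch only shows the comparison rule.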
The data analysis component comprises a regression analysis sub-component, a factor analysis sub-component, a fishbone diagram analysis sub-component, a pareto analysis sub-component and a matrix data analysis sub-component.
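Of the analysis methods listed, Pareto analysis is the simplest to show in code. The sketch below ranks data quality issue categories by frequency and reports which categories account for the bulk of the issues. The function name, the 80% cutoff and the sample categories are illustrative assumptions and do not come from the patent.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def pareto(categories: Iterable[str],
           cutoff: float = 0.8) -> Tuple[List[Tuple[str, int, float]], List[str]]:
    """Rank categories by count and pick the 'vital few' up to the cutoff.

    Returns (rows, vital): rows are (category, count, cumulative share) in
    descending count order; vital lists the categories needed to reach the
    cutoff share of all occurrences.
    """
    counts = Counter(categories).most_common()
    total = sum(count for _, count in counts)
    rows: List[Tuple[str, int, float]] = []
    vital: List[str] = []
    cumulative = 0
    for category, count in counts:
        already_reached = cumulative / total >= cutoff
        cumulative += count
        rows.append((category, count, cumulative / total))
        if not already_reached:
            vital.append(category)
    return rows, vital


if __name__ == "__main__":
    issues = ["missing"] * 5 + ["duplicate"] * 3 + ["stale"] + ["format"]
    rows, vital = pareto(issues)
    for category, count, share in rows:
        print(f"{category:10s} {count:3d} {share:.0%}")
    print("vital few:", vital)
```

An analysis report built this way points the modification and flow improvement steps at the issue types that dominate the data set.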
The flow improvement component comprises a flow feedback sub-component and a flow reconstruction sub-component.
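The patent does not describe how feedback is collected or how the flow is reconstructed. Purely as an illustration, the sketch below assumes that feedback is a per-step count of records changed in the last run, and that reconstruction drops steps that changed nothing unless they are marked as required. `StepFeedback`, `reconstruct_flow` and this pruning policy are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class StepFeedback:
    """Hypothetical feedback record produced after one component has run."""
    name: str
    records_in: int
    records_changed: int


def reconstruct_flow(order: Sequence[str],
                     feedback: Iterable[StepFeedback],
                     required: Sequence[str] = ("acquire",)) -> List[str]:
    """Return a new component order without steps that had no effect.

    Assumed policy: a step whose last run changed zero records is dropped,
    unless it is listed in `required`. Relative order is preserved.
    """
    idle = {f.name for f in feedback if f.records_changed == 0}
    return [name for name in order if name in required or name not in idle]
```

Other policies, such as reordering steps or changing how components cooperate, would fit the same shape: feedback goes in, a revised orchestration comes out.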
The data storage component includes several independent data storage sub-components.
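One common way to spread data over several independent stores is hash partitioning by key. The sketch below is an assumption about how such decentralized storage could work, not the patent's design. `ShardedStore` is a hypothetical name, and in-memory dictionaries stand in for the independent data storage sub-components.

```python
import hashlib
from typing import Any, Dict, List, Optional


class ShardedStore:
    """Routes each key to one of several independent in-memory stores."""

    def __init__(self, shard_count: int) -> None:
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        # Each dict stands in for one independent storage sub-component.
        self.shards: List[Dict[str, Any]] = [{} for _ in range(shard_count)]

    def _shard_for(self, key: str) -> int:
        # SHA-256 gives a hash that is stable across processes, unlike the
        # built-in hash() for strings, which is randomized per interpreter run.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value: Any) -> None:
        self.shards[self._shard_for(key)][key] = value

    def get(self, key: str) -> Optional[Any]:
        return self.shards[self._shard_for(key)].get(key)
```

Because a lookup touches only the shard that owns the key, extraction work is divided among the stores instead of hitting a single central database.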
In the first embodiment, the data quality governance scope is residents' electricity consumption in kilowatts, and the quality governance goal is an integrity assessment of the data. The data governance definition component orchestrates a data acquisition sub-component and the integrity evaluation sub-component. The data acquisition sub-component acquires the residents' electricity consumption figures to form a residential electricity consumption database, the integrity evaluation sub-component assesses the integrity of the data in that database, and a data integrity assessment report is produced once the assessment is complete.
In the second embodiment, the data quality governance scope is the power transmission area of a certain transformer, and the quality governance goal is an analysis of electricity consumption. The data governance definition component orchestrates a data acquisition sub-component, the data evaluation component, the data modification component and the data analysis component. The data acquisition sub-component acquires the electricity consumption data in the transformer's transmission area to form an area electricity consumption database. The data evaluation component then evaluates the data in that database, the data modification component modifies the abnormal data, and the data analysis component analyzes the modified database to produce an analysis report.
In the third embodiment, the data quality governance scope is a certain municipal power grid dispatching area, and the quality governance goal is the optimization of power grid dispatching quality. The data governance definition component orchestrates the data acquisition component, the data evaluation component, the data modification component, the data analysis component, the flow improvement component, the data storage component and the data destruction component. The data acquisition component acquires the power grid dispatching data in the municipal dispatching area to form the city's power grid dispatching database. The data evaluation component evaluates the data in that database along the six evaluation dimensions, both to assess the data on each dimension and to find abnormal data. The data modification component modifies the abnormal data in the database, and the data analysis component analyzes the modified database to produce an analysis report. The flow improvement component gives feedback on and/or changes the orchestration order, cooperation mode and processing results of the data acquisition, data evaluation, data modification and data analysis components. The data storage component saves the data that needs to be kept, and the data that does not need to be kept is thoroughly destroyed by the data destruction component so that it cannot be stolen or misused.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims.

Claims (6)

1. A data quality governance method based on orchestratable components, characterized by comprising:
a data governance definition component for formulating a data quality governance scope and a quality governance goal,
a data acquisition component for acquiring data,
a data evaluation component for evaluating the data,
a data modification component for modifying abnormal data,
a data analysis component for analyzing the data,
a flow improvement component for improving the data quality governance flow, the flow improvement component comprising a flow feedback sub-component and a flow reconstruction sub-component,
a data storage component for storing data in a decentralized manner,
a data destruction component for destroying data;
wherein the data acquisition component, data evaluation component, data modification component, data analysis component, flow improvement component, data storage component and data destruction component are orchestrated according to the data quality governance scope and quality governance goal formulated by the data definition component;
the data quality governance scope is a municipal power grid dispatching area, and the quality governance goal is the optimization of power grid dispatching quality; the data governance definition component orchestrates the data acquisition component, data evaluation component, data modification component, data analysis component, flow improvement component, data storage component and data destruction component for use; the data acquisition component acquires the power grid dispatching data in the municipal power grid dispatching area to form the city's power grid dispatching database; the data evaluation component then evaluates the data in the power grid dispatching database in order to find abnormal data; the data modification component modifies the abnormal data in the power grid dispatching database; the data analysis component analyzes the modified power grid dispatching database to form an analysis report; the flow improvement component gives feedback on and/or changes the orchestration order, cooperation mode and processing results of the data acquisition component, data evaluation component, data modification component and data analysis component; the data storage component saves the data that needs to be saved, and the data that does not need to be saved is thoroughly destroyed by the data destruction component so that it cannot be stolen or misused.
2. The data quality governance method based on orchestratable components according to claim 1, characterized in that: the data acquisition component comprises a plurality of data acquisition sub-components that each collect data from a different data source.
3. The data quality governance method based on orchestratable components according to claim 1, characterized in that: the data evaluation component comprises a uniqueness evaluation sub-component, an integrity evaluation sub-component, an accuracy evaluation sub-component, a consistency evaluation sub-component, a relevance evaluation sub-component and a timeliness evaluation sub-component.
4. The data quality governance method based on orchestratable components according to claim 1, characterized in that: the data modification component comprises a cross-validation data correction sub-component for correcting erroneous data and missing data, and a similarity-comparison data removal sub-component for removing redundant data.
5. The data quality governance method based on orchestratable components according to claim 1, characterized in that: the data analysis component comprises a regression analysis sub-component, a factor analysis sub-component, a fishbone diagram analysis sub-component, a Pareto analysis sub-component and a matrix data analysis sub-component.
6. The data quality governance method based on orchestratable components according to claim 1, characterized in that: the data storage component comprises several independent data storage sub-components.
CN202010949136.2A 2020-09-10 2020-09-10 A data quality governance method based on programmable components Active CN114168573B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010949136.2A CN114168573B (en) 2020-09-10 2020-09-10 A data quality governance method based on programmable components

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010949136.2A CN114168573B (en) 2020-09-10 2020-09-10 A data quality governance method based on programmable components

Publications (2)

Publication Number Publication Date
CN114168573A (en) 2022-03-11
CN114168573B (en) 2025-02-25

Family

ID=80475735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010949136.2A Active CN114168573B (en) 2020-09-10 2020-09-10 A data quality governance method based on programmable components

Country Status (1)

Country Link
CN (1) CN114168573B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708149A (en) * 2012-04-01 2012-10-03 河海大学 Data quality management method and system
CN110704502A (en) * 2019-11-20 2020-01-17 中电万维信息技术有限责任公司 Componentized data quality checking method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130117202A1 (en) * 2011-11-03 2013-05-09 Microsoft Corporation Knowledge-based data quality solution
CN103401945B (en) * 2013-08-14 2016-08-10 青岛大学 A kind of service combination dynamic reconstruction method
US9760428B1 (en) * 2013-12-19 2017-09-12 Amdocs Software Systems Limited System, method, and computer program for performing preventative maintenance in a network function virtualization (NFV) based communication network
CN104134121A (en) * 2014-07-30 2014-11-05 国家电网公司 Method for achieving visualization of power grid information system business data
US10459881B2 (en) * 2015-02-27 2019-10-29 Podium Data, Inc. Data management platform using metadata repository
CN108268997A (en) * 2017-11-23 2018-07-10 国网陕西省电力公司经济技术研究院 A kind of electricity grid substation quality of data wire examination method
EP3575980A3 (en) * 2018-05-29 2020-03-04 Accenture Global Solutions Limited Intelligent data quality

Also Published As

Publication number Publication date
CN114168573A (en) 2022-03-11

Similar Documents

Publication Publication Date Title
CN113935497B (en) Intelligent operation and maintenance fault processing method, device, equipment and storage medium thereof
US10102039B2 (en) Converting a hybrid flow
Bosu et al. Data quality in empirical software engineering: a targeted review
CN113597664B (en) Method, electronic device, storage medium and system for determining cause of failure
Liu et al. Efficient distributed query processing in large RFID-enabled supply chains
CN107679089B (en) A cleaning method, device and system for power sensing data
Lyu et al. Going green and profitable: The impact of smart manufacturing on Chinese enterprises
CN107168868B (en) A Software Change Defect Prediction Method Based on Sampling and Ensemble Learning
Gupta et al. Simulation modeling and analysis of a complex system of a thermal power plant
CN107153406A (en) A whole-process quality control method for products
CN114168573B (en) A data quality governance method based on programmable components
Mursitama et al. The role of absorptive capacity, technological capability, and firm performance in Indonesia’s high-tech industry
CN104764455B (en) A kind of data in navigation electronic map processing method and processing device
CN104732436A (en) A third-party tax-related information collection and analysis tool
CN106130929B (en) The service message automatic processing method and system of internet insurance field based on graph-theoretical algorithm
CN108229827A (en) A kind of power equipment quality problems modeling and analysis methods
CN112926269B (en) Method and system for data grouping and cleaning of power plant edge nodes
CN101196742A (en) A production process quality management system
Raj et al. On the Impact of ML use cases on Industrial Data Pipelines
CN119028601A (en) A clinical trial protocol deviation identification method and management system
CN107609016A (en) Electricity transaction data accuracy method of calibration based on expression parsing
CN110909068A (en) Emergency diesel generator set big data acquisition processing method and system and storage medium
CN113779178A (en) Data storage method and device based on knowledge graph
Nihayah et al. Does the clean development mechanism exist in developing countries after an international agreement?
Tahrat et al. Abstracting temporal aboxes in TDL-Lite

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant