CN115328902B - A data quality inspection rule matching method, storage medium and system - Google Patents

Info

Publication number
CN115328902B
Authority
CN
China
Prior art keywords
field
metadata
data quality
matched
quality inspection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211049853.5A
Other languages
Chinese (zh)
Other versions
CN115328902A (en
Inventor
徐欢
施勇
段琳
李标奇
徐敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Center of Yunnan Power Grid Co Ltd
Original Assignee
Information Center of Yunnan Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Center of Yunnan Power Grid Co Ltd
Priority to CN202211049853.5A
Publication of CN115328902A
Application granted
Publication of CN115328902B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Tests Of Electronic Circuits (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A plurality of field metadata and a plurality of data quality inspection rules are collected, and the association degree between each field metadata and each data quality inspection rule is calculated. Field metadata whose association degree meets the standard is matched with the corresponding data quality inspection rule. Candidate field metadata that has matched a data quality inspection rule is then distinguished from field metadata to be matched that has matched none. If candidate field metadata exists whose text similarity to the field metadata to be matched is greater than a preset threshold and whose data type is consistent with it, the parameter information contained in the data quality inspection rule selected by the user is replaced with the data information of the field metadata to be matched, and the condition parameters are replaced with new condition parameters input by the user. This yields a new data quality inspection rule, with which the field metadata to be matched is then matched.

Description

Data quality inspection rule matching method, storage medium and system
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, a storage medium, and a system for matching data quality inspection rules.
Background
When the power grid system operates, a large amount of service data can be generated, the service data can reflect the operation condition of the power grid system, and the service data are stored in the service system after being acquired. At present, a data quality inspection rule is generally adopted to inspect the quality of service data in a service system, and if the quality inspection result of the service data is abnormal, a worker needs to monitor the power grid operation service corresponding to the abnormal service data.
When selecting the data quality inspection rules used to inspect service data, the association degree between the field metadata describing the service data and each data quality inspection rule is calculated first, and if the association degree meets the standard, the field metadata is matched with that data quality inspection rule for quality inspection. If the association degree between a certain field metadata and every data quality inspection rule fails to meet the standard, the field metadata cannot be matched with any data quality inspection rule, and the service data it describes therefore cannot be inspected.
Disclosure of Invention
The technical problem to be solved by the invention is how to improve the situation that field metadata is not matched with a data quality check rule.
In order to solve the technical problems, the invention provides a data quality inspection rule matching method, which comprises the following steps:
A. Collecting name information, source information and data type information of a plurality of field metadata used for describing service data from a service system;
B. Acquiring preset multiple data quality inspection rules, and field name information, field source information and condition parameters contained in each data quality inspection rule;
C. Judging whether the association degree between each field metadata and each data quality inspection rule meets the standard or not according to the name information and the source information of each field metadata and the field name information and the field source information contained in each data quality inspection rule;
D. Matching the field metadata with the association degree reaching the standard with a data quality check rule;
E. Identifying, among the plurality of field metadata, candidate field metadata that has matched a data quality check rule and field metadata to be matched that has matched no data quality check rule;
F. For each field metadata to be matched, performing the following steps F1, F2, F3 and F4:
F1. Judging whether candidate field metadata exists whose text similarity to the field metadata to be matched is greater than a preset threshold and whose data type is consistent with it, and if so, displaying the candidate field metadata and its matched data quality check rule for the user to select;
F2. Obtaining the candidate field metadata selected by the user and the data quality check rule matched with it, and replacing the field name information and field source information contained in the data quality check rule with the name information and source information of the field metadata to be matched;
F3. Obtaining new condition parameters input by the user, and replacing the condition parameters contained in the data quality check rule selected by the user with the new condition parameters to obtain a new data quality check rule;
F4. Matching the field metadata to be matched with the new data quality check rule.
Preferably, in the step D, if there is a data quality inspection rule whose field name information has a text similarity to the name information of a field metadata that reaches a first preset value, and whose field source information has a text similarity to the source information of that field metadata that reaches a second preset value, then the association degree between the data quality inspection rule and the field metadata meets the standard.
Preferably, in the step F1, the text similarity between the metadata of the field to be matched and the metadata of each candidate field is calculated according to the name information of the metadata of the field to be matched and the name information of the metadata of each candidate field, whether the metadata of the candidate field with the text similarity greater than a preset threshold exists is judged, and if so, whether the metadata of the field to be matched is consistent with the metadata of the candidate field in data type is judged by comparing the data type information of the metadata of the field to be matched with the data type information of the metadata of the candidate field.
Preferably, in the step F1, if the text similarity with the field metadata to be matched is greater than a preset threshold and there are multiple candidate field metadata with the same data type, the multiple candidate field metadata are sorted and displayed for the user to select according to the text similarity from large to small.
Preferably, in the step F1, candidate field metadata with text similarity ranked before a predetermined ranking is selected for display.
Preferably, in the step F2, the data quality inspection rule is first decomposed by an SQL engine into a select clause containing the field name information, a from clause containing the field source information, and a where clause containing the condition parameters. The field name information in the select clause is then replaced with the name information of the field metadata to be matched, and the field source information in the from clause is replaced with the source information of the field metadata to be matched. In the step F3, the condition parameters in the where clause are replaced with the new condition parameters input by the user, and the replaced select clause, from clause and where clause are then combined to obtain the new data quality inspection rule.
The present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps in a data quality check rule matching method as described above.
The invention also provides a data quality inspection rule matching system, which comprises a computer readable storage medium and a processor which are connected with each other, wherein the computer readable storage medium is as described above.
For field metadata to be matched that has matched no data quality inspection rule, the method judges whether candidate field metadata exists whose text similarity to it is greater than a preset threshold and whose data type is consistent with it. If so, the candidate field metadata and its matched data quality inspection rule are displayed for the user to select. The candidate field metadata selected by the user and its matched data quality inspection rule are then acquired, the field name information and field source information contained in that rule are replaced with the name information and source information of the field metadata to be matched, and the condition parameters contained in the rule are replaced with new condition parameters input by the user. In other words, using the original data quality inspection rule as a template, its field name information, field source information and condition parameters are changed according to the name information and source information of the field metadata to be matched and the new condition parameters input by the user, which yields a new data quality inspection rule. The association degree between the field metadata to be matched and the new data quality inspection rule meets the standard, so the two can be matched, and the quality check of the service data described by the field metadata to be matched can be performed using the new data quality inspection rule.
Drawings
Fig. 1 is a flow chart of a data quality check rule matching method.
Detailed Description
The invention is further described in detail below in connection with the detailed description.
This embodiment provides a data quality check rule matching system comprising a computer readable storage medium and a processor connected with each other. The computer readable storage medium stores a computer program which, when executed by the processor, implements the data quality check rule matching method shown in fig. 1. The method comprises the following steps A, B, C, D, E and F.
A. A plurality of field metadata describing service data, name information, source information and data type information of each field metadata are collected from a service system.
In this embodiment, the service system stores a large amount of service data generated by the power grid system during operation, and this service data reflects the operating condition of the power grid system. The data quality inspection rule matching system collects a plurality of field metadata describing the service data from the service system, together with the data information of each field metadata: its name information, source information and data type information. For example, field metadata one has the name information "name", the source information "t_userinfo" and the data type information "text type"; field metadata two has the name information "name", the source information "t_admininfo" and the data type information "text type"; field metadata three has the name information "age", the source information "t_userinfo" and the data type information "numerical type".
B. Acquiring preset multiple data quality inspection rules, and field name information, field source information and condition parameters contained in each data quality inspection rule.
In order to perform quality inspection on the service data in a service system, a plurality of data quality inspection rules are usually preset, and each data quality inspection rule contains parameter information: field name information, field source information and condition parameters. For example, data quality inspection rule one is preset as "SELECT name FROM t_userinfo WHERE len(name) > 8", whose field name information is "name", field source information is "t_userinfo" and condition parameter is "8"; data quality inspection rule two is preset as "SELECT date FROM t_userinfo WHERE date IS NULL", whose field name information is "date", field source information is "t_userinfo" and condition parameter is "null". The data quality inspection rule matching system acquires the plurality of preset data quality inspection rules, together with the field name information, field source information and condition parameters contained in each of them.
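The example inputs of steps A and B can be written down as plain records. The following is a minimal sketch only: the class names `FieldMetadata` and `QualityRule` and their attribute names are hypothetical choices for illustration, not names from the patent, and the table names are written without the stray space that appears in some renderings of the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMetadata:
    """One field metadata record collected in step A (hypothetical layout)."""
    name: str        # name information, e.g. "name"
    source: str      # source information, e.g. the table "t_userinfo"
    data_type: str   # data type information, e.g. "text" or "numeric"


@dataclass(frozen=True)
class QualityRule:
    """One preset data quality inspection rule acquired in step B (hypothetical layout)."""
    sql: str           # full rule text
    field_name: str    # field name information contained in the rule
    field_source: str  # field source information contained in the rule
    condition: str     # condition parameter contained in the rule


# Field metadata one, two and three from the embodiment.
fields = [
    FieldMetadata("name", "t_userinfo", "text"),
    FieldMetadata("name", "t_admininfo", "text"),
    FieldMetadata("age", "t_userinfo", "numeric"),
]

# Data quality inspection rules one and two from the embodiment.
rules = [
    QualityRule("SELECT name FROM t_userinfo WHERE len(name) > 8",
                "name", "t_userinfo", "8"),
    QualityRule("SELECT date FROM t_userinfo WHERE date IS NULL",
                "date", "t_userinfo", "null"),
]
```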
C. Judging whether the association degree between each field metadata and each data quality inspection rule meets the standard or not according to the name information and the source information of each field metadata and the field name information and the field source information contained in each data quality inspection rule;
The system uses the Levenshtein distance algorithm to calculate the text similarity between the name information of each field metadata and the field name information of each data quality inspection rule, and judges whether that text similarity reaches a first preset value (80%). It likewise calculates the text similarity between the source information of each field metadata and the field source information of each data quality inspection rule, and judges whether that text similarity reaches a second preset value (100%). The association degree between a field metadata and a data quality inspection rule is judged to meet the standard only when the name similarity reaches the first preset value and the source similarity reaches the second preset value; otherwise it is judged not to meet the standard. For example, for field metadata one, two and three and data quality inspection rules one and two, the system needs to calculate the association degree between field metadata one and rule one, field metadata one and rule two, field metadata two and rule one, field metadata two and rule two, field metadata three and rule one, and field metadata three and rule two, and then judge whether each association degree meets the standard, as follows:
The system calculates the text similarity between the name information "name" of field metadata one and the field name information "name" of data quality inspection rule one. The result is 100%, which reaches the first preset value (80%). It then calculates the text similarity between the source information "t_userinfo" of field metadata one and the field source information "t_userinfo" of data quality inspection rule one. The result is 100%, which reaches the second preset value (100%). The system therefore judges that the association degree between field metadata one and data quality inspection rule one meets the standard.
The system calculates the text similarity between the name information "name" of field metadata one and the field name information "date" of data quality inspection rule two. The result is 50%, which does not reach the first preset value (80%). It then calculates the text similarity between the source information "t_userinfo" of field metadata one and the field source information "t_userinfo" of data quality inspection rule two. The result is 100%, which reaches the second preset value (100%). The system therefore judges that the association degree between field metadata one and data quality inspection rule two does not meet the standard.
The system calculates the text similarity between the name information "name" of field metadata two and the field name information "name" of data quality inspection rule one. The result is 100%, which reaches the first preset value (80%). It then calculates the text similarity between the source information "t_admininfo" of field metadata two and the field source information "t_userinfo" of data quality inspection rule one. The result is 50%, which does not reach the second preset value (100%). The system therefore judges that the association degree between field metadata two and data quality inspection rule one does not meet the standard.
The system calculates the text similarity between the name information "name" of field metadata two and the field name information "date" of data quality inspection rule two. The result is 50%, which does not reach the first preset value (80%). It then calculates the text similarity between the source information "t_admininfo" of field metadata two and the field source information "t_userinfo" of data quality inspection rule two. The result is 50%, which does not reach the second preset value (100%). The system therefore judges that the association degree between field metadata two and data quality inspection rule two does not meet the standard.
The system calculates the text similarity between the name information "age" of field metadata three and the field name information "name" of data quality inspection rule one. The result is 30%, which does not reach the first preset value (80%). It then calculates the text similarity between the source information "t_userinfo" of field metadata three and the field source information "t_userinfo" of data quality inspection rule one. The result is 100%, which reaches the second preset value (100%). The system therefore judges that the association degree between field metadata three and data quality inspection rule one does not meet the standard.
The system calculates the text similarity between the name information "age" of field metadata three and the field name information "date" of data quality inspection rule two. The result is 30%, which does not reach the first preset value (80%). It then calculates the text similarity between the source information "t_userinfo" of field metadata three and the field source information "t_userinfo" of data quality inspection rule two. The result is 100%, which reaches the second preset value (100%). The system therefore judges that the association degree between field metadata three and data quality inspection rule two does not meet the standard.
It should be noted that the Levenshtein distance algorithm, also called the edit distance algorithm, obtains the edit distance between two character strings by calculating the minimum number of edit operations required to convert one string into the other. The smaller the edit distance, the greater the text similarity of the two strings. The edit operations are replacing one character with another, inserting one character, and deleting one character.
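The edit distance described above can be sketched with the standard dynamic-programming recurrence. The text does not state how an edit distance is converted into a percentage, so the normalization in `text_similarity` (one minus the distance divided by the length of the longer string) is an assumption made for illustration, and it may not reproduce every percentage quoted in the worked example. Both function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def text_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest
```

Under this assumed normalization, identical strings score 1.0, and "name" versus "date" (two substitutions over four characters) scores 0.5.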
D. Matching the field metadata whose association degree meets the standard with the data quality inspection rule.
After judging whether the association degree between each field metadata and each data quality inspection rule meets the standard, the system establishes a mapping relation between each field metadata and data quality inspection rule whose association degree meets the standard, so that the two are matched, and establishes no mapping relation where the association degree does not meet the standard, so that the two are not matched. Specifically, the system has judged that the association degree between field metadata one and data quality inspection rule one meets the standard, while the association degrees between field metadata one and rule two, between field metadata two and rules one and two, and between field metadata three and rules one and two do not. The system therefore establishes a mapping relation only between field metadata one and data quality inspection rule one. That is, field metadata one matches data quality inspection rule one "SELECT name FROM t_userinfo WHERE len(name) > 8", while field metadata two and three match no data quality inspection rule.
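Steps C and D can be sketched as a double loop over field metadata and rules that records a mapping whenever both thresholds are reached. This is an illustrative sketch, not the patented implementation: the dictionary keys, the function names and the similarity normalization (one minus edit distance over the longer length) are assumptions, while the thresholds 80% and 100% are the first and second preset values from the embodiment.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance by dynamic programming (substitute, insert, delete)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization of the edit distance into the range 0..1."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


NAME_THRESHOLD = 0.8    # first preset value (80%)
SOURCE_THRESHOLD = 1.0  # second preset value (100%)


def association_meets_standard(field: dict, rule: dict) -> bool:
    """Step C: both the name and the source similarity must reach their preset value."""
    return (similarity(field["name"], rule["field_name"]) >= NAME_THRESHOLD
            and similarity(field["source"], rule["field_source"]) >= SOURCE_THRESHOLD)


def match_rules(fields: list, rules: list) -> dict:
    """Step D: map (name, source) of each field to the SQL of every rule it matches."""
    mapping = {}
    for field in fields:
        for rule in rules:
            if association_meets_standard(field, rule):
                mapping.setdefault((field["name"], field["source"]), []).append(rule["sql"])
    return mapping


fields = [
    {"name": "name", "source": "t_userinfo", "data_type": "text"},
    {"name": "name", "source": "t_admininfo", "data_type": "text"},
    {"name": "age", "source": "t_userinfo", "data_type": "numeric"},
]
rules = [
    {"sql": "SELECT name FROM t_userinfo WHERE len(name) > 8",
     "field_name": "name", "field_source": "t_userinfo"},
    {"sql": "SELECT date FROM t_userinfo WHERE date IS NULL",
     "field_name": "date", "field_source": "t_userinfo"},
]
matches = match_rules(fields, rules)
```

With the example data, only field metadata one ends up in `matches`, paired with rule one, which agrees with the outcome described in the embodiment.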
E. Identifying, among the plurality of field metadata, candidate field metadata that has matched a data quality check rule and field metadata to be matched that has matched no data quality check rule.
In this embodiment, the system marks field metadata one, which has matched data quality inspection rule one, as candidate field metadata one, and marks field metadata two and field metadata three, which have matched no data quality inspection rule, as field metadata two to be matched and field metadata three to be matched, so that the candidate field metadata is distinguished from the field metadata to be matched.
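Step E amounts to partitioning the field metadata by whether the mapping built earlier contains at least one rule for it. A minimal sketch, assuming the mapping is a dictionary keyed by a hypothetical `(name, source)` tuple; the function name `partition_fields` is likewise an illustrative choice.

```python
def partition_fields(field_keys: list, mapping: dict) -> tuple:
    """Split field keys into candidates (have a matched rule) and fields to be matched."""
    candidates = [key for key in field_keys if mapping.get(key)]
    to_be_matched = [key for key in field_keys if not mapping.get(key)]
    return candidates, to_be_matched


# Mapping produced by the matching step in the embodiment: only field metadata one matched.
mapping = {
    ("name", "t_userinfo"): ["SELECT name FROM t_userinfo WHERE len(name) > 8"],
}
field_keys = [("name", "t_userinfo"), ("name", "t_admininfo"), ("age", "t_userinfo")]

candidates, to_be_matched = partition_fields(field_keys, mapping)
```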
F. The following steps F1, F2, F3, F4 are performed for each field metadata to be matched.
(1) The execution of steps F1, F2, F3, F4 for field metadata two to be matched is detailed as follows:
F1. Judging whether candidate field metadata exists whose text similarity to the field metadata to be matched is greater than a preset threshold and whose data type is consistent with it, and if so, displaying the candidate field metadata and its matched data quality check rule for the user to select.
The system calculates the text similarity between the field metadata to be matched and each candidate field metadata from their name information, and judges whether any candidate field metadata has a text similarity to the field metadata to be matched greater than a preset threshold (for example, 80%). If so, it compares the data type information of the field metadata to be matched with that of the candidate field metadata to judge whether their data types are consistent. If the text similarity is greater than the preset threshold and the data types are consistent, the system displays the candidate field metadata and its matched data quality inspection rule for the user to select.
In this embodiment, the name information of field metadata two to be matched is "name" and its data type information is "text type". There is one candidate field metadata, namely candidate field metadata one, whose name information is "name" and whose data type information is "text type". From the name information "name" of field metadata two to be matched and the name information "name" of candidate field metadata one, the text similarity between the two is calculated to be 100%, which is greater than the preset threshold (80%). From the data type information "text type" of both, their data types are judged to be consistent. Since the text similarity is greater than the preset threshold and the data types are consistent, the system displays candidate field metadata one and its matched data quality inspection rule one "SELECT name FROM t_userinfo WHERE len(name) > 8" for the user to select.
In other embodiments, if there are multiple candidate field metadata whose text similarity to field metadata two to be matched is greater than the preset threshold and whose data types are consistent with it, these candidate field metadata are sorted by text similarity from largest to smallest, and the top three are displayed for the user to select.
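The candidate search of step F1, including the ranking and top-three cut-off, can be sketched as a filter followed by a sort. The sketch below rests on assumptions: the dictionary layout, the function names and the similarity normalization (one minus edit distance over the longer length) are illustrative and not taken from the patent, while the strict "greater than" comparison and the 80% threshold follow the text.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance by dynamic programming (substitute, insert, delete)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization of the edit distance into the range 0..1."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def find_candidates(target: dict, candidates: list,
                    threshold: float = 0.8, top_n: int = 3) -> list:
    """Return up to top_n candidates, most similar first, that pass both F1 checks."""
    scored = []
    for candidate in candidates:
        score = similarity(target["name"], candidate["name"])
        # Strictly greater than the threshold, and the data types must agree.
        if score > threshold and candidate["data_type"] == target["data_type"]:
            scored.append((score, candidate))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [candidate for _, candidate in scored[:top_n]]


candidate_one = {"name": "name", "source": "t_userinfo", "data_type": "text"}
field_two = {"name": "name", "source": "t_admininfo", "data_type": "text"}
field_three = {"name": "age", "source": "t_userinfo", "data_type": "numeric"}

shown_for_field_two = find_candidates(field_two, [candidate_one])
shown_for_field_three = find_candidates(field_three, [candidate_one])
```

For field metadata two the single candidate is returned for display. For field metadata three nothing is returned, because the candidate fails both the name similarity check and the data type check.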
F2. Acquiring the candidate field metadata selected by the user and its matched data quality inspection rule, and replacing the field name information and field source information contained in the data quality inspection rule with the name information and source information of the field metadata to be matched.
In this embodiment, the system displays candidate field metadata one and its matched data quality inspection rule one "SELECT name FROM t_userinfo WHERE len(name) > 8". If, after viewing them, the user finds data quality inspection rule one suitable, the user can select candidate field metadata one and its matched data quality inspection rule one, and the system acquires this selection. The SQL partitioning module in the SQL engine then decomposes data quality inspection rule one "SELECT name FROM t_userinfo WHERE len(name) > 8" into the select clause "SELECT name", the from clause "FROM t_userinfo" and the where clause "WHERE len(name) > 8". The parameter filling module in the SQL engine then replaces the field name information "name" in the select clause with the name information "name" of field metadata two to be matched, and replaces the field source information "t_userinfo" in the from clause with the source information "t_admininfo" of field metadata two to be matched.
F3. And acquiring new condition parameters input by the user, and replacing the condition parameters contained in the data quality inspection rule selected by the user with the new condition parameters input by the user to acquire the new data quality inspection rule.
After selecting the first candidate field metadata and the first matched data quality check rule, the user also needs to input new condition parameters of the first candidate field metadata, such as '10', into the system according to experience, after acquiring the new condition parameters '10' input by the user, the system replaces the condition parameters '8' in the where clause with the new condition parameters '10' input by the user by using a parameter filling module in the SQL engine, and then combines the replaced selected clause, from clause and where the where clause is combined by using an Sql combination module in the SQL engine to obtain a new data quality check rule 'SELECT NAME from t_ admininfo where len (name) > 10'.
As can be seen from steps F2 and F3, the system uses data quality inspection rule one, "select name from t_userinfo where len(name) > 8", as a template and changes its field name information "name", field source information "t_userinfo" and condition parameter "8" according to the name information "name" and source information "t_admininfo" of field metadata two to be matched and the new condition parameter "10" input by the user, thereby obtaining the new data quality inspection rule "select name from t_admininfo where len(name) > 10".
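The decompose, replace and combine sequence of steps F2 and F3 can be sketched roughly as follows. This is a minimal illustration in Python, not the SQL engine of the embodiment: the function names `split_rule`, `replace_word` and `adapt_rule` are hypothetical, and the regular expression assumes a rule of the simple shape "select ... from ... where ..." used in this example.

```python
import re

# Assumption: every rule has exactly one select, one from and one where clause.
RULE_PATTERN = re.compile(
    r"^\s*(select\s+.+?)\s+(from\s+.+?)\s+(where\s+.+?)\s*$",
    re.IGNORECASE | re.DOTALL,
)


def split_rule(rule):
    """Decompose a rule into its select, from and where clauses."""
    match = RULE_PATTERN.match(rule)
    if match is None:
        raise ValueError(f"unsupported rule shape: {rule!r}")
    return match.group(1), match.group(2), match.group(3)


def replace_word(text, old, new):
    """Replace whole-word occurrences of `old` with `new`, ignoring case."""
    pattern = rf"\b{re.escape(old)}\b"
    return re.sub(pattern, lambda _match: new, text, flags=re.IGNORECASE)


def adapt_rule(rule, old_name, new_name, old_source, new_source,
               old_param, new_param):
    """Build a new rule from an existing rule used as a template."""
    select_clause, from_clause, where_clause = split_rule(rule)
    # Step F2: field name in the select clause, field source in the from clause.
    select_clause = replace_word(select_clause, old_name, new_name)
    from_clause = replace_word(from_clause, old_source, new_source)
    # Step F3: condition parameter in the where clause.
    where_clause = replace_word(where_clause, old_param, new_param)
    # Combine the replaced clauses into the new rule.
    return " ".join((select_clause, from_clause, where_clause))


new_rule = adapt_rule(
    "select name from t_userinfo where len(name) > 8",
    old_name="name", new_name="name",
    old_source="t_userinfo", new_source="t_admininfo",
    old_param="8", new_param="10",
)
print(new_rule)  # select name from t_admininfo where len(name) > 10
```

Following the text, the sketch replaces the field name only in the select clause; because both fields are called "name" in this example, the where clause needs no further change.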
It should be noted that SQL stands for Structured Query Language, a computer language used to access, query, update and manage data in relational databases. The SQL engine is an important subsystem of a database: it accepts SQL statements sent by the application layer above it and directs the executor below it to run the execution plan.
F4. Matching the field metadata to be matched with the new data quality check rule.
After the new data quality check rule "select name from t_admininfo where len(name) > 10" is obtained, the association degree between field metadata two to be matched and the new data quality check rule meets the standard. The system therefore establishes a mapping relationship between field metadata two to be matched and the new data quality check rule, so that the two are matched and the service data described by field metadata two to be matched can be quality-checked with the new data quality check rule "select name from t_admininfo where len(name) > 10".
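As an illustration of step F4, the mapping relationship and a subsequent quality check might look like the sketch below. Everything in it is an assumption made for the example: the dictionary `rule_by_field`, the helpers `match_rule` and `run_check`, the in-memory SQLite database and the sample rows are hypothetical, and `len` is registered as a user-defined function because SQLite has no built-in function of that name.

```python
import sqlite3

# Hypothetical mapping from (source, field name) to the matched rule.
rule_by_field = {}


def match_rule(field_key, rule):
    """Step F4: record the mapping between a field and its rule."""
    rule_by_field[field_key] = rule


def run_check(connection, field_key):
    """Run the rule mapped to the field and return the values it selects."""
    rule = rule_by_field[field_key]
    return [row[0] for row in connection.execute(rule)]


connection = sqlite3.connect(":memory:")
# Make len() available to SQL so the rule text can be executed unchanged.
connection.create_function(
    "len", 1, lambda value: None if value is None else len(value)
)
connection.execute("CREATE TABLE t_admininfo (name TEXT)")
connection.executemany(
    "INSERT INTO t_admininfo (name) VALUES (?)",
    [("alice",), ("bartholomew_x",), ("christopher_lee",)],  # sample data
)

field_two = ("t_admininfo", "name")
match_rule(field_two, "select name from t_admininfo where len(name) > 10")
flagged = run_check(connection, field_two)
print(sorted(flagged))  # names longer than 10 characters
```

Here the rule returns the names longer than 10 characters; whether such rows count as violations depends on how the rule is interpreted in the business system.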
(2) The execution of steps F1, F2, F3, F4 for field metadata three to be matched is detailed as follows:
In this embodiment, step F1 is performed first. The name information of field metadata three to be matched is "age" and its data type information is "numeric type". There is one candidate field metadata, namely candidate field metadata one, whose name information is "name" and whose data type information is "text type". From the name information "age" of the field metadata to be matched and the name information "name" of candidate field metadata one, the text similarity between the two is calculated to be 30%, which is not greater than the preset threshold (80%). That is, no candidate field metadata has a text similarity with the field metadata to be matched that is greater than the preset threshold, so there is no need to compare whether the data types of the field metadata to be matched and candidate field metadata one are consistent. It follows that no candidate field metadata has both a text similarity greater than the preset threshold and a consistent data type, and in this case the system does not display candidate field metadata one and its matched data quality inspection rule "select name from t_userinfo where len(name) > 8" for the user to select.
The system then does not need to perform steps F2, F3, F4.
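The check in step F1 can be sketched as below. This is only an illustration under stated assumptions: the text does not say which text similarity measure is used, so `difflib.SequenceMatcher` stands in for it here and will not reproduce the 30% figure quoted above; the `FieldMetadata` class, the `find_candidates` helper, the data type of field metadata two and the source of field metadata three are likewise hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class FieldMetadata:
    name: str
    source: str
    data_type: str


def text_similarity(a, b):
    """Stand-in similarity measure in [0, 1]; the actual measure is unspecified."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_candidates(to_match, candidates, threshold=0.8, top_k=None):
    """Return candidates to display for the user, most similar first."""
    scored = []
    for candidate in candidates:
        score = text_similarity(to_match.name, candidate.name)
        if score <= threshold:
            continue  # similarity not greater than the preset threshold
        if candidate.data_type != to_match.data_type:
            continue  # data types are compared only after the similarity test
        scored.append((score, candidate))
    # Sort by similarity from large to small, optionally keeping the top ranks.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    shown = [candidate for _score, candidate in scored]
    return shown if top_k is None else shown[:top_k]


candidate_one = FieldMetadata("name", "t_userinfo", "text")
field_two = FieldMetadata("name", "t_admininfo", "text")      # type assumed
field_three = FieldMetadata("age", "t_admininfo", "numeric")  # source assumed

print(find_candidates(field_two, [candidate_one]))    # candidate one is shown
print(find_candidates(field_three, [candidate_one]))  # nothing to show
```

For field metadata three the result is empty, so steps F2, F3 and F4 are skipped, as described above.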
The above-described embodiments are merely embodiments of the present invention and are not intended to limit the scope of patent protection. Insubstantial changes and substitutions made by one skilled in the art in light of the teachings of the invention still fall within the scope of the claims.

Claims (8)

1. A data quality inspection rule matching method is characterized by comprising the following steps:
A. Collecting name information, source information and data type information of a plurality of field metadata used for describing service data from a service system;
B. Acquiring preset multiple data quality inspection rules, and field name information, field source information and condition parameters contained in each data quality inspection rule;
C. Judging whether the association degree between each field metadata and each data quality inspection rule meets the standard or not according to the name information and the source information of each field metadata and the field name information and the field source information contained in each data quality inspection rule;
D. Matching the field metadata with the association degree reaching the standard with a data quality check rule;
E. Identifying, among the plurality of field metadata, candidate field metadata that has been matched with a data quality check rule and field metadata to be matched that has not been matched with any data quality check rule;
F. For each field metadata to be matched, performing the following steps F1, F2, F3, F4:
F1. Judging whether there is candidate field metadata whose text similarity with the field metadata to be matched is greater than a preset threshold and whose data type is consistent with it, and if so, displaying the candidate field metadata and its matched data quality check rule for the user to select;
F2. Obtaining the candidate field metadata selected by the user and the data quality check rule matched with it, and replacing the field name information and field source information contained in the data quality check rule with the name information and source information of the field metadata to be matched;
F3. Obtaining new condition parameters input by the user, and replacing the condition parameters contained in the data quality inspection rule selected by the user with the new condition parameters input by the user to obtain a new data quality inspection rule;
F4. Matching the field metadata to be matched with the new data quality check rule.
2. The method according to claim 1, wherein in the step C, if the text similarity between the field name information contained in a data quality inspection rule and the name information of a field metadata reaches a first preset value, and the text similarity between the field source information contained in the data quality inspection rule and the source information of the field metadata reaches a second preset value, the association degree between the data quality inspection rule and the field metadata meets the standard.
3. The method according to claim 1, wherein in the step F1, the text similarity between the field metadata to be matched and each candidate field metadata is calculated according to the name information of the field metadata to be matched and the name information of each candidate field metadata, and it is determined whether there is candidate field metadata whose text similarity is greater than the preset threshold; if so, whether the data types of the field metadata to be matched and the candidate field metadata are consistent is determined by comparing the data type information of the field metadata to be matched with the data type information of the candidate field metadata.
4. The method according to claim 1, wherein in the step F1, if there are a plurality of candidate field metadata whose text similarity with the field metadata to be matched is greater than the preset threshold and whose data type is consistent, the plurality of candidate field metadata are sorted by text similarity from large to small and displayed for the user to select.
5. The method according to claim 4, wherein in the step F1, candidate field metadata with text similarity ranked before a predetermined ranking is selected for presentation.
6. The method of claim 1, wherein in step F2, the SQL engine is used to decompose the data quality inspection rule into a select clause including field name information, a from clause including field source information, and a where clause including condition parameters, then the field name information in the select clause is replaced with the name information of the field metadata to be matched, the field source information in the from clause is replaced with the source information of the field metadata to be matched, and in step F3, the condition parameters in the where clause are replaced with new condition parameters input by a user, and then the replaced select clause, from clause, and where clause are combined to obtain the new data quality inspection rule.
7. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor realizes the steps in the data quality check rule matching method according to any one of claims 1 to 6.
8. A data quality check rule matching system comprising a computer readable storage medium and a processor coupled to each other, wherein the computer readable storage medium is as claimed in claim 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211049853.5A CN115328902B (en) 2022-08-30 2022-08-30 A data quality inspection rule matching method, storage medium and system

Publications (2)

Publication Number Publication Date
CN115328902A CN115328902A (en) 2022-11-11
CN115328902B true CN115328902B (en) 2025-05-16

Family

ID=83927567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211049853.5A Active CN115328902B (en) 2022-08-30 2022-08-30 A data quality inspection rule matching method, storage medium and system

Country Status (1)

Country Link
CN (1) CN115328902B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115905319B (en) * 2022-11-16 2024-04-19 国网山东省电力公司营销服务中心(计量中心) A method and system for automatically identifying abnormal electricity charges of massive users
CN115905455B (en) * 2022-12-31 2023-09-29 北京和兴创联健康科技有限公司 Method for normalizing hospital database based on automatic detection technology
CN116151743A (en) * 2023-02-17 2023-05-23 中移动信息技术有限公司 Data processing method, device, electronic equipment and storage medium
CN117149753A (en) * 2023-08-30 2023-12-01 中电云计算技术有限公司 Data checking methods and systems
CN116910496B (en) * 2023-09-14 2024-01-23 深圳市智慧城市科技发展集团有限公司 Configuration method and device of data quality monitoring rule and readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111667923A (en) * 2020-06-05 2020-09-15 医渡云(北京)技术有限公司 Data matching method and device, computer readable medium and electronic equipment
CN113901075A (en) * 2021-10-12 2022-01-07 平安医疗健康管理股份有限公司 Method, device, computer device and storage medium for generating SQL statement

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287383B (en) * 2019-06-28 2023-06-09 深圳前海微众银行股份有限公司 Field information inspection method and device
CN112800095B (en) * 2021-04-13 2021-07-13 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN115328902A (en) 2022-11-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant