
CN108595541A - Method and system for testing data extraction quality - Google Patents


Info

Publication number: CN108595541A
Authority: CN (China)
Prior art keywords: field, name, data type, data, quality
Prior art date: 2018-04-08
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201810305163.9A
Other languages: Chinese (zh)
Inventor: 胡丽英
Current Assignee: Shanghai Kangfei Information Technology Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Shanghai Kangfei Information Technology Co Ltd
Priority date: 2018-04-08 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2018-04-08
Publication date: 2018-09-28
Application filed by Shanghai Kangfei Information Technology Co Ltd
Priority to CN201810305163.9A
Publication of CN108595541A


Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method for testing data extraction quality. The method includes: obtaining the number of fields of a first table in a MySQL database and the number of fields of the corresponding second table in a Hive database, and determining whether the two field counts are equal; if the number of fields of the first table is greater than the number of fields of the second table, recording the corresponding field count difference information; if the two field counts are equal, determining whether the data type of each field in the first table is consistent with the data type of the corresponding field in the second table, and, if not, recording the corresponding data type difference information. The method greatly improves the efficiency of testing data extraction quality.

Description

Method and system for testing data extraction quality
Technical field
The invention belongs to the field of database technology, and in particular relates to a method and system for testing data extraction quality.
Background art
MySQL is a relational database management system developed by the Swedish company MySQL AB. For web applications, MySQL is one of the best RDBMS (Relational Database Management System) software packages. A relational database stores data in separate tables instead of putting all the data in one large repository, which increases speed and improves flexibility. The SQL language used by MySQL is the most commonly used standardized language for accessing databases. MySQL is released under a dual-licensing policy, with a Community Edition and a commercial edition. Because of its small footprint, high speed, low total cost of ownership, and above all its open source code, MySQL is generally chosen as the website database when developing small and medium-sized websites.
A data warehouse (DW or DWH) is a strategic collection of all types of data that supports decision-making at every level of an enterprise. It is a single data store created for analytical reporting and decision support. For enterprises that need business intelligence, it provides guidance for business process improvement and for monitoring time, cost, quality, and control.
A data warehouse is built to further mine the data resources that already exist in large quantities in databases and to support decision-making; it is not a so-called "large database". A data warehouse is designed mainly to serve front-end queries and analysis, and because it contains considerable redundancy, it also requires a large amount of storage. To better serve front-end applications, a data warehouse usually has the following characteristics:
1. Sufficiently high efficiency. Data warehouse analysis data is generally organized by day, week, month, quarter, and year. The daily cycle has the strictest efficiency requirement: customers expect to see the analysis of yesterday's data within 24 hours, or even within 12 hours.
2. Data quality. The information provided by a data warehouse must be accurate. However, the data warehouse process is usually divided into multiple steps, including data cleansing, loading, and querying and presentation, and complex architectures have even more layers. If the data source contains dirty data or the code is not rigorous, the data can be distorted, and customers who see incorrect information may make wrong analytical decisions that cause losses rather than benefits.
3. Extensibility. Some large data warehouse system architectures are complex because they are designed for extensibility over the next 3 to 5 years, so that the data warehouse does not need to be rebuilt soon and can run very stably. This is mainly reflected in sound data modeling: additional middle layers in the data warehouse design give massive data flows sufficient buffering, so that the system does not simply stop working when the data volume grows much larger.
The data quality of the data warehouse described above depends on the data extraction quality. Before an online analytical processing (OLAP) system is built, the data of each independent system must be extracted, transformed and filtered, and then stored in a central location, which becomes the data warehouse. This process of extraction, transformation, and loading is called ETL (Extract, Transform, Load).
Hive is a data warehouse infrastructure built on Hadoop. It provides a set of tools for data extraction, transformation, and loading (ETL), a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. Hive defines a simple SQL-like query language called HQL, which allows users familiar with SQL to query data. The language also allows developers familiar with MapReduce to write custom mappers and reducers for complex analysis work that the built-in mappers and reducers cannot handle.
Testers need to test data extraction quality. The traditional test method relies on manual sampling of tables, checking by comparison whether the structure of each table was extracted successfully and whether the type conversion of each table is correct, so testing efficiency is very low.
Therefore, the present invention proposes a technical solution for automatically testing data extraction quality, which greatly improves the efficiency of testing data extraction quality.
Summary of the invention
In view of this, an object of the present invention is to provide a method and system for testing data extraction quality that greatly improve the efficiency of testing data extraction quality.
To achieve the above object, the present invention provides a method for testing data extraction quality, the method comprising:
S1. Obtain the number of fields of a first table in a MySQL database and the number of fields of the corresponding second table in a Hive database, and determine whether the number of fields of the first table is equal to the number of fields of the second table;
S2. If the number of fields of the first table is greater than the number of fields of the second table, record the corresponding field count difference information;
S3. If the number of fields of the first table is equal to the number of fields of the second table, determine whether the data type of each field in the first table is consistent with the data type of the corresponding field in the second table, and if not, record the corresponding data type difference information.
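Steps S1 to S3 can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: it assumes each table's structure has already been read into an ordered list of (field name, data type) pairs, and it pairs fields by position, which is one possible reading of "corresponding field" (matching by name would be equally plausible). The function name compare_schemas and the wording of the difference messages are hypothetical.

```python
from typing import List, Tuple

# A table structure as an ordered list of (field_name, data_type) pairs.
Schema = List[Tuple[str, str]]


def compare_schemas(mysql_schema: Schema, hive_schema: Schema) -> List[str]:
    """Apply steps S1-S3 and return the difference records (hypothetical format)."""
    differences: List[str] = []
    mysql_count, hive_count = len(mysql_schema), len(hive_schema)

    # S1: compare the number of fields in the first (MySQL) and second (Hive) table.
    if mysql_count > hive_count:
        # S2: the Hive table has fewer fields, so record the field count difference.
        differences.append(f"field count: MySQL has {mysql_count}, Hive has {hive_count}")
    elif mysql_count == hive_count:
        # S3: pair fields by position and compare their data types case-insensitively.
        for (m_name, m_type), (_h_name, h_type) in zip(mysql_schema, hive_schema):
            if m_type.lower() != h_type.lower():
                differences.append(f"type: {m_name} is {m_type} in MySQL, {h_type} in Hive")
    # If the first table has fewer fields than the second, nothing is recorded.
    return differences
```

A first table with fewer fields than the second table produces no record, because the steps only define recording for the case where the MySQL table has more fields.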
Preferably, before step S1 the method includes: establishing a mapping between the database names and table names in the MySQL database and the database names and table names in the Hive database.
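The mapping can be held as a simple lookup keyed by the MySQL (database name, table name) pair. The sketch below assumes an in-memory dictionary; the function resolve_hive_table and the example names shop, orders, and ods_shop are hypothetical, and a real deployment might load the mapping from a configuration file or a metadata table instead.

```python
from typing import Dict, Tuple

# Hypothetical mapping: (MySQL database, MySQL table) -> (Hive database, Hive table).
TableMapping = Dict[Tuple[str, str], Tuple[str, str]]


def resolve_hive_table(mapping: TableMapping, mysql_db: str, mysql_table: str) -> Tuple[str, str]:
    """Return the Hive (database, table) that a MySQL table was extracted into."""
    try:
        return mapping[(mysql_db, mysql_table)]
    except KeyError:
        # An unmapped table cannot be compared, so fail loudly instead of guessing.
        raise LookupError(f"no Hive mapping for {mysql_db}.{mysql_table}") from None


# Example mapping with illustrative names only.
EXAMPLE_MAPPING: TableMapping = {("shop", "orders"): ("ods_shop", "orders")}
```

Raising an error for an unmapped table keeps a missing mapping entry from being silently reported as "no differences".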
Preferably, step S1 further includes:
providing the IP address, port number, database name, and table name of the MySQL database in which the first table is located, and obtaining the first table according to the IP address, port number, database name, and table name;
obtaining the corresponding second table in the Hive database according to the mapping between the MySQL database and the Hive database.
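One way to obtain the first table's fields, and hence its field count, is to query MySQL's information_schema.COLUMNS view through a DB-API cursor. The sketch assumes the third-party PyMySQL driver for the connection (imported only inside open_mysql_connection, so the query helper works with any DB-API cursor); the function names and the choice of the DATA_TYPE column (the base type without length, e.g. varchar rather than varchar(255)) are assumptions rather than details from the patent.

```python
from typing import Any, List, Tuple

# Field names and base data types of one table, in declaration order.
COLUMNS_SQL = (
    "SELECT COLUMN_NAME, DATA_TYPE FROM information_schema.COLUMNS "
    "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s "
    "ORDER BY ORDINAL_POSITION"
)


def _text(value: Any) -> str:
    # Some driver/server combinations return information_schema values as bytes.
    return value.decode() if isinstance(value, (bytes, bytearray)) else str(value)


def fetch_mysql_schema(cursor: Any, db: str, table: str) -> List[Tuple[str, str]]:
    """Return (field_name, data_type) pairs for db.table; len() gives the field count."""
    cursor.execute(COLUMNS_SQL, (db, table))
    return [(_text(name), _text(dtype)) for name, dtype in cursor.fetchall()]


def open_mysql_connection(host: str, port: int, user: str, password: str) -> Any:
    """Connect to the MySQL instance identified by its IP address and port number."""
    import pymysql  # third-party driver, assumed to be installed

    return pymysql.connect(host=host, port=port, user=user, password=password)
```

Typical use would open a connection with the IP address and port number, then call fetch_mysql_schema(conn.cursor(), database_name, table_name); the database and table names are passed as query parameters rather than formatted into the SQL.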
Preferably, step S2 further includes:
if the number of fields of the first table is less than the number of fields of the second table, not recording any field count difference information.
Preferably, step S3 further includes:
if the data type of a field in the first table is inconsistent with the data type of the corresponding field in the second table, determining whether the inconsistent data type of the first table is in a preset data type rule table;
if it is not, recording the corresponding data type difference information.
Preferably, step S3 further includes:
if the inconsistent data type of the first table is in the preset data type rule table, not recording the corresponding data type difference information.
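The text does not specify the layout of the preset data type rule table. The sketch below assumes it maps a MySQL data type (the first table's type, which is what gets looked up) to the Hive types it may legitimately be converted into; the entries in TYPE_RULES and the function name type_differences are illustrative assumptions, not a list taken from the patent.

```python
from typing import Dict, List, Mapping, Optional, Set, Tuple

# Hypothetical rule table: MySQL type -> Hive types accepted as a valid conversion.
# These entries are examples only; each team would maintain its own table.
TYPE_RULES: Dict[str, Set[str]] = {
    "varchar": {"string"},
    "char": {"string"},
    "text": {"string"},
    "datetime": {"timestamp", "string"},
    "tinyint": {"int", "smallint"},
}


def type_differences(
    pairs: List[Tuple[str, str, str]],
    rules: Optional[Mapping[str, Set[str]]] = None,
) -> List[Tuple[str, str, str]]:
    """Filter (field, mysql_type, hive_type) triples down to the ones to record."""
    rules = TYPE_RULES if rules is None else rules
    recorded: List[Tuple[str, str, str]] = []
    for field, m_type, h_type in pairs:
        m, h = m_type.lower(), h_type.lower()
        if m == h:
            continue  # types are consistent
        if h in rules.get(m, set()):
            continue  # inconsistency covered by the rule table: not recorded
        recorded.append((field, m_type, h_type))  # record the difference
    return recorded
```

Keying the table by the MySQL type follows the text's wording that the first table's inconsistent type is checked against the rule table; restricting each entry to a set of accepted Hive types is an added assumption that keeps, for example, varchar to string from being reported while still flagging varchar to int.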
Preferably, the method further includes:
creating a difference table, and recording in the difference table the field count difference information of step S2 and the data type difference information of step S3.
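A minimal sketch of the difference table, using Python's built-in sqlite3 module purely as a stand-in for whatever store a tester would actually use. The table name diff_table, its columns, and the function record_differences are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical layout of the difference table; column names are illustrative.
DDL = """
CREATE TABLE IF NOT EXISTS diff_table (
    mysql_table TEXT NOT NULL,
    hive_table  TEXT NOT NULL,
    diff_kind   TEXT NOT NULL,  -- 'field_count' (step S2) or 'data_type' (step S3)
    detail      TEXT NOT NULL
)
"""


def record_differences(
    conn: sqlite3.Connection,
    mysql_table: str,
    hive_table: str,
    diffs: Iterable[Tuple[str, str]],
) -> int:
    """Insert (diff_kind, detail) rows for one table pair and return how many were written."""
    conn.execute(DDL)
    rows = [(mysql_table, hive_table, kind, detail) for kind, detail in diffs]
    conn.executemany("INSERT INTO diff_table VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Storing both kinds of difference in one table, tagged by diff_kind, lets a single query list every structural problem found for a given pair of tables.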
To achieve the above object, the present invention further provides a system for testing data extraction quality, the system comprising:
an acquisition module, which obtains the number of fields of a first table in a MySQL database and the number of fields of the corresponding second table in a Hive database;
a first judgment module, which determines whether the number of fields of the first table is equal to the number of fields of the second table and, if the number of fields of the first table is greater than the number of fields of the second table, records the corresponding field count difference information;
a second judgment module, which, if the number of fields of the first table is equal to the number of fields of the second table, determines whether the data type of each field in the first table is consistent with the data type of the corresponding field in the second table and, if not, records the corresponding data type difference information.
Preferably, the system further comprises a mapping module, which establishes a mapping between the database names and table names in the MySQL database and the database names and table names in the Hive database.
Preferably, the acquisition module comprises a first table acquisition unit and a second table acquisition unit;
the first table acquisition unit provides the IP address, port number, database name, and table name of the MySQL database in which the first table is located, and obtains the first table according to the IP address, port number, database name, and table name;
the second table acquisition unit obtains the corresponding second table in the Hive database according to the mapping, established by the mapping module, between the MySQL database and the Hive database.
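For the second table acquisition unit, the Hive table's fields can be read with Hive's DESCRIBE statement, which returns rows of column name, data type, and comment. The sketch below assumes a DB-API cursor already connected to HiveServer2 (for example through a third-party driver such as PyHive, not shown) and assumes the database and table names come from the trusted mapping, so they are not escaped. The function names are hypothetical. Parsing stops at the first blank or '#' row, because for partitioned tables DESCRIBE appends a '# Partition Information' section that repeats the partition columns.

```python
from typing import Any, List, Sequence, Tuple


def parse_hive_describe(rows: Sequence[Sequence[Any]]) -> List[Tuple[str, str]]:
    """Convert DESCRIBE rows (col_name, data_type, comment) into (name, type) pairs."""
    schema: List[Tuple[str, str]] = []
    for row in rows:
        name = (row[0] or "").strip()
        # A blank row or a '#' header such as '# Partition Information'
        # marks the end of the regular column list.
        if not name or name.startswith("#"):
            break
        dtype = (row[1] or "").strip()
        schema.append((name, dtype))
    return schema


def fetch_hive_schema(cursor: Any, db: str, table: str) -> List[Tuple[str, str]]:
    """Read the second table's fields through a DB-API cursor on HiveServer2."""
    # db and table are assumed to come from the trusted mapping, so they are not escaped.
    cursor.execute(f"DESCRIBE {db}.{table}")
    return parse_hive_describe(cursor.fetchall())
```

Stripping whitespace guards against padded values, which some Hive versions return in DESCRIBE output; the number of fields of the second table is then the length of the returned list.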
Compared with the prior art, the method and system for testing data extraction quality provided by the present invention have the following beneficial effects: when testing data extraction quality, the structural differences between the source data tables of the MySQL database and the Hive database are determined automatically. The user only needs to enter the information of the corresponding MySQL database and run the solution to obtain the differences between the two databases, including added or deleted column names and differences in data type between columns with the same name. The test is fully automated, which greatly improves the efficiency of testing data extraction quality, saves labor costs, and is convenient for the user.
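Reporting added or deleted column names requires comparing names, not only field counts. One simple way to produce that information is a set difference over column names, sketched below under the assumption that names are compared case-insensitively (Hive typically stores column names in lower case). The function name column_name_diff and the example column etl_time are hypothetical.

```python
from typing import Dict, List


def column_name_diff(mysql_cols: List[str], hive_cols: List[str]) -> Dict[str, List[str]]:
    """Report column names deleted from, or added in, the Hive copy of a MySQL table."""
    # Compare case-insensitively, since Hive typically lower-cases column names.
    m = {c.lower() for c in mysql_cols}
    h = {c.lower() for c in hive_cols}
    return {
        "deleted": sorted(m - h),  # present in MySQL, missing from Hive
        "added": sorted(h - m),    # present only in Hive, e.g. an ETL audit column
    }
```

For example, a MySQL table with columns id, UserName, and email extracted into a Hive table with id, username, and etl_time would report email as deleted and etl_time as added.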
Brief description of the drawings
Preferred embodiments are described below in a clear and easily understood manner with reference to the accompanying drawings, to further explain the above characteristics, technical features, advantages, and implementations of the method and system for testing data extraction quality.
Fig. 1 is a flow chart of the method for testing data extraction quality according to the present invention;
Fig. 2 is a schematic structural diagram of the system for testing data extraction quality according to the present invention.
Detailed description of the embodiments
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, specific embodiments of the present invention are described below with reference to the accompanying drawings. Obviously, the drawings in the following description show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings and other embodiments from them without creative effort.
For simplicity, each figure shows only the parts related to the present invention, schematically; they do not represent the actual structure of a product. In addition, to keep the figures simple and easy to understand, where several components in a figure have the same structure or function, only one of them is drawn or labeled. In this document, "one" does not only mean "only this one"; it can also mean "more than one".
As shown in Fig. 1, in one embodiment of the present invention, a method for testing data extraction quality comprises:
S1. Obtain the number of fields of a first table in a MySQL database and the number of fields of the corresponding second table in a Hive database, and determine whether the number of fields of the first table is equal to the number of fields of the second table;
S2. If the number of fields of the first table is greater than the number of fields of the second table, record the corresponding field count difference information;
S3. If the number of fields of the first table is equal to the number of fields of the second table, determine whether the data type of each field in the first table is consistent with the data type of the corresponding field in the second table, and if not, record the corresponding data type difference information.
Specifically, before step S1, a mapping is established between the database names and table names in the MySQL database and the database names and table names in the Hive database. Using the database name and table name in the MySQL database, the fields of that table and their data types are obtained from the MySQL database. Through the mapping between the MySQL database and the Hive database, the corresponding database name and table name in the Hive database are obtained, and from them the fields of that table and their data types in the Hive database.
There are usually multiple MySQL servers, each with its own IP address and port number. Each MySQL server hosts multiple databases, each with its own database name, and each database contains multiple tables, each with its own table name. The IP address, port number, database name, and table name of the MySQL database in which the first table is located are provided, and the first table is obtained from them. The corresponding second table in the Hive database is then obtained according to the mapping between the MySQL database and the Hive database. Thus, when the user needs to test data extraction quality by comparing the data table structures of the MySQL database and the Hive database, the user only needs to enter the IP address, port number, database name, and table name of the MySQL database, and the technical solution of the present invention yields the differences between the data table structures of the two databases.
Once the table name of the first table in the MySQL database is provided, the number of fields of the first table is obtained. Correspondingly, the table name of the second table in the Hive database is obtained, the number of fields of the second table is obtained, and it is determined whether the two field counts are equal. If the number of fields of the first table is greater than that of the second table, the corresponding field count difference information is recorded; if it is less, no field count difference information is recorded.
If the number of fields of the first table is equal to that of the second table, the data type of each field in the first table and the data type of each corresponding field in the second table are obtained, and it is determined whether they are consistent; if not, the corresponding data type difference information is recorded.
In a specific embodiment of the present invention, if the data type of a field in the first table is inconsistent with the data type of the corresponding field in the second table, it is determined whether the inconsistent data type of the first table is in a preset data type rule table. If it is not, the corresponding data type difference information is recorded; if it is, no data type difference information is recorded.
In a specific embodiment of the present invention, a difference table is created, and the field count difference information of step S2 and the data type difference information of step S3 are recorded in it.
With this technical solution, when testing data extraction quality, the structural differences between the source data tables of the MySQL database and the Hive database are determined automatically: the user only needs to enter the information of the corresponding MySQL database and run the solution to obtain the differences between the two databases. This greatly improves the efficiency of testing data extraction quality, saves labor costs, and is convenient for the user.
As shown in Fig. 2, in one embodiment of the present invention, a system for testing data extraction quality comprises:
an acquisition module 20, which obtains the number of fields of a first table in a MySQL database and the number of fields of the corresponding second table in a Hive database;
a first judgment module 21, which determines whether the number of fields of the first table is equal to the number of fields of the second table and, if the number of fields of the first table is greater than the number of fields of the second table, records the corresponding field count difference information;
a second judgment module 22, which, if the number of fields of the first table is equal to the number of fields of the second table, determines whether the data type of each field in the first table is consistent with the data type of the corresponding field in the second table and, if not, records the corresponding data type difference information.
In a specific embodiment of the present invention, the system further comprises a mapping module, which establishes a mapping between the database names and table names in the MySQL database and the database names and table names in the Hive database.
The acquisition module comprises a first table acquisition unit and a second table acquisition unit. The first table acquisition unit provides the IP address, port number, database name, and table name of the MySQL database in which the first table is located, and obtains the first table from them; once the table name of the first table is provided, it obtains the number of fields of the first table. The second table acquisition unit obtains the corresponding second table in the Hive database according to the mapping, established by the mapping module, between the MySQL database and the Hive database; correspondingly, it obtains the table name of the second table and the number of fields of the second table. When the user needs to test data extraction quality by comparing the data table structures of the MySQL database and the Hive database, the user only needs to enter the IP address, port number, database name, and table name of the MySQL database.
The first judgment module determines whether the number of fields of the first table is equal to the number of fields of the second table. If the number of fields of the first table is greater, it records the corresponding field count difference information; if it is less, it records no field count difference information.
In the second judgment module, if the number of fields of the first table is equal to that of the second table, the data type of each field in the first table and the data type of each corresponding field in the second table are obtained, and it is determined whether they are consistent; if not, the corresponding data type difference information is recorded. In a specific embodiment of the present invention, if the data type of a field in the first table is inconsistent with that of the corresponding field in the second table, it is determined whether the inconsistent data type of the first table is in a preset data type rule table. If it is not, the corresponding data type difference information is recorded; if it is, no data type difference information is recorded.
In summary, the technical solution of the present invention substantially improves testing efficiency and gives the user a better experience.
It should be noted that the above embodiments can be freely combined as needed. The above are only preferred embodiments of the present invention. It should be pointed out that a person of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method for testing data extraction quality, characterized in that the method comprises:
S1. obtaining the number of fields of a first table in a MySQL database and the number of fields of the corresponding second table in a Hive database, and determining whether the number of fields of the first table is equal to the number of fields of the second table;
S2. if the number of fields of the first table is greater than the number of fields of the second table, recording the corresponding field count difference information;
S3. if the number of fields of the first table is equal to the number of fields of the second table, determining whether the data type of each field in the first table is consistent with the data type of the corresponding field in the second table, and if not, recording the corresponding data type difference information.
2. The method for testing data extraction quality according to claim 1, characterized in that, before step S1, the method comprises: establishing a mapping between the database names and table names in the MySQL database and the database names and table names in the Hive database.
3. The method for testing data extraction quality according to claim 2, characterized in that step S1 further comprises:
providing the IP address, port number, database name, and table name of the MySQL database in which the first table is located, and obtaining the first table according to the IP address, port number, database name, and table name;
obtaining the corresponding second table in the Hive database according to the mapping between the MySQL database and the Hive database.
4. The method for testing data extraction quality according to claim 1, characterized in that step S2 further comprises:
if the number of fields of the first table is less than the number of fields of the second table, not recording the corresponding field count difference information.
5. The method for testing data extraction quality according to claim 1, characterized in that step S3 further comprises:
if the data type of a field in the first table is inconsistent with the data type of the corresponding field in the second table, determining whether the inconsistent data type of the first table is in a preset data type rule table;
if it is not, recording the corresponding data type difference information.
6. The method for testing data extraction quality according to claim 5, characterized in that step S3 further comprises:
if the inconsistent data type of the first table is in the preset data type rule table, not recording the corresponding data type difference information.
7. The method for testing data extraction quality according to claim 6, characterized in that the method further comprises:
creating a difference table, and recording in the difference table the field count difference information of step S2 and the data type difference information of step S3.
8. A system for testing data extraction quality, characterized in that the system comprises:
an acquisition module, which obtains the number of fields of a first table in a MySQL database and the number of fields of the corresponding second table in a Hive database;
a first judgment module, which determines whether the number of fields of the first table is equal to the number of fields of the second table and, if the number of fields of the first table is greater than the number of fields of the second table, records the corresponding field count difference information;
a second judgment module, which, if the number of fields of the first table is equal to the number of fields of the second table, determines whether the data type of each field in the first table is consistent with the data type of the corresponding field in the second table and, if not, records the corresponding data type difference information.
9. The system for testing data extraction quality according to claim 8, characterized in that the system further comprises a mapping module, which establishes a mapping between the database names and table names in the MySQL database and the database names and table names in the Hive database.
10. The system for testing data extraction quality according to claim 9, characterized in that the acquisition module comprises a first table acquisition unit and a second table acquisition unit;
the first table acquisition unit provides the IP address, port number, database name, and table name of the MySQL database in which the first table is located, and obtains the first table according to the IP address, port number, database name, and table name;
the second table acquisition unit obtains the corresponding second table in the Hive database according to the mapping, established by the mapping module, between the MySQL database and the Hive database.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810305163.9A CN108595541A (en) 2018-04-08 2018-04-08 A kind of test method and system of data pick-up quality

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810305163.9A CN108595541A (en) 2018-04-08 2018-04-08 A kind of test method and system of data pick-up quality

Publications (1)

Publication Number Publication Date
CN108595541A true CN108595541A (en) 2018-09-28

Family

ID=63621150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810305163.9A Pending CN108595541A (en) 2018-04-08 2018-04-08 A kind of test method and system of data pick-up quality

Country Status (1)

Country Link
CN (1) CN108595541A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020140645A1 (en) * 2019-01-03 2020-07-09 深圳壹账通智能科技有限公司 Abnormal data provision detection method and apparatus based on data migration, and terminal device

Citations (4)

Publication number Priority date Publication date Assignee Title
KR20050030848A (en) * 2003-09-26 2005-03-31 마이크로소프트 코포레이션 Method for maintaining information about multiple instances of an activity
CN102945262A (en) * 2012-10-19 2013-02-27 大唐移动通信设备有限公司 Comparing method and device for RNC (Radio Network Controller) configuration data
CN104281704A (en) * 2014-10-22 2015-01-14 新华瑞德(北京)网络科技有限公司 Database data copying method and device
CN106649333A (en) * 2015-10-29 2017-05-10 阿里巴巴集团控股有限公司 Method and device for consistency testing of field sequence

Non-Patent Citations (1)

Title
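兰天云 (Lan Tianyun): "多源数据采集与分析系统的设计与实现" [Design and implementation of a multi-source data acquisition and analysis system], 《中国优秀硕士学位论文全文数据库(信息科技辑)》 [China Master's Theses Full-text Database (Information Science and Technology)], 15 March 2018 (2018-03-15), pages 139 - 120 *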

Similar Documents

Publication Publication Date Title
González López de Murillas et al. Connecting databases with process mining: a meta model and toolset
US7774295B2 (en) Database track history
US20160224594A1 (en) Schema Definition Tool
US20030233365A1 (en) System and method for semantics driven data processing
Vyawahare et al. A hybrid database approach using graph and relational database
CN110291517A (en) Query language interoperability in chart database
CN107122360A (en) Data mover system and method
CN106716416A (en) Data retrieval apparatus, program and recording medium
US10628421B2 (en) Managing a single database management system
US9141251B2 (en) Techniques for guided access to an external distributed file system from a database management system
Rodzi et al. Significance of data integration and ETL in business intelligence framework for higher education
HUP0004097A2 (en) Method and system for generating corporate information, as well as providing it for system users
US10459987B2 (en) Data virtualization for workflows
WO2011111532A1 (en) Database system
Wijaya et al. An overview and implementation of extraction-transformation-loading (ETL) process in data warehouse (Case study: Department of agriculture)
Ruggles et al. Harmonization of census data: Ipums–international
CN108595541A (en) A kind of test method and system of data pick-up quality
Yasser et al. Implementing Business Intelligence System-Case Study
JP2023548152A (en) System and method for providing a query execution debugger for use in a data analysis environment
Yafooz et al. FlexiDC: a flexible platform for database conversion
JP7428599B2 (en) System construction support device and method
KR102605931B1 (en) Method for processing structured data and unstructured data on a plurality of databases and data processing platform providing the method
Bernal et al. A test model for database architectures: an assessment for job search engine systems
AU2004202620B2 (en) Database interactions and applications
Hámori Data Reconciliation Between the Database of an SAP ECC System and a Data Lakehouse: A Monitoring System for Nokia’s Enterprise Data Platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (Application publication date: 20180928)