CN108595541A - Method and system for testing data extraction quality

Info

Publication number: CN108595541A
Application number: CN201810305163.9A
Authority: CN
Inventors: 胡丽英
Original assignee: Shanghai Kangfei Information Technology Co Ltd
Current assignee: Shanghai Kangfei Information Technology Co Ltd
Priority date: 2018-04-08
Filing date: 2018-04-08
Publication date: 2018-09-28

Abstract

The invention discloses a method for testing data extraction quality. The method comprises: obtaining the number of fields of a first table in a MySQL database and the number of fields of a corresponding second table in a Hive database, and judging whether the two numbers are consistent; if the number of fields of the first table is greater than that of the second table, recording the corresponding count difference information; if the number of fields of the first table is consistent with that of the second table, judging whether the data type of each field in the first table is consistent with the data type of the corresponding field in the second table and, if not, recording the corresponding data type difference information. The method greatly improves the efficiency of testing data extraction quality.

Description

Method and system for testing data extraction quality

Technical field

The invention belongs to the field of database technology, and in particular relates to a method and system for testing data extraction quality.

Background

MySQL is a relational database management system (RDBMS) developed by the Swedish company MySQL AB. For web applications, MySQL is regarded as one of the best RDBMS applications. A relational database stores data in separate tables rather than placing all data in one large repository, which improves speed and flexibility. The SQL language used by MySQL is the most commonly used standardized language for accessing databases. MySQL uses a dual licensing policy, with a community edition and a commercial edition. Because of its small size, high speed, low total cost of ownership and, in particular, its open source code, MySQL is generally chosen as the site database for small and medium-sized websites.

A data warehouse (abbreviated DW or DWH) is a strategic collection that provides all types of data support for decision-making at every level of an enterprise. It is a single data store created for analytical reporting and decision support. For enterprises that need business intelligence, it provides guidance for improving business processes and for monitoring time, cost, quality and control.

A data warehouse arises when databases already exist in large numbers, in order to further mine data resources and to meet decision-making needs; it is not a so-called "large database". A data warehouse is built to serve front-end queries and analysis; because it carries considerable redundancy, it also requires considerable storage. To serve front-end applications well, a data warehouse typically has the following characteristics:

1. High efficiency. Analysis data in a data warehouse is generally organized by day, week, month, quarter, year and so on. Data on a daily cycle has the highest efficiency requirement: customers must be able to see the previous day's data analysis within 24 hours, or even within 12 hours.

2. Data quality. The information a data warehouse provides must of course be accurate. However, a data warehouse pipeline is generally divided into multiple steps, including data cleansing, loading, querying and presentation, and a complex architecture has even more layers. Dirty data in the source or insufficiently rigorous code can therefore distort the data, and a customer who sees wrong information may make wrong decisions from the analysis, causing loss rather than benefit.

3. Scalability. Some large data warehouse systems have a complex architecture because they are designed with scalability over the next 3 to 5 years in mind, so that the data warehouse does not soon need to be rebuilt at great expense and can run stably. This is mainly reflected in sound data modeling: additional intermediate layers in the warehouse design give the mass data flow enough buffering, so that the system does not stop working when the data volume grows much larger.

The data quality characteristic of the data warehouse described above depends on data extraction quality. Before an online analytical processing (OLAP) application is built, the data of each independent system must be extracted, converted and filtered, and stored in one central place, which becomes the data warehouse. This process of extracting, transforming and loading is ETL (Extract, Transform, Load).

Hive is a data warehouse infrastructure built on Hadoop. It provides a set of tools for data extraction, transformation and loading (ETL), and is a mechanism for storing, querying and analyzing large-scale data stored in Hadoop. Hive defines a simple SQL-like query language, called HQL, which allows users familiar with SQL to query the data. The language also allows developers familiar with MapReduce to write custom mappers and reducers for complex analysis that the built-in mappers and reducers cannot handle.

Testers need to test data extraction quality. The traditional method requires manually sampling tables and comparing, table by table, whether the structure was extracted successfully and whether the types were converted correctly, so testing efficiency is very low.

Therefore, the present invention proposes a technical solution for automatically testing data extraction quality, which greatly improves testing efficiency.

Summary of the invention

In view of this, the purpose of the present invention is to provide a method and system for testing data extraction quality that greatly improve testing efficiency.

According to the above purpose, the present invention provides a method for testing data extraction quality, the method comprising:

S1. Obtaining the number of fields of a first table in a MySQL database and the number of fields of a corresponding second table in a Hive database, and judging whether the number of fields of the first table is consistent with that of the second table;

S2. If the number of fields of the first table is greater than the number of fields of the second table, recording the corresponding count difference information;

S3. If the number of fields of the first table is consistent with that of the second table, judging whether the data type of each field in the first table is consistent with the data type of the corresponding field in the second table and, if not, recording the corresponding data type difference information.
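Steps S1 to S3 can be illustrated with a minimal Python sketch. It assumes the field metadata of both tables has already been loaded into plain dictionaries of field name to data type, and that fields are matched by name; the function name `compare_structure` and the text format of the difference records are hypothetical and not taken from the patent.

```python
from typing import Dict, List


def compare_structure(mysql_cols: Dict[str, str],
                      hive_cols: Dict[str, str]) -> List[str]:
    """Return difference records for one MySQL/Hive table pair.

    Both arguments map field name -> data type (assumed representation).
    """
    diffs: List[str] = []

    # S1: compare the number of fields of the two tables.
    if len(mysql_cols) != len(hive_cols):
        # S2: record a count difference only when the MySQL (first)
        # table has more fields than the Hive (second) table.
        if len(mysql_cols) > len(hive_cols):
            diffs.append(
                f"field count: mysql={len(mysql_cols)} hive={len(hive_cols)}"
            )
        return diffs

    # S3: counts are consistent, so compare data types field by field.
    # Matching fields by name is an assumption of this sketch.
    for name, mysql_type in mysql_cols.items():
        hive_type = hive_cols.get(name)
        if hive_type != mysql_type:
            diffs.append(f"type: {name} mysql={mysql_type} hive={hive_type}")
    return diffs
```

Calling `compare_structure({"id": "int", "amt": "decimal"}, {"id": "int", "amt": "double"})` would yield a single data type difference record for `amt`.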

Preferably, before step S1 the method includes: establishing a mapping between the database names and table names in the MySQL database and the database names and table names in the Hive database.

Preferably, step S1 further includes:

Providing the IP address, port number, database name and table name of the MySQL database in which the first table resides, and obtaining the first table according to that IP address, port number, database name and table name;

Obtaining the corresponding second table of the Hive database according to the mapping between the MySQL database and the Hive database.
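The mapping and table lookup described here might be represented as follows. This is only a sketch: the class `MySQLTableRef`, the dictionary `TABLE_MAPPING`, its sample entry and the function `resolve_hive_table` are all hypothetical names, since the patent does not say how the mapping is stored.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class MySQLTableRef:
    """The four inputs the user supplies to locate the first table."""
    host: str       # IP address of the MySQL instance
    port: int       # port number of the MySQL instance
    database: str   # database name
    table: str      # table name


# Hypothetical mapping: (MySQL database, table) -> (Hive database, table).
TABLE_MAPPING: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("orders_db", "orders"): ("ods", "orders_db_orders"),
}


def resolve_hive_table(
    ref: MySQLTableRef,
    mapping: Dict[Tuple[str, str], Tuple[str, str]] = TABLE_MAPPING,
) -> Tuple[str, str]:
    """Look up the Hive (database, table) that corresponds to a MySQL table."""
    key = (ref.database, ref.table)
    if key not in mapping:
        raise KeyError(f"no Hive mapping for {ref.database}.{ref.table}")
    return mapping[key]
```

Keeping the mapping separate from the connection details means the user only has to supply the MySQL side, which matches the workflow the text describes.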

Preferably, step S2 further includes:

If the number of fields of the first table is less than the number of fields of the second table, the count difference information is not recorded.

Preferably, step S3 further includes:

If the data type of a field in the first table is inconsistent with the data type of the corresponding field in the second table, judging whether the inconsistent data type of the first table is in a preset data type rule table;

If it is not, recording the corresponding data type difference information.

Preferably, step S3 further includes:

If the inconsistent data type of the first table is in the preset data type rule table, the corresponding data type difference information is not recorded.
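The rule table check can be sketched as a simple membership test. The contents of `TYPE_RULES` below are invented for illustration (the patent does not list which types the preset rule table contains), and `filter_type_diffs` is a hypothetical helper name.

```python
from typing import List, Set, Tuple

# Hypothetical preset rule table: MySQL data types for which a different
# Hive data type is treated as an expected conversion.
TYPE_RULES: Set[str] = {"datetime", "timestamp", "varchar", "text"}

# One mismatch is (field name, MySQL type, Hive type).
Mismatch = Tuple[str, str, str]


def filter_type_diffs(mismatches: List[Mismatch],
                      rules: Set[str] = TYPE_RULES) -> List[Mismatch]:
    """Keep only mismatches whose MySQL type is NOT in the rule table.

    Mismatches whose MySQL type appears in the rule table are dropped,
    i.e. no data type difference information is recorded for them.
    """
    return [m for m in mismatches if m[1].lower() not in rules]
```

With these sample rules, a `datetime` field that arrives in Hive as `string` is not reported, while a `decimal` field that arrives as `double` is.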

Preferably, the method further includes:

Creating a difference table, and recording the count difference information of step S2 and the data type difference information of step S3 in the difference table.
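As an illustration of the difference table, the sketch below stores both kinds of difference in one table. The patent does not say where the difference table lives or what columns it has; an in-memory SQLite database from the Python standard library is used here purely as a stand-in, and the table name `structure_diff` and its columns are assumptions.

```python
import sqlite3
from typing import Iterable, Tuple

# One row is (mysql_table, hive_table, diff_kind, detail).
DiffRow = Tuple[str, str, str, str]


def create_diff_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) difference table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS structure_diff ("
        " mysql_table TEXT NOT NULL,"
        " hive_table  TEXT NOT NULL,"
        " diff_kind   TEXT NOT NULL,"  # 'field_count' (S2) or 'data_type' (S3)
        " detail      TEXT NOT NULL)"
    )


def record_diffs(conn: sqlite3.Connection, rows: Iterable[DiffRow]) -> None:
    """Insert difference records produced by steps S2 and S3."""
    conn.executemany("INSERT INTO structure_diff VALUES (?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_diff_table(conn)
    record_diffs(conn, [
        ("orders_db.orders", "ods.orders", "field_count", "mysql=5 hive=4"),
    ])
    print(conn.execute("SELECT * FROM structure_diff").fetchall())
```

A single table with a `diff_kind` column keeps count and type differences queryable together; separate tables would work equally well.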

According to the above purpose, the present invention provides a system for testing data extraction quality, the system comprising:

An acquisition module, which obtains the number of fields of a first table in a MySQL database and the number of fields of a corresponding second table in a Hive database;

A first judgment module, which judges whether the number of fields of the first table is consistent with that of the second table and, if the number of fields of the first table is greater than that of the second table, records the corresponding count difference information;

A second judgment module, which, if the number of fields of the first table is consistent with that of the second table, judges whether the data type of each field in the first table is consistent with the data type of the corresponding field in the second table and, if not, records the corresponding data type difference information.

Preferably, the system further includes a mapping module, which establishes a mapping between the database names and table names in the MySQL database and the database names and table names in the Hive database.

Preferably, the acquisition module includes a first table acquiring unit and a second table acquiring unit;

The first table acquiring unit provides the IP address, port number, database name and table name of the MySQL database in which the first table resides, and obtains the first table according to that IP address, port number, database name and table name;

The second table acquiring unit obtains the corresponding second table of the Hive database according to the mapping between the MySQL database and the Hive database established by the mapping module.

Compared with the prior art, the method and system for testing data extraction quality provided by the invention have the following beneficial effects: when testing data extraction quality, the structural differences between the source data tables of the MySQL database and the Hive database are determined. The user only needs to input the information of the corresponding MySQL database and execute the technical solution to obtain the difference information between the two databases, including added or deleted column names and differences in data type for columns of the same name. The test is automated end to end, which greatly improves testing efficiency, saves labor cost and is convenient for the user.

Brief description of the drawings

Preferred embodiments are described below in a clear and understandable manner with reference to the drawings, to further explain the above characteristics, technical features, advantages and implementation of the method and system for testing data extraction quality.

Fig. 1 is a flow chart of the method for testing data extraction quality of the present invention;

Fig. 2 is a schematic structural diagram of the system for testing data extraction quality of the present invention.

Detailed description

To explain the embodiments of the invention or the technical solutions of the prior art more clearly, specific embodiments of the invention are described below with reference to the drawings. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings and other embodiments from them without creative effort.

For simplicity, each figure only schematically shows the parts relevant to the invention; the figures do not represent the actual structure of a product. In addition, to keep the figures easy to understand, where several components in a figure have the same structure or function, only one of them is drawn or labeled. Herein, "one" means not only "exactly one" but can also mean "more than one".

As shown in Fig. 1, in one embodiment of the present invention, a method for testing data extraction quality includes:

Specifically, before step S1 the method includes: establishing a mapping between the database names and table names in the MySQL database and the database names and table names in the Hive database. From the database name and table name in the MySQL database, the fields of that table and their data types are obtained. Through the mapping between the MySQL database and the Hive database, the corresponding database name and table name in the Hive database are obtained, and from them the fields of that Hive table and their data types.

There are typically multiple MySQL database instances, each with its own IP address and port number. Each MySQL instance hosts multiple databases, each with its own database name. Each database contains multiple tables, each with its own table name. The IP address, port number, database name and table name of the MySQL database in which the first table resides are provided, and the first table is obtained accordingly. The corresponding second table of the Hive database is obtained according to the mapping between the MySQL database and the Hive database. When a user needs to test data extraction quality, the data table structures of the MySQL database and the Hive database must be compared; the user only needs to input the IP address, port number, database name and table name of the MySQL database, and the technical solution of the invention yields the difference information between the table structures of the two databases.
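The patent does not state how the fields and data types are read from each database. One plausible approach, sketched here under that assumption, is to query MySQL's `information_schema.COLUMNS` view and to run Hive's `DESCRIBE` statement, then normalize both results into the same dictionary shape. The names `MYSQL_COLUMNS_SQL`, `hive_describe_sql` and `columns_from_rows` are hypothetical, and no database driver is invoked; the `%s` placeholders follow the parameter style of common Python MySQL drivers.

```python
import re
from typing import Dict, Iterable, Sequence

# Parameterized query against MySQL's standard metadata view; the two
# parameters are the database name and the table name.
MYSQL_COLUMNS_SQL = (
    "SELECT COLUMN_NAME, DATA_TYPE "
    "FROM information_schema.COLUMNS "
    "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s "
    "ORDER BY ORDINAL_POSITION"
)


def hive_describe_sql(database: str, table: str) -> str:
    """Build a Hive DESCRIBE statement for the mapped second table."""
    for name in (database, table):
        # Only plain identifiers are accepted, since the names are
        # interpolated into the statement text.
        if not re.fullmatch(r"\w+", name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return f"DESCRIBE {database}.{table}"


def columns_from_rows(rows: Iterable[Sequence]) -> Dict[str, str]:
    """Turn (name, type, ...) result rows into {field name: data type}.

    Hive's DESCRIBE output for a partitioned table appends a blank row
    and '# ...' header rows before repeating the partition columns, so
    reading stops at the first such row.
    """
    cols: Dict[str, str] = {}
    for row in rows:
        name = (row[0] or "").strip()
        if not name or name.startswith("#"):
            break
        cols[name.lower()] = str(row[1]).strip().lower()
    return cols
```

The number of fields of each table is then simply `len()` of the returned dictionary, and the dictionaries themselves carry the data types needed for the later comparison.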

When the table name of the first table in the MySQL database is provided, the number of fields of the first table is obtained. Correspondingly, the table name of the second table in the Hive database is obtained, along with the number of fields of the second table, and it is judged whether the two numbers are consistent. If the number of fields of the first table is greater than that of the second table, the corresponding count difference information is recorded. If the number of fields of the first table is less than that of the second table, the count difference information is not recorded.

If the number of fields of the first table is consistent with that of the second table, the data type of each field in the first table and the data type of each field in the second table are obtained, and it is judged whether the data type of each field in the first table is consistent with the data type of the corresponding field in the second table. If not, the corresponding data type difference information is recorded.

In a specific embodiment of the invention, if the data type of a field in the first table is inconsistent with the data type of the corresponding field in the second table, it is judged whether the inconsistent data type of the first table is in a preset data type rule table. If it is not, the corresponding data type difference information is recorded. If it is, the data type difference information is not recorded.

In a specific embodiment of the invention, a difference table is created, and the count difference information of step S2 and the data type difference information of step S3 are recorded in the difference table.

According to this technical solution, when testing data extraction quality, the structural differences between the source data tables of the MySQL database and the Hive database are determined. The user only needs to input the information of the corresponding MySQL database and execute the technical solution to obtain the difference information between the two databases. The test greatly improves the efficiency of testing data extraction quality, saves labor cost and is convenient for the user.

As shown in Fig. 2, in one embodiment of the invention, a system for testing data extraction quality includes:

An acquisition module 20, which obtains the number of fields of a first table in a MySQL database and the number of fields of a corresponding second table in a Hive database;

A first judgment module 21, which judges whether the number of fields of the first table is consistent with that of the second table and, if the number of fields of the first table is greater than that of the second table, records the corresponding count difference information;

A second judgment module 22, which, if the number of fields of the first table is consistent with that of the second table, judges whether the data type of each field in the first table is consistent with the data type of the corresponding field in the second table and, if not, records the corresponding data type difference information.

In a specific embodiment of the invention, the system further includes a mapping module, which establishes a mapping between the database names and table names in the MySQL database and the database names and table names in the Hive database.

The acquisition module includes a first table acquiring unit and a second table acquiring unit. In the first table acquiring unit, the IP address, port number, database name and table name of the MySQL database in which the first table resides are provided, and the first table is obtained accordingly. When the table name of the first table in the MySQL database is provided, the number of fields of the first table is obtained. The second table acquiring unit obtains the corresponding second table of the Hive database according to the mapping between the MySQL database and the Hive database established by the mapping module. Correspondingly, the table name of the second table in the Hive database is obtained, along with the number of fields of the second table. When a user needs to test data extraction quality, the data table structures of the MySQL database and the Hive database must be compared, and the user only needs to input the IP address, port number, database name and table name of the MySQL database.

The first judgment module judges whether the number of fields of the first table is consistent with that of the second table. If the number of fields of the first table is greater than that of the second table, the corresponding count difference information is recorded. If the number of fields of the first table is less than that of the second table, the count difference information is not recorded.

In the second judgment module, if the number of fields of the first table is consistent with that of the second table, the data type of each field in the first table and the data type of each field in the second table are obtained, and it is judged whether the data type of each field in the first table is consistent with the data type of the corresponding field in the second table. If not, the corresponding data type difference information is recorded. In a specific embodiment of the invention, if the data types are inconsistent, it is judged whether the inconsistent data type of the first table is in a preset data type rule table. If it is not, the corresponding data type difference information is recorded. If it is, the data type difference information is not recorded.

In conclusion, the technical solution of the invention greatly improves testing efficiency and gives the user a better experience.

It should be noted that the above embodiments can be freely combined as needed. The above are only preferred embodiments of the invention. For a person of ordinary skill in the art, several improvements and modifications can be made without departing from the principle of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the invention.

Claims

1. A method for testing data extraction quality, characterized in that the method comprises:

2. The method for testing data extraction quality according to claim 1, characterized in that, before step S1, the method includes: establishing a mapping between the database names and table names in the MySQL database and the database names and table names in the Hive database.

3. The method for testing data extraction quality according to claim 2, characterized in that step S1 further includes:

4. The method for testing data extraction quality according to claim 1, characterized in that step S2 further includes:

5. The method for testing data extraction quality according to claim 1, characterized in that step S3 further includes:

If it is not, the corresponding data type difference information is recorded.

6. The method for testing data extraction quality according to claim 5, characterized in that step S3 further includes:

7. The method for testing data extraction quality according to claim 6, characterized in that the method further includes:

8. A system for testing data extraction quality, characterized in that the system comprises:

9. The system for testing data extraction quality according to claim 8, characterized in that the system further includes a mapping module, which establishes a mapping between the database names and table names in the MySQL database and the database names and table names in the Hive database.

10. The system for testing data extraction quality according to claim 9, characterized in that the acquisition module includes a first table acquiring unit and a second table acquiring unit;