CN113094343A - Data processing method, device and system

Info

Publication number: CN113094343A
Application number: CN202110370682.5A
Authority: CN
Inventors: 邓华丰; 廖宸; 阮文浩; 郭润文
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2021-04-07
Filing date: 2021-04-07
Publication date: 2021-07-09
Anticipated expiration: 2041-04-07
Also published as: CN113094343B

Abstract

The invention discloses a data processing method, a device and a system, which relate to the technical field of big data, wherein the method comprises the following steps: in response to the update of the table structure of the source data, acquiring the source data stored in the database according to the field information of the preset table structure; generating at least one binary file from the acquired source data according to the size of a preset single file; generating a spliced file according to the at least one binary file, the file name of the binary file and the table structure field information; and before the data hosting operation is executed, the spliced file is transmitted to a preset position in the data hosting system according to preset data mapping information. The method and the device can reduce the error reporting rate of the managed data and improve the success rate of file transmission.

Description

Data processing method, device and system

Technical Field

The invention relates to the technical field of big data, in particular to a data processing method, device and system.

Background

At present, big data hosting requires information such as the table structure to be registered in advance in order to generate a lake-entry table, and the hosting process works by reading the pre-registered data and the file content sent from the peer server. As a result, when the table structure of a hosted table changes, the change cannot be synchronized to the big data warehouse in real time, which causes hosting errors.

Disclosure of Invention

Accordingly, the present invention is directed to a data processing method, apparatus and system for solving at least one of the above problems.

According to a first aspect of the present invention, there is provided a data processing method, the method comprising:

in response to the update of the table structure of the source data, acquiring the source data stored in the database according to the field information of the preset table structure;

generating at least one binary file from the acquired source data according to the size of a preset single file;

generating a spliced file according to the at least one binary file, the file name of the binary file and the table structure field information;

and before the data hosting operation is executed, the spliced file is transmitted to a preset position in the data hosting system according to preset data mapping information.

According to a second aspect of the present invention, there is provided a data processing apparatus, the apparatus comprising:

a source data acquisition unit for acquiring source data stored in the database according to predetermined table structure field information in response to a source data table structure update;

a binary file generating unit for generating at least one binary file from the acquired source data according to a predetermined single file size;

a spliced file generating unit for generating a spliced file according to the at least one binary file, the file name of the binary file and the table structure field information;

and a spliced file transmission unit for transmitting the spliced file to a preset position in the data hosting system according to preset data mapping information before the data hosting operation is executed.

According to a third aspect of the present invention, there is provided a data processing system, the system comprising: the above data processing apparatus, a database and a data hosting system.

According to a fourth aspect of the present invention, there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the program.

According to a fifth aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.

According to the above technical solution, when the source data table structure is updated, the source data is acquired according to the predetermined table structure field information, at least one binary file is generated from it according to the predetermined single file size, and a spliced file is generated according to the binary file, the file name of the binary file and the table structure field information. The spliced file is then transmitted to a preset position in the data hosting system before the data hosting operation is executed. This overcomes the prior-art problem that a change in the source table structure cannot be synchronized to the data hosting system in real time and therefore causes hosting errors, and generating a plurality of binary files overcomes the problem that a file cannot be transmitted because it is too large. The technical solution can thus reduce the error rate of hosted data and improve the success rate of file transmission.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flow diagram of a data processing method according to an embodiment of the invention;

FIG. 2 is a block diagram of a data processing system according to an embodiment of the present invention;

FIG. 3 is a block diagram of a data processing apparatus according to an embodiment of the present invention;

FIG. 4 is a block diagram of an exemplary architecture of a data processing system according to an embodiment of the present invention;

FIG. 5 is a functional schematic diagram of an exemplary system for data processing according to an embodiment of the present invention;

FIG. 6 is a schematic block diagram of a system configuration of an electronic apparatus 600 according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the current data hosting scheme, a change in the source table structure cannot be synchronized to the data hosting system in real time, which causes hosting errors; in addition, the scheme cannot cope with files that cannot be transmitted because the size of a transmitted file is limited. In view of this, embodiments of the present invention provide a data processing scheme that addresses these defects, reducing the error rate of hosted data and the incidence of files that cannot be transmitted because they are too large. Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention, as shown in fig. 1, the flowchart including:

Step 101, in response to an update of the source data table structure, acquiring the source data stored in the database according to predetermined table structure field information.

The table structure field information here may be: table names, file names, etc.

Step 102, generating at least one binary file (BIN file) from the acquired source data according to a predetermined single file size.

Specifically, a plurality of binary files may be generated from the acquired source data in sequence-number order, each limited to the predetermined single file size.

In one embodiment, a corresponding check file may be generated according to the file size of the at least one binary file, and used to check whether the binary file is complete.
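As a minimal sketch of Step 102, the following Python splits a byte string into sequence-numbered chunks no larger than a preset size. The function name `split_into_bin_chunks`, the `.bin` suffix and the four-digit sequence format are illustrative assumptions and are not taken from the patent text.

```python
def split_into_bin_chunks(data: bytes, table_name: str, max_bytes: int) -> list[tuple[str, bytes]]:
    """Return (file name, payload) pairs, each payload at most max_bytes long."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    # Emit at least one (possibly empty) file, matching "at least one binary file".
    starts = range(0, len(data), max_bytes) if data else [0]
    return [
        # Hypothetical naming: <table>_<4-digit sequence>.bin
        (f"{table_name}_{seq:04d}.bin", data[start:start + max_bytes])
        for seq, start in enumerate(starts, start=1)
    ]
```

Returning the chunks instead of writing them keeps the sketch independent of any particular file server layout; a caller would write each payload under its name.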

Step 103, generating a spliced file according to the at least one binary file, the file name of the binary file (for example, the file name includes a source table name, a file serial number, and the like), and the table structure field information.

Step 104, before the data hosting operation is executed, transmitting the spliced file to a preset position in the data hosting system according to preset data mapping information.

The mapping information here may include the associations among table names, file names, and the file names in the hosting system.

When no data corresponding to the table name and file name of the spliced file exists in the data hosting system, the spliced file is being stored in the data hosting system for the first time and is stored directly. When historical data corresponding to the table name and file name of the spliced file exists in the data hosting system, the spliced file is update data: it is stored at the position corresponding to the historical data, and an update operation is performed on the historical data, including but not limited to deleting, replacing and adding.
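The first-time versus update decision described above can be sketched as follows. This is an illustration only: the in-memory dictionary stands in for the data hosting system, the name `store_spliced_file` is hypothetical, and "replace" is chosen as the update operation purely for simplicity.

```python
def store_spliced_file(store: dict, table_name: str, file_name: str, payload: bytes) -> str:
    """Store a spliced file keyed by (table name, file name); report what happened."""
    key = (table_name, file_name)
    if key not in store:
        # No historical data: first-time storage, store directly.
        store[key] = payload
        return "created"
    # Historical data exists: treat the file as update data (here: replace).
    store[key] = payload
    return "updated"
```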

The data mapping information is then updated according to the spliced file.

In this embodiment, when the source data table structure is updated, the source data is acquired according to the predetermined table structure field information, at least one binary file is generated from it according to the predetermined single file size, and a spliced file is generated according to the binary file, the file name of the binary file and the table structure field information. The spliced file is then transmitted to a preset position in the data hosting system before the data hosting operation is executed. This overcomes the prior-art problem that a change in the source table structure cannot be synchronized to the data hosting system in real time and therefore causes hosting errors, and generating a plurality of binary files overcomes the problem that a file cannot be transmitted because it is too large.

In one embodiment, after the at least one binary file is generated in step 102, the at least one binary file may also be directly transmitted to the data hosting system via the File Transfer Protocol (FTP).

In the FTP transfer, the transferred file is preferably backed up regardless of success or failure of the file transfer.

In practical operation, data index information may also be generated in advance, and the data index information includes: table structure field information; and then, when the query operation is required, the query operation can be executed on the database according to the data index information.

Based on similar inventive concepts, the embodiment of the present invention further provides a data processing system, as shown in fig. 2, the system includes: a data processing device 1, a database 2 and a data hosting system 3, wherein the data processing device 1 is preferably operable to implement the procedures in the above-described method embodiments.

Fig. 3 is a block diagram showing the configuration of the data processing apparatus 1, and as shown in fig. 3, the data processing apparatus 1 includes: a source data obtaining unit 11, a binary file generating unit 12, a spliced file generating unit 13, and a spliced file transmitting unit 14, wherein:

a source data obtaining unit 11, configured to, in response to the source data table structure update, obtain the source data stored in the database according to predetermined table structure field information.

A binary file generating unit 12, configured to generate at least one binary file from the acquired source data according to a predetermined single file size.

Specifically, the binary file generating unit sequentially generates a plurality of binary files from the acquired source data according to a predetermined sequence number, based on a predetermined single file size.

A spliced file generating unit 13, configured to generate a spliced file according to the at least one binary file, the file name of the binary file, and the table structure field information.

A spliced file transmission unit 14, configured to transmit the spliced file to a preset position in the data hosting system according to preset data mapping information before the data hosting operation is executed.

When the source data table structure is updated, the source data obtaining unit 11 acquires the source data according to the predetermined table structure field information, the binary file generating unit 12 generates at least one binary file from the acquired source data according to the predetermined single file size, and the spliced file generating unit 13 generates a spliced file according to the binary file, the file name of the binary file and the table structure field information. Before the data hosting operation is executed, the spliced file transmission unit 14 transmits the spliced file to a preset position in the data hosting system. This overcomes the prior-art problem that a change in the source table structure cannot be synchronized to the data hosting system in real time and therefore causes hosting errors, and generating a plurality of binary files overcomes the problem that a file cannot be transmitted because it is too large. The embodiment of the present invention can thus reduce the error rate of hosted data and improve the success rate of file transmission.

In one embodiment, the data processing apparatus 1 further includes a binary file transmission unit for transmitting the at least one binary file to the data hosting system through a file transfer protocol.

In a specific implementation process, the data processing apparatus 1 further includes: index information generation unit and inquiry unit, wherein: an index information generation unit configured to generate data index information in advance, the data index information including: table structure field information; and the query unit is used for executing query operation on the database according to the data index information.

Preferably, the data processing apparatus 1 further includes a check file generating unit for generating a check file of the at least one binary file according to the file size of the at least one binary file.

Further, the data processing apparatus 1 further includes a mapping information updating unit for updating the data mapping information according to the spliced file.

For specific execution processes of the units and the modules, reference may be made to the description in the foregoing method embodiments, and details are not described here again.

In practical operation, the units and the modules may be combined or may be singly arranged, and the present invention is not limited thereto.

For a better understanding of the present invention, a data processing exemplary system is described in detail below in conjunction with FIG. 4.

As shown in fig. 4, the example system includes: a data acquisition system, a data transmission system and a data hosting system, wherein:

The data acquisition system is used for acquiring the table structure of a source database and generating a file whose content is text concatenated from the table structure fields, each padded with leading spaces up to the field's character count. It supports conditional queries and multiple data sources.
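One way to read the leading-space padding described above is as a fixed-width text record. The sketch below assumes that each value is right-aligned within its field length and that over-long values are rejected; the function name `pad_fields` is hypothetical.

```python
def pad_fields(values: list[str], widths: list[int]) -> str:
    """Concatenate values, left-padding each with spaces to its field width."""
    if len(values) != len(widths):
        raise ValueError("values and widths must have the same length")
    parts = []
    for value, width in zip(values, widths):
        if len(value) > width:
            raise ValueError(f"value {value!r} exceeds field width {width}")
        parts.append(value.rjust(width))  # spaces are added in front of the value
    return "".join(parts)
```

With fixed widths, a reader can recover each field by slicing at known offsets, so no delimiter is needed inside the record.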

The data transmission system is used for data interaction between the data acquisition system and the data hosting system. It may be a KAFKA system (a high-throughput distributed publish-subscribe messaging system); it transmits the information and files acquired by the data acquisition system to the data hosting system, and outputs alarms and logs on timeout.

The data hosting system uploads the acquired data files to the Hadoop (a distributed system infrastructure) distributed file system according to preset logic.

The working principle of the exemplary system is described in detail below in connection with fig. 5.

As shown in fig. 5, the working principle of the exemplary system includes:

1. Preprocessing

Step 1-1, lake-entry table data is registered in advance and an association between the source table and the Hadoop table structure is generated. The association may be a mapping relation table sql in table form, which facilitates subsequent operations such as querying and obtaining data. A lake-entry information table is also generated, comprising the Hadoop library name, the lake-entry date (the first lake-entry date), the lake-entry state, and a data state of 0 (0: not yet in the lake, 1: entering the lake, 2: entered the lake).

Step 1-2, the firewalls between the file servers of the data acquisition system and the data hosting system are opened so that files can be sent by FTP.

2. Working principle of data acquisition system

Step 2-1, for multiple data sources, the table structure and indexes of each database may be queried by using information_schema.

show keys from '${tableName}'

select column_name, column_type, column_comment, is_nullable from information_schema.COLUMNS where table_schema = DATABASE() and table_name = '${tableName}'

Step 2-2, the field information and indexes of the same table are compared across the plurality of data sources. If they are inconsistent, the table differs between data sources and an alarm is sent; if they are consistent, step 2-3 is executed.
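The consistency check in step 2-2 can be sketched as a comparison of each data source's structure against the first one. The representation of a structure as a `(columns, indexes)` tuple and the name `mismatched_sources` are assumptions made for this example.

```python
def mismatched_sources(structures: dict[str, tuple]) -> list[str]:
    """Return the data sources whose (columns, indexes) differ from the first source."""
    names = list(structures)
    if not names:
        return []
    reference = structures[names[0]]
    # Any source that differs from the reference would trigger the alarm.
    return [name for name in names[1:] if structures[name] != reference]
```

An empty result means all data sources agree and processing can continue to step 2-3.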

Step 2-3, the conditions for querying the database data are configured. The table structure field information and indexes are obtained by using the mapping relation table sql from preprocessing, placed in xml (extensible markup language) as the selected query conditions and concatenated, and a file name of the form table name_file serial number_lake-entry date is generated.

An order by clause is added to sort by the queried table structure fields, and the concatenated table structure fields are used as the columns for querying the source database, so as to generate a BIN file (namely, the binary file). Example query sql is as follows:

select ${columnName} from ${tableName} where PARTID BETWEEN ${minPartid} AND ${maxPartid} order by ${primaryKey} LIMIT ${beginRow},${line}
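To illustrate how the placeholders in the query above might be filled page by page, the following sketch uses Python's `string.Template`, whose `${name}` syntax matches the template. The helper names `page_queries` and `bin_file_name`, and the sample table and column names, are hypothetical. Plain text substitution is shown only for illustration and assumes the identifiers come from trusted configuration.

```python
from string import Template

QUERY = Template(
    "select ${columnName} from ${tableName} "
    "where PARTID BETWEEN ${minPartid} AND ${maxPartid} "
    "order by ${primaryKey} LIMIT ${beginRow},${line}"
)

def page_queries(columns, table, min_partid, max_partid, primary_key, page_size, pages):
    """Build one query per page; beginRow advances by page_size each time."""
    return [
        QUERY.substitute(
            columnName=",".join(columns), tableName=table,
            minPartid=min_partid, maxPartid=max_partid,
            primaryKey=primary_key, beginRow=page * page_size, line=page_size,
        )
        for page in range(pages)
    ]

def bin_file_name(table: str, serial: int, lake_date: str) -> str:
    """File name of the form table name_file serial number_lake-entry date."""
    return f"{table}_{serial}_{lake_date}"
```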

Step 2-4, in actual operation, the maximum capacity of a single file is set; the file size is customizable and is configured in the lake-entry table name xml so that the program can look it up. Specifically, the field lengths may be used to calculate in advance the number of rows of data a single file is allowed to hold (file size / number of rows queried per write); if this limit is exceeded, the next file is generated according to the sequence number.
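The text leaves the exact formula open to interpretation. The sketch below takes one plausible reading: a row occupies the sum of its field lengths, the row limit of a file is the configured file size divided by that row size, and the file sequence number advances whenever the limit is reached. Both function names are hypothetical.

```python
def rows_per_file(field_lengths: list[int], max_file_bytes: int) -> int:
    """Number of fixed-width rows that fit in one file (assumes no separators)."""
    row_bytes = sum(field_lengths)
    if row_bytes <= 0:
        raise ValueError("row size must be positive")
    limit = max_file_bytes // row_bytes
    if limit == 0:
        raise ValueError("a single row does not fit in one file")
    return limit

def file_serial_for_row(row_index: int, limit: int) -> int:
    """1-based file sequence number for a 0-based row index."""
    return row_index // limit + 1
```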

Step 2-5, a CHK check file is generated for each BIN file and stored in a fixed directory of the local server. The CHK file has the same name as the BIN file, differing only in the file name suffix. The CHK check file may be generated according to the file size of the BIN file and is used for subsequently verifying the integrity of the BIN file.
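A size-based check file can be sketched as follows. The patent states only that the CHK file is derived from the BIN file size and shares its name apart from the suffix; storing the size as decimal text and the `.chk` suffix spelling are assumptions of this example, as are the function names.

```python
from pathlib import PurePosixPath

def chk_name(bin_name: str) -> str:
    """Same name as the BIN file, with only the suffix changed."""
    return str(PurePosixPath(bin_name).with_suffix(".chk"))

def chk_content(bin_payload: bytes) -> str:
    """Check content derived from the BIN file size (assumed: decimal byte count)."""
    return str(len(bin_payload))

def is_complete(received_payload: bytes, chk_text: str) -> bool:
    """A received BIN file is considered complete if its size matches the CHK file."""
    return len(received_payload) == int(chk_text.strip())
```

A size check detects truncated transfers but not corrupted bytes of the same length; a content hash would be needed for that.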

Step 2-6, the BIN file is transmitted to the data hosting system by FTP file transfer in the data transmission system.

Step 2-7, the first lake-entry date, table name, index, field names and field lengths of the corresponding fields from preprocessing, and the file name of the source table are concatenated using a splicing character.
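The splicing in step 2-7, and the reverse operation needed by the consumer, can be sketched as below. The splicing character `|`, the comma used inside list-valued parts, and the `LakeMessage` type are all assumptions; the patent does not name the actual delimiter.

```python
from dataclasses import dataclass

SEP = "|"       # hypothetical splicing character
LIST_SEP = ","  # hypothetical separator inside list-valued parts

@dataclass(frozen=True)
class LakeMessage:
    first_lake_date: str
    table_name: str
    index: str
    field_names: tuple[str, ...]
    field_lengths: tuple[int, ...]
    file_name: str

def splice(msg: LakeMessage) -> str:
    """Join the parts with the splicing character."""
    parts = [
        msg.first_lake_date, msg.table_name, msg.index,
        LIST_SEP.join(msg.field_names),
        LIST_SEP.join(str(n) for n in msg.field_lengths),
        msg.file_name,
    ]
    if any(SEP in part for part in parts):
        raise ValueError("a part contains the splicing character")
    return SEP.join(parts)

def unsplice(text: str) -> LakeMessage:
    """Split spliced text back into its parts (assumes non-empty field lists)."""
    date, table, index, names, lengths, file_name = text.split(SEP)
    return LakeMessage(
        date, table, index,
        tuple(names.split(LIST_SEP)),
        tuple(int(n) for n in lengths.split(LIST_SEP)),
        file_name,
    )
```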

3. Operating principle of data transmission system

Step 3-1, for the KAFKA system, the spliced text from step 2-7 of the data acquisition system is transmitted to the data hosting system. Messages are pushed asynchronously; if a message is not consumed within 3 days, an alarm is raised and the unconsumed information is written into a log table.
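The three-day rule can be expressed independently of any messaging client. The sketch below works on plain records with a production timestamp and a consumed flag; the record layout and the name `unconsumed_too_long` are assumptions, and how such records would be obtained from KAFKA is outside its scope.

```python
from datetime import datetime, timedelta

def unconsumed_too_long(messages, now, max_age=timedelta(days=3)):
    """Return messages that are still unconsumed after max_age (alarm candidates)."""
    return [
        m for m in messages
        if not m["consumed"] and now - m["produced_at"] > max_age
    ]
```

The returned records are the ones that would be alarmed on and written to the log table.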

Step 3-2, for FTP file transfer, an automatic trigger mode may be used: the sending directory of the data acquisition system is scanned periodically, and once the files have been completely generated in the folder of the local file server, they are automatically sent to the file server of the data hosting system. If sending succeeds, the files are moved to a success directory (sending directory plus .succ); if sending fails, they are moved to a failure directory (sending directory plus .fail) for backup.
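The backup behaviour after each send attempt can be sketched as follows. The transfer itself is passed in as a callable so that the example does not depend on a particular FTP client; `dispatch_file` is a hypothetical name, and the file is assumed to live in a named sending directory.

```python
import shutil
from pathlib import Path
from typing import Callable

def dispatch_file(path: Path, send: Callable[[Path], None]) -> Path:
    """Send a file, then move it to '<send dir>.succ' or '<send dir>.fail'."""
    try:
        send(path)          # any exception counts as a failed transfer
        suffix = ".succ"
    except Exception:
        suffix = ".fail"
    send_dir = path.parent
    target_dir = send_dir.with_name(send_dir.name + suffix)
    target_dir.mkdir(parents=True, exist_ok=True)
    # The file is kept as a backup regardless of the transfer result.
    return Path(shutil.move(str(path), str(target_dir / path.name)))
```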

4. Working principle of data hosting system

Step 4-1, the spliced text obtained in step 2-7 is read from the KAFKA system periodically and split on the splicing character to obtain the data for later use.

Step 4-2, the field information is compared against the database: the field information table is queried according to the obtained data; if the field information of the data differs from that of the database, step 4-3 is executed, otherwise step 4-4 is executed.

Step 4-3, when the field information table is queried, if its state is 1 (namely, the table fields have changed), the old table data is moved to the history table and the corresponding table structure in HIVE is then modified (HIVE is an sql analysis engine based on Hadoop that can convert sql statements into an M-R relation which is then executed in Hadoop). In actual operation, the data files of the HDFS (Hadoop Distributed File System) are not modified; only the table definition in the metadata is modified, i.e., M-R (object-relationship) data mapping is performed on the data file content of the HDFS in sequence, after which the field state is set to 0.

Step 4-4, the field information table is queried according to the obtained data; if the field information is the same, the data is put directly into the preset data table and the table state field is set to 0 (namely, the table fields have not changed). At the same time, the first lake-entry date, table name, index and lake-entry file name are updated in the lake-entry information association mapping table for storage.

Step 4-5, the table file names are read from the lake-entry information association mapping table, and the files in the local server directory are scanned periodically according to those file names.

Step 4-6, for each corresponding file, the local BIN file is streamed to a server of the distributed file system HDFS by using the jar packages provided by Hadoop (hadoop-client.jar, hadoop-common.jar, hadoop-hdfs.jar, hadoop-mapreduce-client-core.jar). The data state is set to 1 (entering the lake) and then 2 (entered the lake) in sequence, and on success the lake-entry date is set to the next day.
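The state bookkeeping around the upload in step 4-6 can be sketched as below. The actual HDFS streaming is represented by an injected callable; the `LakeState` and `LakeRecord` types are hypothetical, and leaving the state at 1 when the upload fails is an assumption, since the text does not describe the failure path.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import IntEnum
from typing import Callable, Optional

class LakeState(IntEnum):
    NOT_ENTERED = 0  # not yet in the lake
    ENTERING = 1     # upload in progress
    ENTERED = 2      # upload finished

@dataclass
class LakeRecord:
    file_name: str
    state: LakeState = LakeState.NOT_ENTERED
    lake_date: Optional[date] = None

def upload_with_state(record: LakeRecord, upload: Callable[[str], None], today: date) -> None:
    """Set state 1, upload, then set state 2 and move the lake-entry date to the next day."""
    record.state = LakeState.ENTERING
    upload(record.file_name)  # raises on failure; the state then stays ENTERING (assumption)
    record.state = LakeState.ENTERED
    record.lake_date = today + timedelta(days=1)
```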

Step 4-7, based on the field information looked up in the HIVE table structure information table, the data file content of the HDFS is mapped into a table (TABLE) in sequence.

The example system provided by this embodiment performs dynamic big data hosting based on KAFKA. On the one hand, when a source table is changed temporarily, because the Hadoop table structure consumes, every day, the field information pushed to KAFKA by the data acquisition system, the HDFS side can be changed after the source table structure is updated and before the data is hosted, which avoids hosting errors. As business volume grows, writing data into new files added dynamically by sequence number solves the problem that a single file is too large to transmit, which reduces maintenance cost. On the other hand, the system makes table names, query conditions, data sources and other settings configurable, which improves its operability.

The present embodiment also provides an electronic device, which may be a desktop computer, a tablet computer, a mobile terminal, and the like, but is not limited thereto. In this embodiment, the electronic device may be implemented with reference to the above method embodiments and the data processing apparatus/system embodiments, and the contents thereof are incorporated herein, and repeated descriptions are omitted.

Fig. 6 is a schematic block diagram of a system configuration of an electronic apparatus 600 according to an embodiment of the present invention. As shown in fig. 6, the electronic device 600 may include a central processor 100 and a memory 140; the memory 140 is coupled to the central processor 100. Notably, this diagram is exemplary; other types of structures may also be used in addition to or in place of the structure to implement telecommunications or other functions.

In one embodiment, the data processing functions may be integrated into the central processor 100. The central processor 100 may be configured to control as follows:

As can be seen from the above description, in the electronic device provided in the embodiments of the present application, when the source data table structure is updated, the source data is acquired according to the predetermined table structure field information, at least one binary file is generated from it according to the predetermined single file size, and a spliced file is generated according to the binary file, the file name of the binary file and the table structure field information. The spliced file is then transmitted to a preset position in the data hosting system before the data hosting operation is executed. This overcomes the prior-art problem that a change in the source table structure cannot be synchronized to the data hosting system in real time and therefore causes hosting errors, and generating a plurality of binary files overcomes the problem that a file cannot be transmitted because it is too large. The embodiment of the present invention can thus reduce the error rate of hosted data and improve the success rate of file transmission.

In another embodiment, the data processing apparatus/system may be configured separately from the central processor 100, for example, the data processing apparatus/system may be configured as a chip connected to the central processor 100, and the data processing function is realized by the control of the central processor.

As shown in fig. 6, the electronic device 600 may further include: communication module 110, input unit 120, audio processing unit 130, display 160, power supply 170. It is noted that the electronic device 600 does not necessarily include all of the components shown in FIG. 6; furthermore, the electronic device 600 may also comprise components not shown in fig. 6, which may be referred to in the prior art.

As shown in fig. 6, the central processor 100, sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, the central processor 100 receiving input and controlling the operation of the various components of the electronic device 600.

The memory 140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device. It may store the information relating to failures as well as a program for processing that information, and the central processor 100 may execute the program stored in the memory 140 to realize information storage or processing, etc.

The input unit 120 provides input to the central processor 100. The input unit 120 is, for example, a key or a touch input device. The power supply 170 is used to provide power to the electronic device 600. The display 160 is used to display an object to be displayed, such as an image or a character. The display may be, for example, an LCD display, but is not limited thereto.

The memory 140 may be a solid state memory such as Read Only Memory (ROM), Random Access Memory (RAM), a SIM card, or the like. There may also be a memory that holds information even when power is off, can be selectively erased, and is provided with more data, an example of which is sometimes called an EPROM or the like. The memory 140 may also be some other type of device. Memory 140 includes buffer memory 141 (sometimes referred to as a buffer). The memory 140 may include an application/function storage section 142, and the application/function storage section 142 is used to store application programs and function programs or a flow for executing the operation of the electronic device 600 by the central processing unit 100.

The memory 140 may also include a data store 143, the data store 143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage portion 144 of the memory 140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging application, address book application, etc.).

The communication module 110 is a transmitter/receiver 110 that transmits and receives signals via an antenna 111. The communication module (transmitter/receiver) 110 is coupled to the central processor 100 to provide an input signal and receive an output signal, which may be the same as in the case of a conventional mobile communication terminal.

Based on different communication technologies, a plurality of communication modules 110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 110 is also coupled to a speaker 131 and a microphone 132 via an audio processor 130 to provide audio output via the speaker 131 and receive audio input from the microphone 132 to implement general telecommunications functions. Audio processor 130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, an audio processor 130 is also coupled to the central processor 100, so that recording on the local can be enabled through a microphone 132, and so that sound stored on the local can be played through a speaker 131.

Embodiments of the present invention further provide a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the data processing method.

In summary, the embodiments of the present invention provide a data processing scheme that mainly addresses the problem that a source table cannot be synchronized to the big data management platform in real time when its structure is temporarily changed. When the data volume grows rapidly, the scheme can also dynamically split the data into sequentially numbered files, which prevents transmission failures caused by an oversized file. Data can thus be ingested into the data lake merely by setting a configuration file.
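The size-based splitting summarized above can be illustrated with a minimal Python sketch. It is not the implementation of the embodiments: the function names `split_into_numbered_files` and `write_check_file`, the `_0001.bin` naming pattern, and the tab-separated check file layout are all assumptions made for illustration. The description states only that binary files are generated in sequence according to a preset single file size and that a check file is generated according to the file sizes.

```python
import os
from typing import List


def split_into_numbered_files(data: bytes, out_dir: str, base_name: str,
                              max_bytes: int) -> List[str]:
    """Write `data` as sequentially numbered binary files.

    Each file holds at most `max_bytes` bytes. The naming pattern
    `<base_name>_<seq>.bin` is a hypothetical choice, not taken from the source.
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    paths = []
    # Emit at least one (possibly empty) file so downstream steps always have input.
    offsets = range(0, max(len(data), 1), max_bytes)
    for seq, start in enumerate(offsets, start=1):
        path = os.path.join(out_dir, f"{base_name}_{seq:04d}.bin")
        with open(path, "wb") as f:
            f.write(data[start:start + max_bytes])
        paths.append(path)
    return paths


def write_check_file(paths: List[str], check_path: str) -> str:
    """Record each file name and its size in bytes, one entry per line.

    The tab-separated layout is an assumption; the source says only that the
    check file is generated according to the file size.
    """
    with open(check_path, "w", encoding="utf-8") as f:
        for p in paths:
            f.write(f"{os.path.basename(p)}\t{os.path.getsize(p)}\n")
    return check_path
```

For example, splitting 2,560 bytes with `max_bytes=1000` under these assumptions produces three files of 1,000, 1,000, and 560 bytes; a receiver could compare the sizes listed in the check file against the files it actually received before starting the hosting operation.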

The preferred embodiments of the present invention have been described above with reference to the accompanying drawings. The many features and advantages of the embodiments are apparent from the detailed specification, and thus, it is intended by the appended claims to cover all such features and advantages of the embodiments which fall within the true spirit and scope thereof. Further, since numerous modifications and changes will readily occur to those skilled in the art, it is not desired to limit the embodiments of the invention to the exact construction and operation illustrated and described, and accordingly, all suitable modifications and equivalents may be resorted to, falling within the scope thereof.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Specific embodiments are used herein to explain the principle and implementation of the invention, and the description of the embodiments is intended only to aid understanding of the method and its core idea. Meanwhile, a person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims

1. A method of data processing, the method comprising:

2. The method of claim 1, wherein after generating at least one binary file from the acquired source data, the method further comprises:

transmitting the at least one binary file to the data hosting system through a file transfer protocol.

3. The method of claim 1, further comprising:

pre-generating data index information, wherein the data index information comprises: table structure field information;

and executing query operation on the database according to the data index information.

4. The method of claim 1, wherein after generating at least one binary file from the acquired source data, the method further comprises:

and generating a check file of the at least one binary file according to the file size of the at least one binary file.

5. The method of claim 1, wherein generating at least one binary file from the acquired source data according to a preset single file size comprises:

sequentially generating, from the acquired source data, a plurality of binary files in order of preset sequence numbers according to the preset single file size.

6. The method of claim 1, wherein after transmitting the spliced file to a preset position in the data hosting system according to preset data mapping information, the method further comprises:

updating the data mapping information according to the spliced file.

7. A data processing apparatus, characterized in that the apparatus comprises:

8. The apparatus of claim 7, further comprising:

a binary file transmission unit configured to transmit the at least one binary file to the data hosting system through a file transfer protocol.

9. The apparatus of claim 7, further comprising:

an index information generation unit configured to generate data index information in advance, the data index information including: table structure field information;

and the query unit is used for executing query operation on the database according to the data index information.

10. The apparatus of claim 7, further comprising:

and the check file generating unit is used for generating the check file of the at least one binary file according to the file size of the at least one binary file.

11. The apparatus according to claim 7, wherein the binary file generating unit is specifically configured to:

12. The apparatus of claim 7, further comprising:

a mapping information updating unit configured to update the data mapping information according to the spliced file.

13. A data processing system, characterized in that the system comprises: the data processing apparatus, database and data hosting system of any of claims 7 to 12.

14. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 6 are implemented when the processor executes the program.

15. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.