CN110019251A - A kind of data processing system, method and apparatus - Google Patents

Info

Publication number
CN110019251A
Authority
CN
China
Prior art keywords
data
node
target data
processing
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910224202.7A
Other languages
Chinese (zh)
Inventor
李跃森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Shenzhen Tencent Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tencent Computer Systems Co Ltd
Priority to CN201910224202.7A
Publication of CN110019251A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing system, method, and device, belonging to the field of database technologies. The data processing system includes a main node and a standby node that are independent of each other; the main node is configured to process OLTP services, and the standby node is configured to process OLAP services. The main node obtains target data from to-be-processed data that is stored in a row storage format and subjected to OLTP processing, and sends the target data to the standby node. The standby node stores the received target data in a column storage format and performs OLAP processing on it. In this way, the two different service types, OLTP and OLAP, are processed concurrently in a single database cluster, which reduces data storage cost and improves service processing efficiency. Because the OLTP and OLAP services are executed by two mutually independent database nodes, resource contention is avoided through resource isolation, which further improves service processing efficiency.

Description

Data processing system, method and equipment
Technical Field
The present invention relates to the field of database technologies, and in particular, to a data processing system, method, and device.
Background
Depending on the storage format of the data records, a relational database system may store data records using row storage or column storage. Row stores generally perform well in Online Transaction Processing (OLTP) transactions, while column stores perform well on Online Analytical Processing (OLAP) queries because only the columns needed to process a query have to be read, which can greatly reduce disk input/output (I/O) operations.
In current database architectures, a single database generally has only OLTP capability or only OLAP capability. When the same data needs to be processed by both OLTP and OLAP, it must go through a series of complex copy, transform, and load steps from an operational database (i.e., a database that performs OLTP processing) to a data warehouse (i.e., a database that performs OLAP processing). This incurs a high data storage cost, and because cross-database operations are required, the service processing efficiency of OLTP and OLAP is low.
Disclosure of Invention
The embodiment of the application provides a data processing system, a data processing method and data processing equipment, which are used for solving the technical problem that the service processing efficiency of the existing OLTP service and OLAP service is low.
In one aspect, there is provided a data processing system, the system comprising a main node and a standby node, wherein:
the main node is used for performing online transaction processing on to-be-processed data stored in a row storage format; obtaining target data according to the to-be-processed data; and sending the target data to the standby node;
the standby node is used for storing the received target data in a column storage format; and performing online analysis processing on the stored target data.
In one aspect, a data processing method is provided, and the method includes:
a main node obtains target data according to to-be-processed data on which online transaction processing is performed, and sends the target data to a standby node, wherein the to-be-processed data is stored in the main node in a row storage format;
and the standby node receives the target data, stores the target data in a column storage format, and performs online analysis processing on the stored target data.
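The two-node method above can be illustrated with a minimal in-memory sketch. The class names `MainNode` and `StandbyNode`, their methods, and the sample rows are hypothetical and are not taken from the application; a real main node and standby node would be database processes in the same cluster, not Python objects.

```python
from collections import defaultdict


class MainNode:
    """Keeps each row together (row storage) and serves OLTP-style writes."""

    def __init__(self):
        self.rows = {}  # primary key -> row dict

    def upsert(self, key, row):
        self.rows[key] = dict(row)

    def build_target_data(self):
        # Assumption: the target data is a full copy of every row.
        return [dict(row, id=key) for key, row in sorted(self.rows.items())]


class StandbyNode:
    """Stores received data column by column and serves OLAP-style reads."""

    def __init__(self):
        self.columns = defaultdict(list)  # column name -> list of values

    def receive(self, target_data):
        self.columns.clear()
        for row in target_data:
            for name, value in row.items():
                self.columns[name].append(value)

    def average(self, column):
        # An analytical query touches only the one column it needs.
        values = self.columns[column]
        return sum(values) / len(values)


main_node = MainNode()
main_node.upsert(1, {"name": "A", "math": 90})
main_node.upsert(2, {"name": "B", "math": 80})

standby_node = StandbyNode()
standby_node.receive(main_node.build_target_data())
print(standby_node.average("math"))  # 85.0
```

The sketch only shows the division of labour: writes go to the row-oriented node, and the aggregate is computed on the column-oriented copy.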
In one possible design, the main node invokes a heterogeneous replication process, extracts the target data from the data to be processed through the heterogeneous replication process, and sends the target data to the standby node.
In one possible design, if the target data is sent to the standby node for the first time, the main node determines the replicated data obtained by full extraction of the data to be processed as the target data.
In one possible design, if the target data is sent to the standby node again after the first time, the main node obtains data that characterizes the updates made to the data to be processed, and determines that data as the target data.
In one possible design, if the target data is sent to the standby node again after the first time, the main node determines the replicated data obtained by incrementally extracting the data to be processed as the target data, or determines the log data corresponding to the data to be processed as the target data.
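The choice between full extraction on the first transfer and update-only data on later transfers can be sketched as follows. This is only an illustration under stated assumptions: the per-row `version` counter and the function name `select_target_data` are hypothetical, and deletions are not handled.

```python
def select_target_data(rows, last_sent_version):
    """Return (target_data, newest_version).

    rows: dict mapping key -> (version, row_dict), where `version` is a
          hypothetical per-row change counter.
    last_sent_version: None before the first transfer to the standby node.
    """
    newest = max((version for version, _ in rows.values()), default=0)
    if last_sent_version is None:
        # First transfer: full extraction of the data to be processed.
        target = {key: row for key, (_, row) in rows.items()}
    else:
        # Later transfers: only rows changed since the previous transfer.
        target = {
            key: row
            for key, (version, row) in rows.items()
            if version > last_sent_version
        }
    return target, newest


rows = {1: (1, {"math": 90}), 2: (1, {"math": 80})}
first_batch, sent_version = select_target_data(rows, None)

rows[2] = (2, {"math": 85})  # an OLTP update bumps the row version
second_batch, sent_version_2 = select_target_data(rows, sent_version)
print(second_batch)  # {2: {'math': 85}}
```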
In one possible design, the main node obtains a heterogeneous replication instruction, wherein the heterogeneous replication instruction is used for indicating a data table that needs to be replicated and the columns that need to be replicated in that data table; according to the heterogeneous replication instruction, the main node determines, from the data tables corresponding to the data to be processed, a target data table to be replicated and a target column to be replicated in the target data table; and obtains the target data according to the target data table and the target column.
In one possible design, the main node receives the heterogeneous replication instruction sent by the standby node.
In one possible design, the main node receives the heterogeneous replication instruction sent by a coordinating node connected to the main node.
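A heterogeneous replication instruction that names a table and the columns to replicate can be modelled as a small record that the main node applies as a projection. The `ReplicationInstruction` type, the `apply_instruction` helper, and the sample tables below are hypothetical illustrations; the application does not specify the instruction's concrete format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationInstruction:
    table: str      # data table that needs to be replicated
    columns: tuple  # columns of that table that need to be replicated


def apply_instruction(database, instruction):
    """Project the requested columns out of the requested table."""
    table = database[instruction.table]
    return [{name: row[name] for name in instruction.columns} for row in table]


database = {
    "scores": [
        {"student_id": 1, "name": "A", "math": 90},
        {"student_id": 2, "name": "B", "math": 80},
    ],
    "audit": [{"event": "login"}],
}

instruction = ReplicationInstruction(table="scores", columns=("student_id", "math"))
target_data = apply_instruction(database, instruction)
print(target_data)
# [{'student_id': 1, 'math': 90}, {'student_id': 2, 'math': 80}]
```

Whether the instruction comes from the standby node or from a coordinating node does not change the projection step; only the sender differs.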
In one possible design, the main node determines, in the data to be processed, the data that needs to be hidden from the standby node; performs desensitization processing on that data; and obtains the target data according to the desensitized data to be processed.
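One simple form of desensitization is to replace the values of the hidden columns with a fixed mask before the rows leave the main node. The sketch below assumes column-level masking with a constant placeholder; the function name `desensitize`, the mask string, and the sample rows are hypothetical, and other techniques (hashing, truncation, dropping the column) would fit the description equally well.

```python
def desensitize(rows, hidden_columns, mask="***"):
    """Return copies of the rows with the hidden columns masked."""
    return [
        {name: (mask if name in hidden_columns else value) for name, value in row.items()}
        for row in rows
    ]


pending = [
    {"student_id": 1, "name": "A", "math": 90},
    {"student_id": 2, "name": "B", "math": 80},
]
target_data = desensitize(pending, hidden_columns={"name"})
print(target_data[0])  # {'student_id': 1, 'name': '***', 'math': 90}
```

The original rows on the main node are left unchanged, so OLTP processing still sees the real values.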
In one possible design, when it is determined that a heterogeneous replication triggering condition is satisfied, the main node obtains the target data according to the data to be processed.
In one aspect, a data processing method is provided, and the method includes:
obtaining target data according to to-be-processed data on which online transaction processing is performed, wherein the to-be-processed data is stored in a row storage format;
and sending the target data to a standby node so that the standby node stores the target data in a column storage format and executes online analysis processing.
In one possible design, obtaining the target data according to the data to be processed for performing the online transaction includes:
invoking a heterogeneous replication process;
and extracting the target data from the data to be processed through the heterogeneous replication process.
In one possible design, obtaining the target data according to the data to be processed for performing the online transaction includes:
if the target data is sent to the standby node for the first time, determining the copy data obtained by extracting the data to be processed in full quantity as the target data;
and if the target data is sent to the standby node again after the first time, obtaining data used for representing the updating of the data to be processed and determining the data as the target data.
In one possible design, obtaining data that characterizes the update of the data to be processed and determining it as the target data includes:
determining the replicated data obtained by incrementally extracting the data to be processed as the target data; or,
determining the log data corresponding to the data to be processed as the target data.
In one possible design, obtaining the target data according to the data to be processed for performing the online transaction includes:
obtaining a heterogeneous replication instruction, wherein the heterogeneous replication instruction is used for indicating a data table needing to be replicated and a column needing to be replicated in the data table needing to be replicated;
according to the heterogeneous replication instruction, determining a target data table to be replicated and a target column to be replicated in the target data table from the data table corresponding to the data to be processed;
and obtaining the target data according to the target data table and the target column.
In one possible design, obtaining a heterogeneous replication instruction includes:
obtaining the heterogeneous replication instruction sent by the standby node; or,
obtaining the heterogeneous replication instruction sent by a coordinating node connected to the main node.
In one possible design, obtaining the target data according to the data to be processed for performing the online transaction includes:
determining, in the data to be processed, the data that needs to be hidden from the standby node;
and desensitizing the data that needs to be hidden from the standby node, and obtaining the target data according to the desensitized data to be processed.
In one possible design, obtaining the target data according to the data to be processed for performing the online transaction includes:
when a heterogeneous replication triggering condition is met, obtaining the target data according to the data to be processed.
In one aspect, a data processing method is provided, and the method includes:
receiving target data sent by a main node, wherein the target data is obtained by the main node according to to-be-processed data that is stored in a row storage format and on which online transaction processing is performed;
storing the target data in a column storage format;
and performing online analysis processing on the stored target data.
In one possible design, before receiving the target data sent by the master node, the method further includes:
and sending a heterogeneous replication instruction to the main node, wherein the heterogeneous replication instruction is used for indicating the data table needing to be replicated and the columns needing to be replicated in the data table needing to be replicated.
In one aspect, a data processing system is provided, where the system includes a main node and a standby node, the main node includes a first processing module and a sending module, and the standby node includes a receiving module, a storage module, and a second processing module; wherein:
the first processing module is used for performing online transaction processing on to-be-processed data stored in a row storage format and obtaining target data according to the to-be-processed data; the sending module is used for sending the target data to the receiving module;
the receiving module is used for receiving the target data, the storage module is used for storing the target data in a column storage format, and the second processing module is used for performing online analysis processing on the stored target data.
In one aspect, a data processing apparatus is provided, the apparatus comprising:
the storage module is used for storing the data to be processed in a row storage mode;
the business processing module is used for performing online transaction processing on the data to be processed stored in the row storage mode;
the heterogeneous replication module is used for obtaining target data according to the data to be processed;
and the sending module is used for sending the target data to the standby node so that the standby node stores the target data in a column storage mode and performs online analysis processing on the target data stored in the column storage mode, wherein the data processing device and the standby node belong to the same database cluster.
In one aspect, a data processing apparatus is provided, the apparatus comprising:
the receiving module is used for receiving target data sent by a main node, wherein the target data is obtained by the main node according to to-be-processed data on which online transaction processing is performed, and the data processing apparatus and the main node belong to the same database cluster;
the storage module is used for storing the target data in a column storage mode;
and the business processing module is used for performing online analysis processing on the target data stored in the column storage mode.
In one aspect, a data processing device is provided, which includes a memory, a processor and a computer program stored on the memory and executable on the processor, and the processor executes the computer program to realize the steps included in the data processing method in the above aspects.
In one aspect, a computer-readable storage medium is provided, which stores computer-executable instructions for causing a computer to perform the steps included in the data processing method in the above aspects.
In the technical solution provided by the embodiments of the application, a data processing system with heterogeneous replication across data nodes is adopted. The data processing system includes two mutually independent database nodes in the same database cluster, namely a main node and a standby node; the main node is configured to process OLTP services, and the standby node is configured to process OLAP services. The main node can obtain target data according to the to-be-processed data that is stored in a row storage mode and subjected to OLTP processing, and then send the target data to the standby node. The standby node can store the received target data in a column storage format and perform OLAP processing on it. Two different types of services, OLTP and OLAP, can therefore be processed concurrently in a single database cluster, which reduces the data storage cost and improves service processing efficiency. Because the OLTP service and the OLAP service are executed by two independent database nodes, resource contention can be avoided through resource isolation, which further improves service processing efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only the embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a schematic diagram of the same data table stored in a row storage mode and in a column storage mode, respectively;
FIG. 2 is a schematic diagram of an application scenario in an embodiment of the present application;
FIG. 3 is a diagram illustrating a database architecture in the prior art;
FIG. 4 is a block diagram of a data processing system according to an embodiment of the present application;
FIG. 5 is a flow chart of a data processing method in an embodiment of the present application;
FIG. 6 is a block diagram of another embodiment of a data processing system;
FIG. 7 is a block diagram of a data processing apparatus in the embodiment of the present application;
FIG. 8 is another block diagram of the data processing apparatus in the embodiment of the present application;
FIG. 9 is a schematic structural diagram of a data processing apparatus in an embodiment of the present application;
FIG. 10 is another schematic structural diagram of the data processing apparatus in the embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. The embodiments and features of the embodiments of the present invention may be arbitrarily combined with each other without conflict. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.
The terms "first" and "second" in the description and claims of the present application and the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the term "comprises" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. The term "plurality" in the present application may mean at least two, for example, two, three, or more, which is not limited in the embodiments of the present application.
In addition, the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in this document generally indicates that the preceding and following related objects are in an "or" relationship unless otherwise specified.
Some terms referred to herein are explained below to facilitate understanding by those skilled in the art.
1. The data storage mode refers to a storage mode of data in a database, and generally includes two storage modes of row storage and column storage.
Row storage, i.e., the data storage mode that uses a row storage format, means that physical storage in the database uses the row as the minimum unit, and data in the same row is stored physically adjacent. In this scenario, Insertion (INSERT), Deletion (DELETE), and Update (UPDATE) operations on the data can be processed efficiently. Data in the row storage mode is stored by tuple, with all attributes of each tuple stored together; to query a single attribute value of a tuple, the whole tuple must be read first. Because this mode takes the tuple as its unit, it is suitable for frequent reading and writing of data.
Column storage, i.e., the data storage mode that uses a column storage format, means that physical storage in the database uses the column as the minimum unit, and data in the same column of a table is stored adjacently on disk. Column storage organizes each column of the data table together for storage, and different columns are stored independently. This mode can efficiently compress large amounts of sparse data, which saves storage space, and because all values in a column have the same data type, it facilitates complex analysis of the data.
For ease of understanding, the row storage mode and the column storage mode are described below in conjunction with fig. 1.
Referring to the score table shown in FIG. 1, if the score table is stored in a row storage mode, as indicated by "row storage" below the score table, the data in the score table is stored sequentially in units of rows; if the score table is stored in a column storage mode, as indicated by "column storage" below the score table, the data in the score table is stored sequentially in units of columns.
Whether a data table is stored in a row storage mode or a column storage mode only changes how the data is laid out in storage; the data table itself is no different. For example, suppose the score table in FIG. 1 is stored in two node devices, one of which stores it in a row storage mode and the other in a column storage mode. Although the storage modes in the two node devices differ, the table structure of the score table is the same in both, namely the table structure shown in FIG. 1.
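The difference between the two layouts can be made concrete by flattening one small table both ways. The table contents below are invented for illustration and are not the score table of FIG. 1; the helper names are likewise hypothetical.

```python
header = ("student_id", "name", "math")
table = [(1, "A", 90), (2, "B", 80), (3, "C", 70)]

# Row storage: the values of each row are adjacent.
row_layout = [value for row in table for value in row]

# Column storage: the values of each column are adjacent.
column_layout = [value for column in zip(*table) for value in column]


def read_column_from_columns(layout, n_rows, column_index):
    """One contiguous slice holds the whole column."""
    start = column_index * n_rows
    return layout[start:start + n_rows]


def read_column_from_rows(layout, n_columns, column_index):
    """The column is scattered: one value per row, n_columns apart."""
    return layout[column_index::n_columns]


print(row_layout)     # [1, 'A', 90, 2, 'B', 80, 3, 'C', 70]
print(column_layout)  # [1, 2, 3, 'A', 'B', 'C', 90, 80, 70]
print(read_column_from_columns(column_layout, len(table), 2))  # [90, 80, 70]
```

Both layouts hold the same table; only the physical order differs, which is why a single-column aggregate reads one contiguous run in the column layout but strides across every row in the row layout.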
2. OLTP is the primary application of traditional relational databases and mainly covers basic, everyday transaction processing, such as banking transactions. OLTP transactions typically contain INSERT/DELETE/UPDATE operations, for which row stores typically exhibit good performance, so the row storage mode is suitable for OLTP-type services.
3. OLAP is a main application of a data warehouse system, supports complex analysis operation, emphasizes decision support and provides intuitive and understandable query results. The data can be compressed and stored by effectively utilizing the characteristics of the column data in a column storage mode, and the processing efficiency of the OLAP is improved, so that the data storage mode of the column storage is suitable for OLAP type services.
4. Database non-shared cluster: each data node has an independent disk storage system and an independent memory system, and service data is partitioned across the data nodes according to the database model and application characteristics. The data nodes are interconnected through a dedicated network or a general-purpose commercial network, compute in coordination with each other, and provide database services as a whole. The non-shared database cluster has the advantages of complete scalability, high availability, high performance, excellent cost performance, resource sharing and the like.
5. Hybrid Transactional/Analytical Processing (HTAP) is a new distributed database technology proposed in the embodiments of the present application. Based on the HTAP technology, a single database can process both OLTP-type transactions and OLAP-type analysis; in other words, HTAP enables a single database to support both the OLTP and OLAP types of processing.
6. A Structured Query Language (SQL) execution plan, also referred to simply as an execution plan in this application, can be understood as the execution mode formulated from the SQL statements input by a user. Corresponding execution plans may be formulated for different types of services: an execution plan formulated for the OLTP service may be referred to as an OLTP execution plan, and an execution plan formulated for the OLAP service may be referred to as an OLAP execution plan.
7. The database optimizer is a component for performing logic optimization on SQL statements input by a user, and can perform optimization of different logics through the database optimizer aiming at different services. For example, for the SQL statement of the OLTP service, the optimizer logic of OLTP may be used, and for the SQL statement of the OLAP service, the optimizer logic of OLAP may be used, so that a corresponding efficient execution plan may be formulated as much as possible according to the service features of different services.
8. Data extraction refers to extracting data from a data source, and continuing with the score table in fig. 1 as an example, extracting data from the score table can be understood as copying data from the score table, and therefore, data extraction can also be understood as copying data.
The data extraction may include full extraction, partial extraction, and incremental extraction.
Full extraction refers to copying all data in the data source; for example, full extraction of the score table in FIG. 1 copies all data in the score table.
Partial extraction refers to copying part of the data in a data source. For example, partial extraction of the score table in FIG. 1 copies some rows or some columns of the score table as required, such as copying only the score data for student number "1" (that is, only the first row), or copying only the score data for the subject "mathematics" (that is, only the 4th column), and so on.
Incremental extraction refers to extracting the data that has been added, modified, or deleted in the data source since the last extraction. Taking the score table in FIG. 1 as an example, suppose the "language" score of the student with student number "2" is recorded as 92 because of an entry error, while the actual score is 94. The "language" score can then be corrected from 92 to 94. The data that changed in the data table is the "language" score of the student with student number "2", and its new value is 94, so with incremental extraction only the value "94" needs to be copied. Incremental extraction involves a smaller amount of data and reflects the data that has changed in the data source.
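Incremental extraction can be sketched as a comparison between the state at the last extraction and the current state. The sketch below assumes that a snapshot of the previous state is available; the function name `incremental_extract` and all scores other than the 92 to 94 correction described above are invented for illustration.

```python
def incremental_extract(previous, current):
    """Return the changes made since the last extraction.

    Each change is (operation, key, values); for an update, `values`
    holds only the columns whose value changed.
    """
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif row != previous[key]:
            changed = {c: v for c, v in row.items() if previous[key].get(c) != v}
            changes.append(("update", key, changed))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, None))
    return changes


# Keys are student numbers; only the 92 -> 94 change comes from the text.
previous = {1: {"language": 88, "math": 90}, 2: {"language": 92, "math": 80}}
current = {1: {"language": 88, "math": 90}, 2: {"language": 94, "math": 80}}

print(incremental_extract(previous, current))
# [('update', 2, {'language': 94})]
```

Only the corrected value travels to the standby node, which matches the observation that incremental extraction carries a smaller amount of data than full extraction.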
9. Log data refers to a record of the operations, such as addition, modification, deletion, and insertion, performed on the data in a data source; log data can therefore also be understood as an operation flow record. The log data can also reflect the changes that have occurred to the data.
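When log data is used as the target data, the receiving side can rebuild the current state by replaying the operation records in order. The following sketch replays a hypothetical log of `(operation, key, values)` records into a column-oriented structure; the record format, the `replay` function, and the sample values are assumptions made for illustration and do not describe the log format of any particular database.

```python
def replay(log, columns, keys):
    """Apply operation records, in order, to a column store.

    columns: dict mapping column name -> list of values
    keys:    list of row keys; keys[i] owns position i in every column
    """
    for operation, key, values in log:
        if operation == "insert":
            keys.append(key)
            for name in columns:
                columns[name].append(values.get(name))
        elif operation == "update":
            position = keys.index(key)
            for name, value in values.items():
                columns[name][position] = value
        elif operation == "delete":
            position = keys.index(key)
            keys.pop(position)
            for name in columns:
                columns[name].pop(position)
        else:
            raise ValueError(f"unknown operation: {operation}")


keys = []
columns = {"language": [], "math": []}
log = [
    ("insert", 1, {"language": 88, "math": 90}),
    ("insert", 2, {"language": 92, "math": 80}),
    ("update", 2, {"language": 94}),
    ("delete", 1, None),
]
replay(log, columns, keys)
print(keys, columns)  # [2] {'language': [94], 'math': [80]}
```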
The idea of the present application is presented below.
As described above, in current database systems, a single database cluster generally has only OLTP capability or only OLAP capability. If OLTP and OLAP processing need to be performed on the same data at the same time, cross-database operation is required, which results in a higher data storage cost and lower service processing efficiency for OLTP and OLAP.
Through analysis of the prior art, the inventor found that the main reason for the low processing efficiency of the OLTP service and the OLAP service is the series of copy, transform, and load operations required across databases. In view of this, the inventor considered processing the OLTP service and the OLAP service in the same database cluster, since operations local to one database cluster take less time than operations across databases, so that service processing efficiency can be improved to a certain extent.
Meanwhile, the present inventors also consider that the requirements of the OLTP service and the OLAP service for the data storage mode are different, so a method is proposed to perform the processing of OLTP and OLAP separately by using independent database nodes, for example, the database node performing the OLTP service stores data in a row storage mode, and the database node performing the OLAP service stores data in a column storage mode, so that the different requirements of two different service types for data storage can be met separately. Meanwhile, different types of services are respectively executed through mutually independent database nodes, so that node isolation can be achieved, and the node isolation represents node resource isolation, so that resource competition when two services are executed simultaneously can be avoided, and the service processing efficiency is improved.
In addition, according to the logic of data processing, the OLTP processing is generally performed on data first, and then the OLAP analysis is performed, and since the data processed by OLTP is generally stored in a row storage mode, and the data processed by OLAP needs to be stored in columns, that is, the storage modes of the data targeted by the OLTP service and the OLAP service are not the same, in view of this, in order to implement the above concept of performing the OLTP processing and the OLAP analysis on the same copy of data in parallel by two data nodes which are independent of each other, the present inventors also designed a heterogeneous replication mechanism between database nodes. Specifically, OLTP processed data is copied across database nodes to perform OLAP analysis on the copied data, and in consideration of the requirements of OLTP and OLAP on data storage formats, after the OLTP processed data in a row storage mode is copied and sent to another database node, the OLTP processed data needs to be stored in a column storage format again in order to support OLAP services. That is, the present inventors also consider that, when the OLAP parallel processing of OLTP is required to be performed on the same data between two independent database nodes, the same data may be replicated between different nodes in the same database cluster in a heterogeneous replication manner, so as to ensure the effective execution of OLAP traffic of OLTP traffic.
Based on the above technical concept, the present inventors propose a technical solution for supporting both OLTP and OLAP services in the same database cluster. In particular, a new database architecture is provided, which may be referred to as a data processing system. The data processing system includes a master node and a standby node, which are the above-mentioned mutually independent database nodes; the master node is configured to process OLTP services, and the standby node is configured to process OLAP services. The master node may obtain target data according to the to-be-processed data that undergoes OLTP processing and is stored in the row storage format, and then send the obtained target data to the standby node. The standby node may then store the received target data in the column storage format and perform OLAP processing on the target data stored in the column storage mode. Therefore, by heterogeneously replicating data across data nodes, OLTP and OLAP processing can be performed on the data in parallel, so that OLTP and OLAP services are supported simultaneously in one database cluster and two different types of service are processed concurrently in a single database cluster. This not only reduces the data storage cost but also shortens the delay between data processing and data analysis, thereby improving the processing efficiency of both the OLTP and OLAP services.
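The row-to-column conversion at the heart of heterogeneous replication can be illustrated with a minimal sketch. The function name `rows_to_columns` and the sample records are hypothetical and are not taken from the embodiment; the sketch also assumes that every row carries the same set of columns.

```python
from typing import Any

Row = dict[str, Any]


def rows_to_columns(rows: list[Row]) -> dict[str, list[Any]]:
    """Convert row-format records into a column-format layout.

    Assumption: every row has the same keys, so all columns end up
    with the same length.
    """
    columns: dict[str, list[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Hypothetical row-stored data held by the master node (OLTP side).
master_rows = [
    {"id": 1, "name": "Li", "score": 90},
    {"id": 2, "name": "Wang", "score": 85},
]

# The standby node (OLAP side) keeps the same data column by column.
standby_columns = rows_to_columns(master_rows)
```

In this layout a transactional update touches one entry of `master_rows`, while an analytical scan over `standby_columns["score"]` reads a single contiguous list.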
In addition, because the OLTP service and the OLAP service consume resources of different nodes (namely a main node and a standby node), resource competition during service processing can be avoided in a resource isolation mode, efficient execution of the service is ensured, and the processing efficiency of the OLTP service and the OLAP service is further improved.
By the technical scheme of concurrently processing the OLTP service and the OLAP service in a single database cluster in the embodiment of the application, the processing efficiency of the OLTP service and the OLAP service can be improved, so that timely support can be provided for real-time business analysis and enterprise decision for enterprises adopting the technical scheme of the embodiment of the application, and the use requirements of clients can be met.
After introducing the design concept of the embodiment of the present application, the application scenarios to which the technical solution provided by the embodiment is applicable are briefly described below. It should be noted that the application scenarios described below are only used to illustrate the embodiment of the present application and are not limiting. In a specific implementation, the technical solution provided by the embodiment of the present application can be applied flexibly according to actual needs.
Please refer to fig. 2, wherein fig. 2 is an application scenario to which the present disclosure is applied. The application scenario includes a plurality of clients (i.e., client 1, client 2, and client 3) and a database cluster, and a plurality of clients and the database cluster can communicate with each other through a network. The database cluster is a database architecture newly proposed in the embodiment of the present application, and specifically, the database cluster includes a node a, a node b, a node c, and a node d, where the node a may be used to manage the node c, and the node b may be used to manage the node d. The data in the node c is stored in a row storage mode, the node c is configured to execute OLTP service, the data in the node d is stored in a column storage mode, the node d is configured to execute OLAP service, the row storage data in the node c can be copied to the node d through a heterogeneous copying mechanism provided in the embodiment of the application, the node d stores the data sent by the node c in a column storage mode, and further, OLTP processing and OLAP analysis can be performed on the same data in parallel by the node c and the node d in the database cluster.
In the application scenario, a user may input data to the database cluster through a client, and may issue an OLTP task and an OLAP task through the client, after receiving the data input by the user or the issued task, the database cluster may store the data and process corresponding services through nodes in the cluster, and finally send a service processing result to the client to satisfy the service requirement of the user, where a specific process of the database cluster to execute the task will be described in detail in the following embodiments, which is not described here.
For example, a large-scale enterprise may analyze and process salary data of thirty thousand employees of the enterprise at the same time through the database cluster provided in the embodiment of the present application, then may make relevant data such as employee job numbers, employee names, departments to which the employee belongs, job ages, monthly salaries of each month, and the like of the thirty thousand employees into one or more data tables, and then input the data tables into the database cluster through a client, and then issue an OLTP task and an OLAP task through the client, after receiving the two tasks, the database cluster may perform concurrent processing of the OLTP task and the OLAP task inside the database cluster, and after obtaining a task processing result, may return the obtained task processing result to the client to be presented to the staff through the client.
In order to facilitate understanding of the new database architecture proposed in the present application, a prior-art database architecture is first introduced below with reference to fig. 3. In some scenarios, the database architecture shown in fig. 3 may be understood as a data processing system, namely a data processing system with only OLTP capability or only OLAP capability as in the aforementioned prior art.
The database architecture shown in fig. 3 employs a multi-master distributed MPP architecture, and the portions of the database architecture shown in fig. 3 are described below.
The Global Transaction Manager (GTM) is mainly responsible for managing global transaction information and does not store actual data; it can be used to ensure transaction consistency within a database cluster. GTM-M may be called the main GTM, and GTM-S (GTM Standby) may be called the auxiliary GTM or standby GTM. The GTM controls the distribution of all global transactions, so a problem in the GTM can make the whole database cluster unavailable. To increase availability, a GTM-S can be added, so that when the GTM-M has a problem, the GTM-S can be promoted to GTM-M to keep the cluster running normally.
The CNs (coordinators) are peers of one another. Each CN provides an identical view of the database, stores global metadata and Global Catalog information, provides the external interface, and is responsible for data distribution and query planning. Functionally, a CN stores only the global metadata of the system and does not store actual service data. The CN may analyze a query statement (e.g., an SQL statement) input by a user, generate an execution plan, and pass the generated execution plan to a Data Node (DN) for execution. Multiple CNs can be deployed in one database cluster; for example, fig. 3 shows 3 CNs (i.e., CN1, CN2, and CN3).
The DN is the node that actually stores data. Each DN stores one data fragment, stores the metadata of the data related to that node, and can process and store that metadata. Functionally, the DN is responsible for carrying out the execution plan distributed by the CN; for example, it may execute an OLTP execution plan or an OLAP execution plan. Multiple DNs may be deployed in one database cluster; for example, fig. 3 shows 4 DNs (i.e., DN1, DN2, DN3, and DN4).
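The division of labour between a CN and its DNs can be sketched as follows. This is only an illustrative model: the classes `Coordinator` and `DataNode`, the dictionary-shaped plan, and the equality filter are all assumptions, and a real CN would parse SQL and produce a far richer plan.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DataNode:
    """Hypothetical DN: holds one data fragment and executes plans on it."""
    name: str
    rows: list[dict] = field(default_factory=list)

    def execute(self, plan: dict) -> list[dict]:
        # Run the plan against this node's own fragment only.
        return [r for r in self.rows if r[plan["column"]] == plan["value"]]


@dataclass
class Coordinator:
    """Hypothetical CN: stores no service data, only plans and distributes."""
    data_nodes: list[DataNode]

    def query(self, column: str, value: Any) -> list[dict]:
        # Stand-in for "analyze the query and generate an execution plan".
        plan = {"column": column, "value": value}
        results: list[dict] = []
        for dn in self.data_nodes:
            results.extend(dn.execute(plan))
        return results


dn1 = DataNode("DN1", [{"id": 1, "dept": "A"}])
dn2 = DataNode("DN2", [{"id": 2, "dept": "B"}, {"id": 3, "dept": "A"}])
cn = Coordinator([dn1, dn2])
matches = cn.query("dept", "A")
```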
Based on the database architecture shown in fig. 3, the embodiment of the present application provides a new database architecture capable of supporting the HTAP technology, and the database architecture provided in the embodiment of the present application is, for example, as shown in fig. 4, the database architecture shown in the figure may be understood as a data processing system.
Referring to fig. 4, in the data processing system provided in the embodiment of the present application, the CNs and DNs are divided into node groups, and specifically, some CNs and corresponding DNs are configured to be dedicated to processing OLTP tasks, for example, the CNs and DNs configured to be dedicated to processing OLTP tasks are referred to as OLTP node groups, and then the CNs in the OLTP node groups may be referred to as OLTP CNs, and the DNs in the OLTP node groups may be referred to as OLTP DNs; and some of the CNs and corresponding DNs are configured to be dedicated to processing OLAP tasks, e.g., the CNs and DNs configured to be dedicated to processing OLAP tasks are referred to as OLAP node groups, then the CNs in an OLAP node group may be referred to as OLAP CNs, and the DNs in an OLAP node group may be referred to as OLAP DNs.
In a specific implementation process, one OLTP node group or one OLAP node group may include one or more CNs, and one or more DNs, and the present embodiment does not limit the number of the various types of nodes respectively included in each OLTP node group and OLAP node group, for example, as shown in fig. 4, in the OLTP node group, one OLTP CN and two OLTP DNs (i.e., DN1 and DN2) are included, and in the OLAP node group, one OLAP CN and two OLAP DNs (i.e., DN3 and DN4) are included.
The CN and the DNs in the same OLTP node group, and likewise in the same OLAP node group, are configured to be connected with each other. Taking the OLTP node group on the left of fig. 4 as an example, the OLTP CN is connected to both DN1 and DN2 of the OLTP DNs. When the OLTP CN obtains data through a client, the data can be sent to DN1 and DN2, which store the data in a row storage manner after receiving it; fig. 4 accordingly depicts the data in DN1 and DN2 as row-stored. Taking DN2 as an example, after receiving an OLTP execution plan issued by the OLTP CN, DN2 may perform OLTP processing on the data stored in it. At the same time, if the data also needs OLAP processing, DN2 may perform a heterogeneous copy operation, that is, obtain target data from the data and send the target data to, for example, DN3 of the OLAP DNs. After receiving the target data, DN3 may store it in a column storage manner and, on receiving an OLAP execution plan issued by the OLAP CN, perform OLAP processing on the column-stored target data. OLTP and OLAP are thereby processed in parallel on the same data within one database cluster. Because the processing is carried out by resource-isolated DNs (DN2 and DN3 in this example), resource contention during the parallel processing of OLTP and OLAP can be avoided, which improves the service processing efficiency of both.
In the data processing system according to the embodiment of the present application, optimizer logic may be configured for each database node. Considering that the system mainly serves two types of service, OLTP and OLAP, both the optimizer logic suitable for OLTP and the optimizer logic suitable for OLAP may be configured for each database node at the same time. Since each CN and DN may be configured to perform either an OLTP task or an OLAP task before the database nodes are divided into OLTP node groups and OLAP node groups, in order to maximize the applicability of each database node, each database node may be given both the OLTP-applicable optimizer logic and the OLAP-applicable optimizer logic. SQL queries of the OLTP class and the OLAP class can then be optimized by the corresponding optimizer logic, so that efficient execution plans are obtained for the OLTP service and the OLAP service respectively. The optimizer logic of each database node may be configured manually by a user, or a configuration file may be placed on each database node so that the node is configured automatically according to its configuration file.
In view of the aforementioned difference between the query tasks that the OLTP node group and the OLAP node group are required to execute, and in order to facilitate efficient execution of OLTP tasks and OLAP tasks, all database nodes in the OLTP node group may execute only the OLTP-applicable optimizer logic, and all database nodes in the OLAP node group may execute only the OLAP-applicable optimizer logic. As shown in fig. 4, only the OLTP optimizer is retained in the OLTP CN and the OLTP DNs (DN1 and DN2), with the OLAP optimizer disabled, and only the OLAP optimizer is retained in the OLAP CN and the OLAP DNs (DN3 and DN4), with the OLTP optimizer disabled. In this way, the OLTP CN (and OLTP DNs) and the OLAP CN (and OLAP DNs) each execute their own type of service through the optimizer that is not disabled; that is, efficient execution plans corresponding to the OLTP service and the OLAP service may be formulated for different SQL statements, so that the processing efficiency of the different types of service may be improved to a certain extent.
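One way to picture this configuration is a node that starts with both optimizers and keeps only one of them once it joins a node group. The sketch below is a hypothetical model; `Workload`, `NodeConfig`, `new_node`, and `assign_to_group` are illustrative names, not part of the described system.

```python
from dataclasses import dataclass
from enum import Enum


class Workload(Enum):
    OLTP = "OLTP"
    OLAP = "OLAP"


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical per-node configuration record."""
    name: str
    enabled_optimizers: frozenset


def new_node(name: str) -> NodeConfig:
    # Before grouping, a node carries both optimizers.
    return NodeConfig(name, frozenset(Workload))


def assign_to_group(node: NodeConfig, group: Workload) -> NodeConfig:
    # Joining a node group retains one optimizer and disables the other.
    return NodeConfig(node.name, frozenset({group}))


oltp_dn = assign_to_group(new_node("DN1"), Workload.OLTP)
olap_dn = assign_to_group(new_node("DN3"), Workload.OLAP)
```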
Since each CN node in the embodiment of the present application is already configured to execute only the OLTP optimizer logic or only the OLAP optimizer logic, in a specific implementation process, a user may select a corresponding OLTP CN if the user needs to process the OLTP service, or may select a corresponding OLAP CN if the user needs to process the OLAP service, so that a corresponding service operation may be executed through the database cluster. In addition, in order to facilitate the user to operate from the client, an option of the type of the task to be executed may be configured in the user interaction interface corresponding to the database cluster in the client, so that the user may select the option according to the actual requirement of the user, and meanwhile, the database cluster may also allocate, according to the selection of the user, the query statement for executing the task input by the user to the OLTP CN or the OLAP CN.
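The allocation of a user's query statement to the OLTP CN or the OLAP CN according to the selected task type can be sketched as a simple lookup. The mapping `CN_BY_TASK_TYPE` and the function `route_query` are hypothetical names used for illustration only.

```python
# Hypothetical mapping from the task type chosen in the client to a CN.
CN_BY_TASK_TYPE = {"OLTP": "OLTP CN", "OLAP": "OLAP CN"}


def route_query(task_type: str, sql: str) -> tuple[str, str]:
    """Return the CN that should receive the statement, plus the statement."""
    try:
        cn = CN_BY_TASK_TYPE[task_type.upper()]
    except KeyError:
        raise ValueError(f"unknown task type: {task_type!r}") from None
    return cn, sql


target_cn, statement = route_query("olap", "SELECT AVG(salary) FROM staff")
```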
To further illustrate the technical solutions provided by the embodiments of the present application, the following detailed description is made with reference to the accompanying drawings and the detailed description. Although the embodiments of the present application provide the method operation steps as shown in the following embodiments or figures, more or less operation steps may be included in the method based on the conventional or non-inventive labor. In steps where no necessary causal relationship exists logically, the order of execution of the steps is not limited to that provided by the embodiments of the present application. The method can be executed in sequence or in parallel according to the method shown in the embodiment or the figure when the method is executed in an actual processing procedure or a device.
The following describes a technical solution in an embodiment of the present application with reference to the interaction diagram of a data processing method shown in fig. 5. Fig. 5 includes four devices in the same database cluster, namely a first CN, a master node, a standby node, and a second CN, where the master node and the standby node are mutually independent database nodes in a data processing system and may be, for example, physically independent data nodes. The first CN may be understood as the OLTP CN in fig. 4, the second CN as the OLAP CN in fig. 4, the master node as an OLTP DN (DN1 or DN2) in fig. 4, and the standby node as an OLAP DN (DN3 or DN4) in fig. 4. That is, the first CN and the master node are paired to perform the OLTP service, while the second CN and the standby node are paired to perform the OLAP service.
The method flow in fig. 5 is explained below.
Step 501: the first CN sends the data to be processed to the master node, and further, the master node may receive the data to be processed sent by the first CN.
In practice, a user may write data into the database cluster through a client; such data is referred to herein as the data to be processed. The database cluster may receive the data to be processed through the first CN and write it into the master node corresponding to the first CN.
Step 502: the master node stores the received data to be processed in a row storage format.
Since the master node is a DN configured to execute the OLTP task, in order to meet the data storage requirement of the OLTP task, the master node may store the received data to be processed in a row storage format. For example, the data to be processed is the data in the achievement table of all classmates in a class shown in fig. 1, or the wage data of all workers in a plant.
Step 503: the first CN sends an OLTP execution plan to the master node, and further, the master node may receive the OLTP execution plan.
The master node is a DN responsible for executing services in the database cluster, and the tasks a DN executes are controlled by its corresponding CN: after the CN issues an execution plan to the DN, the DN performs the corresponding task processing according to that plan. In a specific implementation process, the first CN may obtain, through a user interface, an SQL statement input by a user, where the SQL statement represents the task requirement of the user. The first CN may then analyze the input SQL statement using its OLTP optimizer logic and convert it into an efficient OLTP execution plan corresponding to the OLTP task. The execution plan may be referred to as an SQL query instruction, or may be understood as an OLTP processing request; in other words, the OLTP execution plan is a request instructing (or requesting) the OLTP DN (i.e., the master node) to execute OLTP processing.
Step 504: after receiving the OLTP execution plan, the master node may perform OLTP processing on the aforementioned to-be-processed data stored in a row.
In order to respond to the query requirement of the user in time, after receiving the OLTP execution plan, the master node may perform OLTP processing on the to-be-processed data stored in the row storage mode, thereby completing the corresponding OLTP task in the master node.
Step 505: the master node judges whether a heterogeneous copy triggering condition is satisfied.
The heterogeneous replication triggering condition in the embodiment of the present application may be a condition that triggers the master node to replicate data to the standby node; it may also be understood as a condition indicating that OLAP analysis needs to be performed on the to-be-processed data stored in the master node. In the embodiment of the present application, this kind of replication is referred to as "heterogeneous replication" because the data is copied across nodes. A further meaning of "heterogeneous" is that the same data is converted to a different data storage mode; for example, row-stored data is replicated and finally converted into column-stored data. By judging whether the heterogeneous replication triggering condition is met, the heterogeneous replication from the master node to the standby node can match the actual query requirement of the user as closely as possible, which improves the accuracy and effectiveness of service processing.
In a possible implementation manner, the heterogeneous replication triggering condition refers to that the master node receives a heterogeneous replication instruction sent by the first CN. The heterogeneous replication instruction is used for indicating the data table to be replicated and the column to be replicated in the data table to be replicated, that is, the heterogeneous replication instruction can explicitly indicate which data to be replicated. The heterogeneous replication instruction sent by the first CN may be generated according to a heterogeneous replication request input by a user through the first CN.
In another possible implementation manner, the heterogeneous replication triggering condition refers to that the primary node receives a heterogeneous replication instruction sent by the standby node, the heterogeneous replication instruction has the same function as that of the heterogeneous replication instruction sent by the first CN, and the heterogeneous replication instruction sent by the standby node may be generated according to a heterogeneous replication request input by a user through the second CN.
The above heterogeneous replication instructions may be generated by a query request input by a user in the CN node, in other words, the user may specify data targeted by the OLAP task according to respective actual requirements, for example, may specify, from 10 tables to be processed in the master node, that 3 tables need to be subjected to OLAP processing, then the 3 tables may be referred to as target tables, and for the 3 target tables, 2 target tables of the 3 target tables need to replicate all data, that is, all column data in the 2 target tables need to be replicated, and the other 1 target table only needs to replicate part of columns therein. Therefore, the actual query requirements of the user can be met as much as possible by means of the mode specified by the user, and meanwhile, the data volume of heterogeneous copy can be reduced.
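A heterogeneous replication instruction of this kind can be modelled as a mapping from each target table to the columns to be copied. The sketch below is an assumption-laden illustration: the type `ReplicationInstruction`, the convention that `None` means "all columns", and the table names are all hypothetical.

```python
from typing import Optional

# Hypothetical instruction: table name -> columns to copy, None = all columns.
ReplicationInstruction = dict[str, Optional[list[str]]]


def extract_target(
    tables: dict[str, list[dict]], instruction: ReplicationInstruction
) -> dict[str, list[dict]]:
    """Copy only the tables and columns named in the instruction."""
    target: dict[str, list[dict]] = {}
    for table, cols in instruction.items():
        rows = tables[table]
        if cols is None:
            target[table] = [dict(r) for r in rows]
        else:
            target[table] = [{c: r[c] for c in cols} for r in rows]
    return target


tables = {
    "salary": [{"id": 1, "name": "Li", "pay": 100}],
    "staff": [{"id": 1, "dept": "A"}],
    "audit": [{"id": 1, "note": "internal"}],
}
# Copy all of "staff", only two columns of "salary", and nothing of "audit".
instruction: ReplicationInstruction = {"staff": None, "salary": ["id", "pay"]}
target = extract_target(tables, instruction)
```

Tables that the instruction does not mention are never read, which is how the user's specification reduces the volume of replicated data.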
In another possible implementation manner, the heterogeneous copy triggering condition refers to that the master node determines that a current time at which data copying is required arrives, where the time is, for example, a specific time set by a user or a time determined according to a predetermined cycle, in other words, heterogeneous copying may be initiated at the specific time according to a pre-configuration of the user, so as to implement timed data copying.
In a specific implementation process, the execution order of step 505 and step 504 may not be limited, for example, step 505 may be executed after step 504 as shown in fig. 5, or step 505 may also be executed before step 504, or step 504 and step 505 may also be executed simultaneously.
Upon determining that the heterogeneous replication triggering condition is not satisfied, the primary node may again perform the detection. When it is determined that the heterogeneous replication triggering condition is satisfied, the master node may perform step 506.
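The three possible triggering conditions described above (an instruction from the first CN, an instruction from the standby node, or the arrival of a configured time) can be combined into one check. The following is a minimal sketch; `TriggerState` and `should_replicate` are hypothetical names, and times are assumed to be expressed as seconds since the epoch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriggerState:
    """Hypothetical snapshot of what the master node has observed."""
    instruction_from_cn: bool = False
    instruction_from_standby: bool = False
    next_copy_time: Optional[float] = None  # assumed epoch seconds


def should_replicate(state: TriggerState, now: float) -> bool:
    # Either kind of explicit instruction satisfies the condition.
    if state.instruction_from_cn or state.instruction_from_standby:
        return True
    # Otherwise the condition is met once the configured time has arrived.
    return state.next_copy_time is not None and now >= state.next_copy_time
```

When `should_replicate` returns `False`, the master node would simply check again later; when it returns `True`, the flow proceeds to step 506.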
Step 506: when the heterogeneous copy triggering condition is determined to be met, the main node can obtain target data according to the data to be processed.
Specifically, data may be extracted from the data to be processed to obtain the target data, where the extraction in this embodiment refers to copying the data, that is, copying some data from the data to be processed, and original data of the data copied from the data to be processed is still retained in the data to be processed, so that OLTP processing of the data to be processed by the master node may not be affected as much as possible. In another embodiment, the generated log data of the data to be processed may be extracted, and the extracted log data may be used as the target data in the embodiment of the present application.
In a specific implementation process, the master node may invoke a heterogeneous copy process to implement heterogeneous copying of data between the master node and the standby node; that is, the operation of obtaining the target data may be performed by the heterogeneous copy process. In other words, a dedicated process may be enabled to handle the cross-node data migration between the master node and the standby node, which helps ensure that data is copied from the master node to the standby node efficiently and quickly. The time spent preparing data for the OLAP task in the standby node can thus be reduced as much as possible, and the processing efficiency of the OLAP task can be improved to a certain extent. In addition, extracting the data through a dedicated process helps ensure the accuracy of data extraction, supports efficient execution of the service, and allows the data in the master node to be transmitted reliably to the standby node, thereby ensuring the safety and accuracy of data transmission.
The target data in the embodiment of the present application may be original data in the data to be processed, so that direct migration of the data is implemented. Alternatively, the target data may be data that represents an update to the data to be processed; that is, only the changed part of the data to be processed is copied as the target data, which reduces the data volume of heterogeneous copying, shortens the transmission time, and improves the service processing efficiency.
In a specific implementation process, if the master node is sending target data for the to-be-processed data to the standby node for the first time, all data included in the to-be-processed data may be sent to the standby node as the target data; in this case the target data may be extracted from the to-be-processed data by full extraction. Because the target data is being sent for the first time, the standby node has not previously performed OLAP processing on the to-be-processed data, so the original data needs to be sent to the standby node to ensure the reliability of the data source. Of course, in a specific implementation process, part of the data to be processed may be selected as the data to be copied according to the user's specification, and full extraction is then performed on the selected data.
If this is not the first time the master node sends target data for the to-be-processed data to the standby node, that is, the original data corresponding to the to-be-processed data was already sent to the standby node before the master node sends target data again (for example, the second or third time), the incremental data that has changed in the to-be-processed data may be determined first, and the changed data is then copied as the target data by incremental extraction. The target data in this case is still direct data, so the standby node can use it directly for OLAP analysis, in the same way as the original data, after receiving it.
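The choice between full extraction and incremental extraction can be sketched as below. How changed data is identified is not specified here, so the sketch assumes a hypothetical per-row `version` counter and a remembered `last_sent_version`; both are illustrative assumptions.

```python
from typing import Optional


def extract_target_data(
    rows: list[dict], last_sent_version: Optional[int]
) -> list[dict]:
    """Full extraction on the first send, incremental extraction afterwards.

    Assumption: each row carries a monotonically increasing "version"
    that is bumped whenever the row is inserted or updated.
    """
    if last_sent_version is None:
        # First send: copy everything (full extraction).
        return [dict(r) for r in rows]
    # Later sends: copy only rows changed since the last send.
    return [dict(r) for r in rows if r["version"] > last_sent_version]


pending = [
    {"id": 1, "pay": 100, "version": 1},
    {"id": 2, "pay": 120, "version": 3},
]
first_send = extract_target_data(pending, None)
later_send = extract_target_data(pending, last_sent_version=1)
```

Because the rows are copied rather than moved, the master node's own data remains available for OLTP processing.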
Or in another embodiment, the log data generated in the processing process of the data to be processed may be obtained first, and then the obtained log data is used as the target data, because the log data is used to represent the operation history of the data, the target data at this time is data directly reflecting the change of the data to be processed, and is not the data to be processed itself, so after receiving such target data, the standby node performs log application again to restore the original data, and then performs OLAP analysis processing again.
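Applying log data on the standby node to restore the data can be illustrated with a small replay function. The log entry format used here (`op`, `key`, `row`, `changes`) is a hypothetical one chosen for the sketch and is not the log format of any particular database.

```python
def apply_log(table: dict[int, dict], log: list[dict]) -> dict[int, dict]:
    """Replay insert/update/delete entries, in order, onto a copy of table."""
    result = {key: dict(row) for key, row in table.items()}
    for entry in log:
        op, key = entry["op"], entry["key"]
        if op == "insert":
            result[key] = dict(entry["row"])
        elif op == "update":
            result[key].update(entry["changes"])
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown log operation: {op!r}")
    return result


# Data the standby node already holds, keyed by a hypothetical primary key.
standby_table = {1: {"name": "Li", "pay": 100}}
log = [
    {"op": "update", "key": 1, "changes": {"pay": 110}},
    {"op": "insert", "key": 2, "row": {"name": "Wang", "pay": 90}},
    {"op": "delete", "key": 1},
]
restored = apply_log(standby_table, log)
```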
By means of incremental extraction or log data, transmission of data volume can be reduced as much as possible, so that data storage cost of the standby node is reduced, and meanwhile processing efficiency of OLAP tasks in the standby node can be improved.
In a specific implementation process, the data to be processed may contain sensitive data that the standby node has no permission to view or process. With data security in mind, one possible approach is for the master node simply not to send the sensitive data to the standby node. However, this approach may fail to satisfy the data request of the standby node. For example, taking the achievement table in fig. 1, assume that the data in the "name" column is sensitive, but the standby node requests a copy of all data in the achievement table; if the master node omits the "name" column when extracting the target data, the request of the standby node is clearly not satisfied. In view of this, the master node may instead desensitize the sensitive data before sending it.
Specifically, the master node determines which of the data to be processed needs to be hidden from the standby node; for example, the data in the "name" column of the achievement table in fig. 1 is data to be hidden from the standby node. The master node then performs desensitization processing on the data in the "name" column to obtain desensitized to-be-processed data, obtains the target data from the desensitized data, and sends the obtained target data to the standby node. The desensitization processing can, for example, replace all data in the "name" column with "0" or another value, or convert the data into other data through some algorithm. In this way, even though the standby node receives the data, what it receives is not the real original data, so the original data is effectively protected.
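The simplest desensitization mentioned above, replacing every value in a hidden column with a fixed value such as "0", can be sketched as follows. The function name `desensitize` and the sample rows are hypothetical.

```python
def desensitize(
    rows: list[dict], hidden_columns: set[str], mask: str = "0"
) -> list[dict]:
    """Return copies of the rows with hidden columns replaced by a mask."""
    return [
        {col: (mask if col in hidden_columns else value)
         for col, value in row.items()}
        for row in rows
    ]


achievement_rows = [
    {"name": "Li", "math": 90},
    {"name": "Wang", "math": 85},
]
target_rows = desensitize(achievement_rows, {"name"})
```

The column is still present in the target data, so a request for all columns is satisfied, while the master node's original rows stay unchanged.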
Step 507: the main node sends the obtained target data to the standby node, so that the standby node can receive the target data.
Step 508: after receiving the target data sent by the main node, the standby node stores the target data in a column storage mode.
That is, after receiving the target data sent by the master node, the standby node may analyze the target data, for example, it may first determine whether the target data is original data or incremental data or log data, then apply the target data according to a corresponding data reduction manner to obtain original data, and then store the original data in a column storage format, so as to meet the execution requirement of the OLAP service on data storage.
Continuing with the achievement table in fig. 1 as an example, assuming that the target data is the original data of all columns in the achievement table, after receiving the target data, the standby node may immediately store the target data in the memory in units of "columns" according to the storage manner of "column storage" in fig. 1.
Step 509: the second CN sends the OLAP execution plan to the standby node, and further, the standby node may receive the OLAP execution plan.
Step 510: and the standby node performs OLAP processing on the target data which is stored in the column storage mode based on the OLAP execution plan.
That is, after the standby node stores the target data sent by the master node in the column storage mode, the standby node may perform OLAP processing on the target data according to the OLAP execution plan issued by the second CN, so as to execute the OLAP task by the standby node.
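Once the target data is held column by column, a typical analytical query only has to read the columns it needs. The sketch below computes a per-column average over a hypothetical column store; the data and the function `column_average` are illustrative assumptions.

```python
def column_average(columns: dict[str, list], name: str) -> float:
    """Average one numeric column without touching the other columns."""
    values = columns[name]
    if not values:
        raise ValueError(f"column {name!r} is empty")
    return sum(values) / len(values)


# Hypothetical column-stored target data on the standby node.
standby_store = {
    "name": ["Li", "Wang", "Zhao"],
    "math": [90, 80, 70],
    "physics": [60, 75, 90],
}
math_average = column_average(standby_store, "math")
```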
Based on the architecture of the data processing system in the embodiments of the present application, OLTP and OLAP can be processed in parallel within one database cluster, which improves the service processing efficiency of both. In addition, the OLTP service and the OLAP service are executed by two independent database nodes, and this resource isolation avoids resource contention during task execution, further improving service processing efficiency.
Based on the same inventive concept, referring to fig. 6, an embodiment of the present application provides a data processing system that includes a first data processing apparatus 601 and a second data processing apparatus 602. The two apparatuses may be independent database nodes in the same database cluster, for example, DN nodes in the cluster. The first data processing apparatus 601 can implement the functions of the OLTP DN (e.g., the master node) in the foregoing data processing method, for example, DN1 or DN2 in fig. 4, and the second data processing apparatus 602 can implement the functions of the OLAP DN (e.g., the standby node) in the foregoing data processing method, for example, DN3 or DN4 in fig. 4. Each of the two apparatuses may be a hardware structure, a software module, or a combination of a hardware structure and a software module. Each may also be implemented by a chip system, which may consist of a chip or may include a chip and other discrete devices.
Referring to fig. 6, in the embodiment of the present application, the first data processing apparatus 601 includes a first processing module 6011 and a sending module 6012, and the second data processing apparatus 602 includes a receiving module 6021, a storage module 6022, and a second processing module 6023. Wherein:
the first processing module 6011 is configured to obtain target data according to data to be processed on which OLTP (online transaction processing) is performed, where the data to be processed is stored in a row storage format;
the sending module 6012 is configured to send the target data to the receiving module 6021 in the second data processing apparatus 602;
the receiving module 6021 is configured to receive the target data sent by the sending module 6012;
the storage module 6022 is configured to store the target data in a column storage format;
the second processing module 6023 is configured to perform OLAP (online analytical processing) on the stored target data.
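The module division above can be sketched as two cooperating classes. This is a simplified single-process illustration, not the apparatus itself: the class and method names are hypothetical, the "sending" step is a direct method call rather than network transfer, and the target data is assumed to be a full copy of the rows.

```python
from typing import Dict, List

Row = Dict[str, object]


class SecondDataProcessingApparatus:
    """Standby side: receiving, storage, and second processing modules."""

    def __init__(self) -> None:
        self.columns: Dict[str, List[object]] = {}

    def receive(self, target_data: List[Row]) -> None:
        # Receiving module: hand the received target data to storage.
        self.store(target_data)

    def store(self, target_data: List[Row]) -> None:
        # Storage module: rebuild the column-format copy from a full copy.
        self.columns = {}
        for row in target_data:
            for name, value in row.items():
                self.columns.setdefault(name, []).append(value)

    def average(self, column: str) -> float:
        # Second processing module: a sample analytical query.
        values = self.columns[column]
        return sum(values) / len(values)


class FirstDataProcessingApparatus:
    """Master side: first processing module and sending module."""

    def __init__(self, receiver: SecondDataProcessingApparatus) -> None:
        self.rows: List[Row] = []  # data to be processed, kept row by row
        self.receiver = receiver

    def insert(self, row: Row) -> None:
        # A sample transactional write on the row-format data.
        self.rows.append(dict(row))

    def send(self) -> None:
        # Obtain the target data (here a full copy) and send it.
        self.receiver.receive([dict(r) for r in self.rows])
```

Transactional writes touch only the master-side row list, while analytical reads touch only the standby-side column lists, which mirrors the separation between the two apparatuses.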
All relevant content of the steps executed by the OLTP DN in the foregoing embodiments of the data processing method may be incorporated into the functional description of the corresponding functional modules of the first data processing apparatus 601, and all relevant content of the steps executed by the OLAP DN may be incorporated into the functional description of the corresponding functional modules of the second data processing apparatus 602; details are not repeated here.
Based on the same inventive concept, referring to fig. 7, an embodiment of the present application provides a data processing apparatus, which may be a database node and includes a storage module 701, a service processing module 702, a heterogeneous replication module 703, and a sending module 704. The storage module 701 is configured to store data to be processed in a row storage format; the service processing module 702 is configured to perform OLTP processing on the data to be processed stored in the row storage format; the heterogeneous replication module 703 is configured to obtain target data according to the data to be processed; and the sending module 704 is configured to send the target data to the standby node, so that the standby node stores the target data in a column storage format and performs OLAP processing on the target data stored in the column storage format. The data processing apparatus in this embodiment and the standby node belong to the same database cluster.
All relevant content of the steps executed by the OLTP DN in the foregoing embodiments of the data processing method may be incorporated into the functional description of the corresponding functional modules of the data processing apparatus in this embodiment; details are not repeated here.
Based on the same inventive concept, referring to fig. 8, an embodiment of the present application provides a data processing apparatus, which may be a database node and includes a receiving module 801, a storage module 802, and a service processing module 803. The receiving module 801 is configured to receive target data sent by a master node, where the target data is obtained by the master node according to data to be processed on which OLTP processing is performed; the storage module 802 is configured to store the target data in a column storage format; and the service processing module 803 is configured to perform OLAP processing on the target data stored in the column storage format. The data processing apparatus in this embodiment and the master node belong to the same database cluster.
All relevant content of the steps executed by the OLAP DN in the foregoing embodiments of the data processing method may be incorporated into the functional description of the corresponding functional modules of the data processing apparatus in this embodiment; details are not repeated here.
The division of modules in the embodiments of the present application is schematic and represents only one logical division of functions; other divisions are possible in actual implementation. In addition, the functional modules in the embodiments of the present application may be integrated in one processor, may each exist alone physically, or two or more modules may be integrated into one module. An integrated module may be implemented in hardware or as a software functional module.
Based on the same inventive concept, the present application also provides a data processing device, such as the aforementioned OLTP DN (e.g., a master node) or OLAP DN (e.g., a standby node), for example DN1, DN2, DN3, or DN4 in fig. 4. As shown in fig. 9, the data processing device in this embodiment includes at least one processor 901, a memory 902, and a communication interface 903, where the memory 902 and the communication interface 903 are connected to the at least one processor 901. The specific connection medium between the processor 901 and the memory 902 is not limited in this embodiment. In fig. 9, the processor 901 and the memory 902 are connected through a bus 900 as an example; the bus 900 is drawn as a thick line, and the connections between other components are merely illustrative and not limiting. The bus 900 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 9, but this does not mean that there is only one bus or only one type of bus.
In the embodiment of the present application, the memory 902 stores instructions executable by the at least one processor 901, and the at least one processor 901 may execute the steps included in the foregoing data processing method by executing the instructions stored in the memory 902.
The processor 901 is the control center of the data processing device. It can connect the various parts of the whole data processing device through various interfaces and lines, and it performs the various functions of the data processing device and processes data by running or executing instructions stored in the memory 902 and calling data stored in the memory 902, thereby monitoring the data processing device as a whole. Optionally, the processor 901 may include one or more processing units, and the processor 901 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 901. In some embodiments, the processor 901 and the memory 902 may be implemented on the same chip; in other embodiments, they may be implemented on separate chips.
The processor 901 may be a general-purpose processor, such as a Central Processing Unit (CPU), a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, that may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
The memory 902, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 902 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 902 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 902 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, used to store program instructions and/or data.
The communication interface 903 is a transmission interface used for communication; for example, data can be received or transmitted through the communication interface 903.
Referring to the further schematic structural diagram of the data processing apparatus shown in fig. 10, the data processing apparatus further includes a basic input/output system (I/O system) 1001 to facilitate information transfer between the various devices within the data processing apparatus, and a mass storage device 1005 for storing an operating system 1002, application programs 1003, and other program modules 1004.
The basic input/output system 1001 includes a display 1006 for displaying information and an input device 1007, such as a mouse or keyboard, for a user to input information. The display 1006 and the input device 1007 are both coupled to the processor 901 through the basic input/output system 1001, which is connected to the system bus 900. The basic input/output system 1001 may also include an input/output controller for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller may also provide output to a display screen, a printer, or another type of output device.
The mass storage device 1005 is connected to the processor 901 through a mass storage controller (not shown) connected to the system bus 900. The mass storage device 1005 and its associated computer-readable media provide non-volatile storage for the data processing device. That is, the mass storage device 1005 may include a computer-readable medium (not shown) such as a hard disk or CD-ROM drive.
According to various embodiments of the invention, the data processing device may also be operated through a remote computer connected to it over a network, such as the Internet. That is, the data processing device may be connected to the network 1008 through the communication interface 903 coupled to the system bus 900, or may be connected to another type of network or remote computer system (not shown) using the communication interface 903.
Based on the same inventive concept, embodiments of the present application further provide a computer-readable storage medium, which stores computer instructions that, when executed on a computer, cause the computer to perform the steps of the data processing method as described above.
Based on the same inventive concept, embodiments of the present application further provide a data processing apparatus, which includes at least one processor and a storage medium, and when instructions included in the storage medium are executed by the at least one processor, the steps of the data processing method as described above may be performed.
Based on the same inventive concept, the embodiment of the present application further provides a chip system, where the chip system includes a processor and may further include a memory, and is used to implement the steps of the foregoing data processing method. The chip system may be formed by a chip, and may also include a chip and other discrete devices.
In some possible embodiments, the aspects of the data processing method provided in the embodiments of the present application may also be implemented in the form of a program product, which includes program code for causing a computer to perform the steps in the data processing method according to the various exemplary embodiments of the present invention described above when the program product runs on the computer.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (14)

1. A data processing system, the system comprising a master node and a standby node, wherein:
the master node is used for performing online transaction processing on data to be processed stored in a row storage format; obtaining target data according to the data to be processed; and sending the target data to the standby node;
the standby node is used for storing the received target data in a column storage format; and performing online analytical processing on the stored target data.
2. The system of claim 1, wherein the master node is to:
calling a heterogeneous replication process;
and extracting the target data from the data to be processed through the heterogeneous replication process, and sending the target data to the standby node.
3. The system of claim 1, wherein the master node is to:
if the target data is sent to the standby node for the first time, determining replica data obtained by full extraction of the data to be processed as the target data;
and if the target data is sent to the standby node again after the first time, obtaining data that represents updates to the data to be processed and determining that data as the target data.
4. The system of claim 3, wherein the master node is to:
if the target data is sent to the standby node again after the first time, determining replica data obtained by incremental extraction of the data to be processed as the target data, or determining log data corresponding to the data to be processed as the target data.
5. The system of any of claims 1-4, wherein the master node is to:
obtaining a heterogeneous replication instruction, wherein the heterogeneous replication instruction is used for indicating a data table needing to be replicated and a column needing to be replicated in the data table needing to be replicated;
according to the heterogeneous replication instruction, determining a target data table to be replicated and a target column to be replicated in the target data table from the data table corresponding to the data to be processed;
and obtaining the target data according to the target data table and the target column.
6. The system of claim 5, wherein the heterogeneous replication instruction is sent by the standby node; or, the heterogeneous replication instruction is sent by a coordinating node connected with the master node.
7. The system of any of claims 1-4, wherein the master node is to:
determining, in the data to be processed, data that needs to be hidden from the standby node;
desensitizing the data that needs to be hidden from the standby node, and obtaining the target data according to the desensitized data to be processed.
8. The system of any of claims 1-4, wherein the master node is to:
and when the heterogeneous replication triggering condition is met, obtaining the target data according to the data to be processed.
9. The system of any of claims 1-4, further comprising a first coordinating node connected to the master node and a second coordinating node connected to the standby node, wherein the first coordinating node is configured with optimizer logic for performing online transaction processing on an obtained query statement, and the second coordinating node is configured with optimizer logic for performing online analytical processing on an obtained query statement.
10. A method of data processing, the method comprising:
a master node obtains target data according to data to be processed on which online transaction processing is performed, and sends the target data to a standby node, wherein the data to be processed is stored in the master node in a row storage format;
and the standby node receives the target data, stores the target data in a column storage format, and performs online analytical processing on the stored target data.
11. A method of data processing, the method comprising:
obtaining target data according to data to be processed on which online transaction processing is performed, wherein the data to be processed is stored in a row storage format;
and sending the target data to a standby node so that the standby node stores the target data in a column storage format and performs online analytical processing.
12. A data processing system, wherein the system comprises a master node and a standby node, the master node comprises a first processing module and a sending module, and the standby node comprises a receiving module, a storage module and a second processing module; wherein:
the first processing module is used for performing online transaction processing on data to be processed stored in a row storage format and obtaining target data according to the data to be processed; the sending module is used for sending the target data to the receiving module;
the receiving module is used for receiving the target data, the storage module is used for storing the target data in a column storage format, and the second processing module is used for performing online analytical processing on the stored target data.
13. A data processing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps comprised by the method of claim 10 or 11 when executing the computer program.
14. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the steps comprised by the method of claim 10 or 11.
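By way of illustration only, the column selection of claim 5 and the desensitization of claim 7 could be combined as in the sketch below. The function names, the parameter names, and the choice of a truncated SHA-256 digest as the masking method are all assumptions; the claims do not prescribe any particular desensitization technique.

```python
import hashlib
from typing import Dict, List, Set

Row = Dict[str, object]


def mask(value: object) -> str:
    """Hypothetical desensitization: replace a value with a short digest."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:8]


def build_target_data(rows: List[Row],
                      target_columns: List[str],
                      hidden_columns: Set[str]) -> List[Row]:
    """Keep only the columns to be replicated and mask the hidden ones."""
    target: List[Row] = []
    for row in rows:
        out: Row = {}
        for col in target_columns:
            value = row[col]
            # Columns not listed in target_columns are never copied at all.
            out[col] = mask(value) if col in hidden_columns else value
        target.append(out)
    return target
```

In this sketch a column omitted from `target_columns` never leaves the master node, while a column listed in `hidden_columns` is replicated only in masked form.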
CN201910224202.7A 2019-03-22 2019-03-22 A kind of data processing system, method and apparatus Pending CN110019251A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910224202.7A CN110019251A (en) 2019-03-22 2019-03-22 A kind of data processing system, method and apparatus

Publications (1)

Publication Number Publication Date
CN110019251A true CN110019251A (en) 2019-07-16

Family

ID=67189850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910224202.7A Pending CN110019251A (en) 2019-03-22 2019-03-22 A kind of data processing system, method and apparatus

Country Status (1)

Country Link
CN (1) CN110019251A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102360386A (en) * 2011-10-12 2012-02-22 朱一超 Intelligent shopping guide system and method of electronic commerce website
CN102591910A (en) * 2010-12-08 2012-07-18 达索系统艾诺维亚公司 Computer method and system combining OLTP database and OLAP database environments
CN102906743A (en) * 2010-05-17 2013-01-30 慕尼黑技术大学 Hybrid OLTP and OLAP high-performance database system
CN103942342A (en) * 2014-05-12 2014-07-23 中国人民大学 Memory database OLTP and OLAP concurrency query optimization method
CN104391891A (en) * 2014-11-11 2015-03-04 上海新炬网络信息技术有限公司 Heterogeneous replication method for database
CN105574027A (en) * 2014-10-15 2016-05-11 中兴通讯股份有限公司 On-line transaction processing/on-line analytical processing (OLTP/OLAP) hybrid application based multi-dimensional performance data storage method, device and system
CN106777027A (en) * 2016-12-08 2017-05-31 北京国电通网络技术有限公司 MPP ranks blended data storage device and storage, querying method
CN107423390A (en) * 2017-07-21 2017-12-01 上海德拓信息技术股份有限公司 A kind of real time data synchronization algorithm based on inside OLTP OLAP mixed relationship type Database Systems
CN109189561A (en) * 2018-08-08 2019-01-11 广东亿迅科技有限公司 A kind of transacter and its method based on MPP framework

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
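SUPHOEBE: "Advanced Databases 15: Query Optimizer (Part 1)", CSDN *
付志成: "Research on Big Data Migration and Data Conversion Technology in a Commodity Price Comparison System", China Master's Theses Full-text Electronic Journal, Information Science and Technology *
宋春红: "Research on Algorithms for Migrating Traditional Relational Databases to Non-relational Databases", China Master's Theses Full-text Electronic Journal, Information Science and Technology *
郑力新: "Comparative Analysis of Methods for Migrating Structured Databases to HBase", Fujian Computer *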

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110874290B (en) * 2019-10-09 2023-05-23 上海交通大学 Transaction analysis hybrid processing method of distributed memory database and database
CN110874290A (en) * 2019-10-09 2020-03-10 上海交通大学 Transaction analysis hybrid processing method of distributed memory database and database
CN111209350A (en) * 2019-12-31 2020-05-29 优地网络有限公司 System development method, device, terminal equipment and storage medium
CN111291403A (en) * 2020-01-15 2020-06-16 上海新炬网络信息技术股份有限公司 Data desensitization device based on distributed cluster
CN111291403B (en) * 2020-01-15 2023-09-19 上海新炬网络信息技术股份有限公司 Data desensitizing device based on distributed cluster
CN111475588A (en) * 2020-06-19 2020-07-31 阿里云计算有限公司 Data processing method and device
CN111475584A (en) * 2020-06-19 2020-07-31 阿里云计算有限公司 Data processing method, system and device
US12292885B2 (en) 2020-06-19 2025-05-06 Alibaba Cloud Computing Co. Ltd. Data processing method and data processing apparatus
CN111475584B (en) * 2020-06-19 2021-01-22 阿里云计算有限公司 Data processing method, system and device
CN111723078A (en) * 2020-06-24 2020-09-29 苏州松鼠山人工智能科技有限公司 Data storage method and device
WO2022001629A1 (en) * 2020-06-29 2022-01-06 华为技术有限公司 Database system, and method and apparatus for managing transactions
CN111858759B (en) * 2020-07-08 2021-06-11 平凯星辰(北京)科技有限公司 HTAP database system based on consensus algorithm
WO2022007339A1 (en) * 2020-07-08 2022-01-13 平凯星辰(北京)科技有限公司 Htap database based on consensus algorithm
CN111858759A (en) * 2020-07-08 2020-10-30 平凯星辰(北京)科技有限公司 HTAP database based on consensus algorithm
CN112363838A (en) * 2020-11-20 2021-02-12 浙江大华技术股份有限公司 Data processing method and device, storage medium and electronic device
CN112434036A (en) * 2020-11-24 2021-03-02 上海浦东发展银行股份有限公司 Account management system data processing method
CN113704270A (en) * 2021-09-03 2021-11-26 携程金融科技(上海)有限公司 Method, system, equipment and medium for expanding capacity of self-increment key of SQL Server database
CN113704270B (en) * 2021-09-03 2023-10-17 携程金融科技(上海)有限公司 Automajor key capacity expansion method of SQL Server database and related equipment
WO2024082693A1 (en) * 2022-10-21 2024-04-25 华为云计算技术有限公司 Data processing method, and apparatus
CN115599790A (en) * 2022-11-10 2023-01-13 星环信息科技(上海)股份有限公司(Cn) Data storage system, data processing method, electronic device and storage medium
CN115599790B (en) * 2022-11-10 2024-03-15 星环信息科技(上海)股份有限公司 Data storage system, data processing method, electronic equipment and storage medium
CN118626492A (en) * 2024-06-21 2024-09-10 广州逸虎网络科技有限公司 Business processing method, device, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination