Disclosure of Invention
In view of this, the present invention provides a method and a system for data distribution and parallel processing, which can not only ensure the consistency of data, but also improve the speed and efficiency of data query.
The invention provides a data distribution and parallel processing method based on the above purpose, which comprises the following steps:
receiving an initial query instruction input by a user side;
analyzing the initial query instruction to obtain a real query statement which meets the standard and is used as a real query instruction;
sending the real query instruction to a work distribution node, and distributing the real query instruction to different data nodes through the work distribution node; the work distribution nodes and the data nodes are nodes into which all server nodes in the database are divided according to preset functions; the work distribution node is used for managing task distribution and query results of the data nodes;
executing the received real query instruction on each data node respectively, returning a query result to the work distribution node, and keeping the update of the data information table on each server node;
on each server node, backing up a query table corresponding to the real query instruction, so that a query table copy is stored on each server node;
on each server node, when a concurrency conflict is detected, the real query instruction is executed concurrently by calling the copy of the query table on the current server node.
Optionally, when it is detected that there is a concurrency conflict, the step of concurrently executing the real query instruction by calling the copy of the query table on the current server node includes:
detecting whether the real query instruction has a concurrency conflict in each server node;
if the concurrency conflict does not exist, submitting the query task; the query task is a query task corresponding to a real query instruction;
if a concurrency conflict exists, calling the self-adaptive backup on the current server node for comparison with the current query task, and executing the query task on the self-adaptive backup;
and if the query tasks in all the server nodes are successfully executed, returning a decision to submit the query task; otherwise, returning a decision to withdraw the query task.
Optionally, the method for dividing all server nodes in the database into work distribution nodes and data nodes according to preset functions includes:
detecting the local storage capacity and the computing processing capacity of each server node;
judging whether the calculation processing capacity of the current server node is greater than a preset performance threshold value, if so, setting the current server node as a work distribution node, otherwise, setting the current server node as a data node;
or,
judging whether the local storage capacity of the current server node is greater than a preset storage threshold value; if so, setting the current server node as a data node; otherwise, setting the current server node as a work distribution node.
Optionally, the step of sending the real query instruction to a work distribution node, and distributing the real query instruction to different data nodes through the work distribution node further includes:
detecting the specific use condition of each server node;
and distributing data nodes and corresponding spaces on each server node by adopting a preset balance strategy according to the detection result.
Optionally, the balancing policy is a policy with the lowest space utilization rate.
Optionally, the step of maintaining the update of the data information table on each server node further includes:
and copying and storing the position information and the related operation information of the data information table.
Optionally, the step of maintaining the update of the data information table on each server node further includes:
all data in each data node are backed up to the designated data node;
keeping all data in each data node and synchronous updating of corresponding data backup;
the location information of each data node is stored on each data node.
Optionally, the step of parsing the initial query instruction further includes:
performing lexical analysis on characters input by a user side to obtain words meeting the standard;
carrying out syntactic analysis on a plurality of continuous words to obtain sentences conforming to syntactic logic, and simultaneously constructing to obtain an abstract syntactic tree;
and converting the logical SQL statement into a real SQL statement according to the abstract syntax tree, so as to obtain a real query statement which meets the standard and is used as a real query instruction.
The invention also provides a data distribution and parallel processing system, which comprises:
the input unit is used for receiving an initial query instruction input by a user side;
the analysis unit is used for analyzing the initial query instruction to obtain a real query statement meeting the standard and using the real query statement as a real query instruction;
the distribution unit is used for sending the real query instruction to a work distribution node and distributing the real query instruction to different data nodes through the work distribution node; the work distribution nodes and the data nodes are nodes which are divided by all server nodes in the database according to preset functions; the work distribution node is used for managing task distribution and query results of the data nodes;
the processing unit is used for respectively executing the received real query instruction on each data node, returning a query result to the work distribution node, and simultaneously keeping the update of the data information table on each server node;
the backup unit is used for backing up the query table corresponding to the real query instruction on each server node, so that a query table copy is stored on each server node;
and the concurrency unit is used for executing the real query instruction in a concurrent manner by calling the copy of the query table on the current server node when the concurrency conflict is detected on each server node.
As can be seen from the above, according to the data distribution and parallel processing method and system provided by the invention, all server nodes in the database are divided into work distribution nodes and data nodes, so that the work distribution nodes can manage the task distribution of the data nodes and the return of query results; in this way, the data nodes can achieve load balancing according to their different use states. Keeping the data information table updated ensures consistent synchronization of the data in the server nodes. By backing up a copy of the query table on each server node, each server node can execute the query instruction concurrently, so that the response time is shortened and the query efficiency is improved. By calling the self-adaptive backup for comparison with the current query task when a concurrency conflict occurs, the consistency of the distributed data can be ensured. Therefore, the data distribution and parallel processing method and system not only can ensure the consistency of the data, but also can improve the speed and efficiency of data query.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name but are otherwise different; "first" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention, and this is not repeated in the following embodiments.
Referring to fig. 1, a flowchart of an embodiment of a data distribution and parallel processing method provided by the present invention is shown. The data distribution and parallel processing method comprises the following steps:
Step 101, receiving an initial query instruction input by a user side; the user usually inputs continuous character strings at the user side, and the database usually needs to parse them to obtain recognizable statement or instruction information.
Step 102, analyzing the initial query instruction to obtain a real query statement meeting the standard, and taking the real query statement as a real query instruction. Query parsing typically comprises steps such as lexical analysis, syntactic analysis, semantic analysis and statement conversion. The purpose of parsing is to convert an input character string into a structure describing that string, so that the computer can more easily understand what the user's input means. This stage includes three processes: lexical analysis, syntactic analysis, and output of an abstract syntax tree. The lexical analyzer is a deterministic finite automaton (DFA) that converts the input character sequence into words according to the defined lexical rules. Lexical analysis is followed by syntactic analysis: the result of the lexical analysis serves as the input of the syntactic analysis, which determines, on the basis of the lexical analysis, whether the words input by the user conform to the syntactic logic. Taking SQL as an example, semantic analysis relates to the relevant theories and concepts of the SQL standard and SQL optimization, and includes logical analysis and physical analysis; the logical analysis is a general mathematical analysis process and is independent of the underlying distributed environment, whereas the physical analysis transforms the result of the logical analysis and is closely related to the underlying execution environment.
An abstract syntax tree (AST) is a tree-structured representation of the user's input statement; each node of the tree is a word, and the structure of the tree embodies the syntax. The abstract syntax tree is constructed along with the syntactic analysis: after the syntactic analysis is finished, the syntax analyzer outputs an abstract syntax tree whose structure and content correspond to the user's input.
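The lexical-analysis and syntactic-analysis pipeline described above can be sketched with a toy lexer and parser. This is a hedged illustration only: the token names, grammar and AST shape are assumptions for a single `SELECT ... FROM ...` form, not the parser the invention actually uses.

```python
import re

# Hypothetical illustration: a minimal lexer (a DFA in spirit) and parser
# that turn "SELECT id, name FROM users" into a small abstract syntax tree.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:SELECT|FROM)\b"),
    ("IDENT",   r"[A-Za-z_][A-Za-z_0-9]*"),
    ("COMMA",   r","),
    ("WS",      r"\s+"),
]

def tokenize(text):
    """Lexical analysis: split the input string into standard-conforming words."""
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)
    tokens = []
    for m in re.finditer(pattern, text):
        if m.lastgroup != "WS":                 # skip whitespace
            tokens.append((m.lastgroup, m.group()))
    return tokens

def parse_select(tokens):
    """Syntactic analysis: check the word sequence and build the AST."""
    it = iter(tokens)
    kind, val = next(it)
    if (kind, val.upper()) != ("KEYWORD", "SELECT"):
        raise SyntaxError("expected SELECT")
    columns = []
    for kind, val in it:
        if kind == "IDENT":
            columns.append(val)
        elif kind == "KEYWORD" and val.upper() == "FROM":
            break
    table = next(it)[1]                         # table name after FROM
    return {"node": "Select",
            "columns": columns,
            "from": {"node": "Table", "name": table}}

ast = parse_select(tokenize("SELECT id, name FROM users"))
```

Here the tokenizer plays the role of the DFA-based lexical analyzer, and the parser consumes its output to emit the AST, mirroring the three processes named above.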
Step 103, sending the real query instruction to a work distribution node, and distributing the real query instruction to different data nodes through the work distribution node; the work distribution nodes and the data nodes are nodes into which all server nodes in the database are divided according to preset functions; the work distribution node is used for managing task distribution and query results of the data nodes. In the MPP architecture, the whole database comprises a large number of relatively independent servers, each serving as an independent node; such a single server is therefore referred to interchangeably as a server node or a server. Generally, one work distribution node can be arranged in the database to manage the remaining server nodes, namely the data nodes; alternatively, a plurality of work distribution nodes may be provided, each managing the data nodes that belong to it. By having work distribution nodes manage data nodes in this way, factors such as the storage amount and access pressure of each server node can be approximately equalized; for example, data on heavily loaded data nodes can be migrated to lightly loaded data nodes, thereby achieving load balancing.
Step 104, executing the received real query instruction on each data node, returning the query result to the work distribution node, and keeping the update of the data information table on each server node; the data information table is used for recording the update of data and maintaining the consistency of the data on each server node through the communication between the server nodes; that is, any modification made to the data is recorded in the data information table, and maintaining the update of the data information table as described herein means that the modification is synchronized on the backup or copy thereof, so that the data information table can be guaranteed to be consistent.
Step 105, backing up the query table corresponding to the real query instruction on each server node, so that a query table copy is stored on each server node. The query table is a data table that stores the query results obtained after the real query instruction (for example, a select [ id ] instruction or the like) is executed; all query instructions are also included in the query table. A query task in a distributed database is a sequence of operations: any operation on a distributed table ultimately becomes a sequence of operations on the database storage, and the execution of those operations is likewise distributed. Therefore, by backing up the query table on each server node, each server node can execute the query instruction through the query table or the query table copy, concurrent execution of the query instruction can be realized, and maintaining the query table copies improves query efficiency.
Step 106, on each server node, when a concurrency conflict is detected, concurrently executing the real query instruction by calling the query table copy on the current server node. In the prior art, when a concurrency conflict occurs, the query can proceed only after waiting for a global response, so the data query speed and efficiency are low and data consistency is poor. According to the invention, the query table copy is stored in advance, and the query task is executed by calling the query table copy when a concurrency conflict is encountered, so that concurrent execution of the query statement is realized, the response time can be greatly shortened, and the speed and efficiency of data query are further improved.
According to the data distribution and parallel processing method provided by the embodiment of the invention, all server nodes in the database are divided into work distribution nodes and data nodes, so that the work distribution nodes can manage the task distribution of the data nodes and the return of query results; in this way, the data nodes can achieve load balancing according to their different use states. Keeping the data information table updated ensures consistent synchronization of the data in the server nodes. By backing up a copy of the query table on each server node, each server node can execute the query instruction concurrently, so that the response time is shortened and the query efficiency is improved. By calling the self-adaptive backup for comparison with the current query task when a concurrency conflict occurs, the consistency of the distributed data can be ensured. Therefore, the data distribution and parallel processing method provided by the invention not only can ensure the consistency of the data, but also can improve the speed and efficiency of data query. At the same time, based on the above steps, the invention also overcomes the problem of concurrency conflicts in queries, so that the whole query process is more stable and reliable.
In some alternative embodiments of the present invention, the database first needs to be divided into several subsets according to preset rules, for example: uniformly dividing all data in the database by columns into n disjoint equal parts, each part consisting of a plurality of columns; or dividing all data by rows into n disjoint equal parts, each part consisting of a plurality of rows; or dividing the data into n disjoint subsets according to data similarity, for example, data larger than the average value is classified into one subset and the remaining data into another; or dividing the data into n disjoint subsets according to data dispersion, for example, data in columns whose variance is larger than a certain constant k is classified into one subset and the remaining data into another. In this way the entire database is divided into several mutually independent subsets, each subset being processed on its own server, for a total of n servers. Each server can then make an independent decision as to whether the current task should be aborted or submitted. The principle of judgment is as follows: the query task may be submitted when there is no concurrency conflict; if there is a concurrency conflict (for example, a file being written by one program cannot simultaneously be written by another program), the self-adaptive copy backup of the subset on the current server is called, compared and verified against the current task, and a decision is given.
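The subset-division rules above can be sketched as follows. The data layout (a list of rows) and the helper names are illustrative assumptions, not the patented implementation; the point is only to show the row-wise, column-wise and similarity-based splits side by side.

```python
import statistics

def split_by_rows(rows, n):
    """Divide the table into n disjoint row-wise parts (round-robin)."""
    return [rows[i::n] for i in range(n)]

def split_by_columns(rows, n):
    """Divide the table into n disjoint column-wise parts (round-robin over columns)."""
    ncols = len(rows[0])
    groups = [[c for c in range(ncols) if c % n == i] for i in range(n)]
    return [[[row[c] for c in cols] for row in rows] for cols in groups]

def split_by_similarity(values):
    """Two subsets: values above the average, and all remaining values."""
    avg = statistics.mean(values)
    return ([v for v in values if v > avg], [v for v in values if v <= avg])

table = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
row_parts = split_by_rows(table, 2)       # two disjoint sets of rows
col_parts = split_by_columns(table, 3)    # three disjoint sets of columns
hi, lo = split_by_similarity([1, 2, 3, 10])
```

Each resulting part would then live on its own server, so that the per-server commit/abort decision described above can be made independently.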
The self-adaptive backup refers to backing up important data on a plurality of servers in the distributed system for disaster tolerance and fault tolerance. There are many methods of self-adaptive copy backup, including data block backup, dynamic backup and the like. If the data in a plurality of servers in a distributed deployment is required to be consistent, the write operations on the data in all the servers must be guaranteed to either all execute or all not execute. However, when executing its local transaction, one server cannot know the execution result of the local transaction on another server, and therefore cannot know by itself whether the transaction should be aborted or committed. If there is a concurrency conflict on a subset, i.e. it is occupied by the write operation of one program and other programs cannot execute write operations on it, the present invention proposes to access the backup of the current subset instead of executing the local transaction on the subset itself, so that the local transaction execution results of all servers can be obtained and a decision given. The specific method is as follows: if the transaction operations on all the participating subsets of the task are executed successfully, a decision to submit the query task is returned; otherwise, a decision to withdraw the query task is returned. By contrast, conventional MPP-based shared-nothing architectures require a global deadlock detection mechanism, a two-phase commit protocol and the like to ensure the integrity and recovery of transactions; the scheme adopted by the invention does not need these time-consuming methods, so the speed of the query task is increased.
In some optional embodiments of the present invention, as shown in fig. 2, the step 106 of, when it is detected that there is a concurrency conflict, concurrently executing the real query instruction by calling the copy of the query table on the current server node further includes:
step 201, detecting the real query instruction in each server node, determining whether a concurrency conflict exists, if not, executing step 202, and if so, executing step 203;
step 202, according to step 201, if there is no concurrency conflict, submitting a query task; the query task is a query task corresponding to a real query instruction;
step 203, according to step 201, if there is a concurrency conflict, calling a self-adaptive backup on the current server node to compare with the current query task, and executing the query task on the self-adaptive backup;
step 204, judging whether the query tasks in all the server nodes are successfully executed, if so, executing step 205, otherwise, executing step 206;
step 205, according to step 204, if the query tasks in all the server nodes are successfully executed, returning to the decision of submitting the query tasks;
step 206, according to step 204, if there is an unsuccessful query task, a decision to withdraw the query task is returned.
Thus, when a concurrency conflict arises in a server node, the query task can still be executed by calling the self-adaptive backup. Specifically, each server holds self-adaptive backup record information. When a query instruction encounters a concurrency conflict and the self-adaptive backup needs to be called, the current server determines the storage location of the record, that is, which server node the self-adaptive backup is stored on, by accessing the self-adaptive backup record on the current server; the self-adaptive backup is then compared with the current query task and a decision is given. In this way, the query task is executed on the self-adaptive backup data, data consistency is maintained, the whole query task can be executed concurrently and effectively, and the speed and efficiency of data query are further greatly improved.
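The decision flow of steps 201 to 206 can be sketched as below. The conflict detector, query executor and backup executor are passed in as callables; they are assumptions standing in for the real components, and the two-valued "submit"/"withdraw" result mirrors the decisions of steps 205 and 206.

```python
def run_on_node(node, query, has_conflict, run_query, run_on_backup):
    """Steps 201-203: run the query directly, or on the self-adaptive
    backup when a concurrency conflict is detected on this node."""
    if not has_conflict(node, query):          # step 201 -> step 202
        return run_query(node, query)
    return run_on_backup(node, query)          # step 201 -> step 203

def decide(nodes, query, has_conflict, run_query, run_on_backup):
    """Steps 204-206: submit only if the query task succeeds on every node."""
    results = [run_on_node(n, query, has_conflict, run_query, run_on_backup)
               for n in nodes]
    return "submit" if all(results) else "withdraw"

# Toy scenario: node2 has a conflict, but its backup execution succeeds.
has_conflict = lambda n, q: n == "node2"
succeed = lambda n, q: True
decision = decide(["node1", "node2"], "q", has_conflict, succeed, succeed)
```

If the backup execution on any node failed instead, `decide` would return the withdraw decision, matching step 206.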
In some optional embodiments of the present invention, the method for dividing all server nodes in the database into the work distribution node and the data node according to a preset function includes:
detecting the local storage capacity and the computing processing capacity of each server node;
judging whether the calculation processing capacity of the current server node is greater than a preset performance threshold value, if so, setting the current server node as a work distribution node, otherwise, setting the current server node as a data node;
or,
judging whether the local storage capacity of the current server node is greater than a preset storage threshold value; if so, setting the current server node as a data node; otherwise, setting the current server node as a work distribution node.
Therefore, different server nodes can correspondingly realize different functions according to the characteristics of the server nodes, and the resources of the server can be utilized to the maximum extent. Meanwhile, the distribution mode enables the management of the database to be more orderly and efficient.
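The two optional role-assignment rules above can be sketched as follows. The `ServerNode` fields and the threshold values are illustrative assumptions; the rules themselves follow the two "judging whether ... greater than a preset threshold" branches.

```python
from dataclasses import dataclass

@dataclass
class ServerNode:
    name: str
    storage_gb: float      # local storage capacity
    compute_score: float   # computing processing capacity

def assign_role_by_compute(node, perf_threshold):
    """Nodes above the performance threshold become work distribution nodes;
    the rest become data nodes."""
    return "work_distribution" if node.compute_score > perf_threshold else "data"

def assign_role_by_storage(node, storage_threshold):
    """Nodes above the storage threshold become data nodes;
    the rest become work distribution nodes."""
    return "data" if node.storage_gb > storage_threshold else "work_distribution"

cluster = [ServerNode("a", 100.0, 9.0), ServerNode("b", 2000.0, 2.0)]
roles = {n.name: assign_role_by_compute(n, 5.0) for n in cluster}
```

Either rule alone partitions the cluster, which matches the "or" between the two branches in the text.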
In some optional embodiments of the present invention, the step of sending the real query instruction to a work distribution node, and distributing the real query instruction to different data nodes through the work distribution node further includes:
detecting the specific use condition of each server node;
and distributing data nodes and corresponding spaces on each server node by adopting a preset balance strategy according to the detection result.
Wherein the specific use conditions include: the communication condition of the server, the storage space utilization rate of each data node, and other conditions or data related to the performance of the server nodes. In this way, query tasks can be distributed differently according to the different conditions of the server nodes, and the processing speed and efficiency of the tasks can be optimized.
Further, in some optional embodiments, the balancing policy is a lowest space usage policy.
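The lowest-space-usage policy reduces to picking the node whose detected utilization is smallest. A minimal sketch, where the usage mapping is an illustrative stand-in for the detected conditions of each server node:

```python
def pick_node(space_usage):
    """Return the server node with the lowest space utilization rate."""
    return min(space_usage, key=space_usage.get)

# Detected utilization per node (fractions of used space; illustrative values).
usage = {"node1": 0.72, "node2": 0.35, "node3": 0.88}
target = pick_node(usage)   # "node2" has the lowest utilization
```

New data-node allocations would then be placed on `target`, gradually evening out storage pressure across the cluster.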
In some optional embodiments of the invention, the step of maintaining the update of the data information table on each server node further comprises: copying and storing the position information and the related operation information of the data information table. In this way, because the position information and related operation information of the data information table are saved, all data processing is recorded, and the user can perform processing such as undoing an operation according to this information.
In some optional embodiments of the invention, the step of maintaining the update of the data information table at each server node further comprises:
all data in each data node are backed up to the designated data node; namely, the safety of the data is ensured through the backup data, and the self-adaptive backup data is obtained.
Keeping all data in each data node and synchronous updating of corresponding data backup; by synchronous updating of the saved data, all data in each data node and the backed-up data are kept consistent.
The position information of each data node is stored on each data node, so that a data node can be located through its position information, which facilitates undo operations.
The location information of each data node may be maintained in the database by updating the primary database copy that specifies the database address. All information of the original data can be stored in the data nodes, and according to this information, the backup data can be retrieved from the corresponding data node or backup data node.
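The backup scheme above (mirror every write to a designated backup node, and keep a map of where each node's backup lives) can be sketched as follows. The class shape and the node/backup names are assumptions for illustration only.

```python
class DataNode:
    """A toy data node: local key-value data plus synchronous backup writes."""
    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, backup):
        self.data[key] = value     # local write on this data node
        backup.data[key] = value   # synchronous update of the designated backup

# Illustrative cluster: two data nodes sharing one designated backup node.
nodes = {name: DataNode(name) for name in ("node1", "node2", "backup1")}
# Location information: which node holds each data node's backup.
location = {"node1": "backup1", "node2": "backup1"}

nodes["node1"].write("k", 42, nodes[location["node1"]])
```

Because every write goes through both copies, the data node and its backup stay consistent, and the `location` map lets any node find the backup when retrieval or an undo is needed.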
In some optional embodiments of the present invention, the step of parsing the initial query instruction further includes:
performing lexical analysis on characters input by a user side to obtain words meeting the standard;
carrying out syntactic analysis on a plurality of continuous words to obtain sentences conforming to syntactic logic, and simultaneously constructing to obtain an abstract syntactic tree;
and converting the logical SQL statement into a real SQL statement according to the abstract syntax tree, so as to obtain a real query statement which meets the standard and is used as a real query instruction.
Therefore, the real query instruction can be obtained through the gradual analysis of the initial query instruction, and the subsequent data query task is further executed.
Optionally, in the data distribution and parallel processing method of the present invention, a query table copy is pre-stored on each server node. Thus, when a query instruction in the current server node encounters a concurrency conflict and the current server node cannot execute it, after a cannot-query result is returned to the processing node, the processing node finds, through the data information table, the backup server where the self-adaptive backup corresponding to the current server's data is located, and then instructs the backup server to query the backup data according to the query instruction in the query table copy, thereby completing the query of the data.
Referring to fig. 3, a schematic diagram of an embodiment of a data distribution and parallel processing system according to the present invention is shown. The data distribution and parallel processing system comprises:
an input unit 301, configured to receive an initial query instruction input by a user end;
an analyzing unit 302, configured to analyze the initial query instruction to obtain a real query statement meeting a standard, and use the real query statement as a real query instruction;
the distributing unit 303 is configured to send the real query instruction to a work distribution node, and distribute the real query instruction to different data nodes through the work distribution node; the work distribution nodes and the data nodes are nodes which are divided by all server nodes in the database according to preset functions; the work distribution node is used for managing task distribution and query results of the data nodes;
the processing unit 304 is configured to execute the received real query instruction on each data node, return a query result to the work distribution node, and keep updating the data information table on each server node;
a backup unit 305, configured to backup, on each server node, a lookup table corresponding to the real query instruction, so that a lookup table copy is stored on each server node;
and the concurrency unit 306 is used for executing the real query instruction concurrently by calling the copy of the query table on the current server node when the concurrency conflict is detected on each server node.
As can be seen from the foregoing embodiments, the data distribution and parallel processing system according to the present invention backs up the lookup table through the backup unit 305, and then concurrently processes the query task through the concurrent unit 306, so that the data distribution and parallel processing system can not only ensure data consistency, but also improve data query speed and efficiency.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.