CN105447141A

CN105447141A - Data processing method and node

Info

Publication number: CN105447141A
Application number: CN201510816312.4A
Authority: CN
Inventors: 孙志云; 郭美思
Original assignee: Inspur Group Co Ltd
Current assignee: Inspur Group Co Ltd
Priority date: 2015-11-20
Filing date: 2015-11-20
Publication date: 2016-03-30

Abstract

Embodiments of the invention provide a data processing method and node in the field of computer technology, used to search data quickly and so improve the efficiency of searching massive data. In the method, a master control node determines the data information to be searched. The master control node generates a search instruction according to that data information and sends it to at least one slave node; the at least one slave node stores elastic distributed data sent by the master control node, and the search instruction carries identification information of the data information to be searched. The master control node obtains the elastic distributed data corresponding to the data information to be searched returned by the at least one slave node, and determines from it a response data set corresponding to the data information to be searched. Finally, the master control node outputs the response data set.

Description

A data processing method and node

Technical field

The present invention relates to the field of computer technology, and in particular to a data processing method and node.

Background

With the advent of the electronic information age, people's lives have changed enormously. Data is everywhere in work, study, and daily life, and massive data from fields such as medical care and meteorology now underpins business applications that raise people's quality of life. As informatization spreads, the value of data is widely recognized: it holds great potential and wealth, and massive data can support decision-making that benefits enterprises and individuals. However, quickly finding the data a user needs within massive stored data remains an urgent problem.

Summary of the invention

Embodiments of the present invention provide a data processing method and node for fast data search, improving the efficiency of searching massive data.

To achieve the above object, the embodiments of the present invention adopt the following technical solutions:

An embodiment of the present invention provides a data processing method applied to a data storage cluster that includes a master control node and at least one slave node. The method includes: the master control node determines the data information to be searched; the master control node generates a search instruction according to the data information to be searched and sends it to the at least one slave node, so that the at least one slave node finds, according to the search instruction, the elastic distributed data corresponding to the data information to be searched among its stored elastic distributed data and returns that data to the master control node, where the at least one slave node stores elastic distributed data sent by the master control node and the search instruction carries identification information of the data information to be searched; the master control node obtains the elastic distributed data corresponding to the data information to be searched returned by the at least one slave node and determines, according to that data, the response data set corresponding to the data information to be searched; and the master control node outputs the response data set.

Further, before the master control node determines the data information to be searched, the method also includes: the master control node acquires data to be stored; the master control node divides the data to be stored into at least one piece of elastic distributed data according to a preset division rule; and the master control node sends the at least one piece of elastic distributed data to the at least one slave node.

Further, the master control node dividing the data to be stored into at least one piece of elastic distributed data according to the preset division rule includes: the master control node uses the spark.textFile function to divide the data to be stored into at least one piece of elastic distributed data according to the preset division rule.

Further, an embodiment of the present invention provides a data processing method applied to a data storage cluster that includes a master control node and at least one slave node. The method includes: the slave node receives a search instruction sent by the master control node, where the search instruction carries identification information of the data information to be searched; according to that identification information, the slave node searches the memory in which elastic distributed data is stored and obtains the elastic distributed data corresponding to the identification information; and the slave node returns the obtained elastic distributed data to the master control node.

Further, before the slave node receives the search instruction sent by the master control node, searches the memory storing elastic distributed data according to the identification information of the data information to be searched, and obtains the corresponding elastic distributed data, the method also includes: the slave node receives elastic distributed data sent by the master control node, and the slave node stores the elastic distributed data in memory.

Further, an embodiment of the present invention provides a master control node, including: a determining unit, configured to determine the data information to be searched; a processing unit, configured to generate a search instruction according to the data information to be searched and send it to the at least one slave node, so that the at least one slave node finds, according to the search instruction, the elastic distributed data corresponding to the data information to be searched among its stored elastic distributed data and returns that data to the master control node, where the at least one slave node stores elastic distributed data sent by the master control node and the search instruction carries identification information of the data information to be searched; the processing unit is further configured to obtain the elastic distributed data corresponding to the data information to be searched returned by the at least one slave node and to determine, according to that data, the response data set corresponding to the data information to be searched; and an output unit, configured to output the response data set.

Further, the master control node also includes: an acquisition unit, configured to acquire data to be stored; a division unit, configured to divide the data to be stored acquired by the acquisition unit into at least one piece of elastic distributed data according to a preset division rule; and a sending unit, configured to send the at least one piece of elastic distributed data obtained by the division unit to the at least one slave node.

Further, the division unit is specifically configured to divide the data to be stored into at least one piece of elastic distributed data by using the spark.textFile function according to the preset division rule.

Further, an embodiment of the present invention provides a slave node, including: a receiving unit, configured to receive a search instruction sent by the master control node, where the search instruction carries identification information of the data information to be searched; a processing unit, configured to search, according to the identification information carried in the search instruction received by the receiving unit, the memory in which elastic distributed data is stored, and to obtain the elastic distributed data corresponding to that identification information; and a sending unit, configured to return the elastic distributed data obtained by the processing unit to the master control node.

Further, the receiving unit is also configured to receive elastic distributed data sent by the master control node, and the processing unit is also configured to store the elastic distributed data received by the receiving unit in memory.

The embodiments of the present invention provide a data processing method and node applied to a data storage cluster that includes a master control node and at least one slave node. In the method, the master control node determines the data information to be searched, generates a search instruction from it, and sends the instruction to the at least one slave node; each slave node finds the elastic distributed data corresponding to the data information to be searched among its stored elastic distributed data and returns it to the master control node; the master control node then determines, from the returned data, the response data set corresponding to the data to be searched and outputs it. Thus, when a search is needed, the master control node sends the search instruction to the slave nodes that store elastic distributed data, and all slave nodes obtain the elastic distributed data corresponding to the data to be searched in parallel. In other words, in the cluster of the present invention, multiple slave nodes search their own stored elastic distributed data at the same time, which enables fast data search and improves the efficiency of searching massive data.

Description of drawings

To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the accompanying drawings used in that description are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a data processing method provided by an embodiment of the present invention;

FIG. 2 is a schematic flowchart of another data processing method provided by an embodiment of the present invention;

FIG. 3 is a schematic flowchart of another data processing method provided by an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a master control node provided by an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of another master control node provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a slave node provided by an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention provides a data processing method applied to a data storage cluster that includes a master control node and at least one slave node.

It should be noted that, in all embodiments of the present invention, the master control node and the at least one slave node in the cluster are servers, each with no less than 96 GB of memory. The master control node and the at least one slave node are installed and deployed with the environment that Spark depends on, and Shark is installed and deployed on each of them. Hadoop components, for example HDFS (Hadoop Distributed File System), are also installed on the master control node and the at least one slave node.

As shown in FIG. 1, the method includes:

Step 101: The master control node determines the data information to be searched.

Specifically, when a user needs to obtain data from the data storage cluster, the user can send the data information to be searched to the master control node, that is, tell the master control node which data the user needs.

It should be noted that the master control node of the cluster is preset.

For example, if the user needs to search for the text data "data", the user can send the information of data to the master control node of the cluster, and the master control node thereby obtains the information of data as the data information to be searched.

Step 102: The master control node generates a search instruction according to the data information to be searched and sends it to the at least one slave node, so that the at least one slave node finds, according to the search instruction, the elastic distributed data corresponding to the data information to be searched among its stored elastic distributed data and returns that data to the master control node.

The at least one slave node stores elastic distributed data sent by the master control node, and the search instruction carries identification information of the data information to be searched.

Specifically, after acquiring the data information to be searched, the master control node generates a search instruction from it. The search instruction can carry identification information of the data information to be searched, so that the slave nodes know which data to search for. After generating the search instruction, the master control node sends it to the slave nodes in the cluster. Because the slave nodes store elastic distributed data, each slave node can use the identification information in the search instruction to find, in its own stored elastic distributed data, the elastic distributed data that matches it, that is, the elastic distributed data corresponding to the data information to be searched. Each of the at least one slave node then returns what it found to the master control node.
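The scatter-gather exchange in step 102 can be sketched in plain Scala, the language of Spark's API. This is a minimal single-process illustration under stated assumptions, not the patented implementation: each node is a hypothetical Node object whose in-memory partitions stand in for its elastic distributed data (Spark's resilient distributed dataset, RDD), SearchInstruction is a hypothetical message type carrying only the identification information, matching is assumed to be a substring test, and Scala Futures stand in for network messages so that the nodes search in parallel. If the master control node also holds elastic distributed data, it can simply be added to the node list.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ClusterSearch {
  // Hypothetical message type: carries only the identification information to search for.
  final case class SearchInstruction(identifier: String)

  // A node (master or slave) holding its elastic distributed data partitions in memory.
  final case class Node(name: String, partitions: Seq[Seq[String]]) {
    // Scan every local partition and keep the records that match the identifier.
    def search(instr: SearchInstruction): Seq[String] =
      partitions.flatMap(_.filter(_.contains(instr.identifier)))
  }

  // Master side: send one instruction to every node concurrently, then gather all replies.
  def scatterGather(nodes: Seq[Node], instr: SearchInstruction): Seq[String] = {
    val replies: Seq[Future[Seq[String]]] = nodes.map(n => Future(n.search(instr)))
    Await.result(Future.sequence(replies), 10.seconds).flatten
  }
}

import ClusterSearch._

val slaves = Seq(
  Node("slave-1", Seq(Seq("data alpha", "other"), Seq("more data"))),
  Node("slave-2", Seq(Seq("nothing here"), Seq("data beta")))
)
val response = scatterGather(slaves, SearchInstruction("data"))
// response == Seq("data alpha", "more data", "data beta")
```

Future.sequence keeps the replies in node order, so the gathered result is deterministic even though the searches run concurrently.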

It should be noted that, while the slave nodes in the cluster search for the elastic distributed data corresponding to the data information to be searched, if the master control node also stores elastic distributed data, it must likewise search the elastic distributed data stored in its own memory according to the data information to be searched.

It should be noted that, in the embodiments of the present invention, data to be stored is first loaded into the cluster: it is divided into multiple pieces of elastic distributed data, which are stored on the slave nodes and the master control node. The elastic distributed data stored by the master control node and all slave nodes can thus be regarded as one elastic distributed dataset. Furthermore, the master control node and all slave nodes store their elastic distributed data in memory.
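The text does not say how the pieces are mapped to nodes. Purely as a hypothetical placement policy for illustration, the sketch below deals pieces round-robin over the master control node and the slave nodes; a real Spark deployment decides where partitions are computed and cached through its own scheduler, so this is an assumption rather than the described method. The Placement object and the piece names are illustrative. The union of what every node holds corresponds to the elastic distributed dataset described above.

```scala
object Placement {
  // Hypothetical policy: piece i goes to node (i mod number of nodes).
  def assignRoundRobin[T](partitions: Seq[T], nodes: Seq[String]): Map[String, Seq[T]] = {
    require(nodes.nonEmpty, "the cluster needs at least one node")
    val grouped: Map[String, Seq[(T, Int)]] =
      partitions.zipWithIndex.groupBy { case (_, i) => nodes(i % nodes.size) }
    // Every node gets an entry, even if it received no piece.
    nodes.map(n => n -> grouped.getOrElse(n, Seq.empty).map(_._1)).toMap
  }
}

// The master also holds data, so it is listed alongside the slaves.
val layout = Placement.assignRoundRobin(
  Seq("part-0", "part-1", "part-2", "part-3", "part-4"),
  Seq("master", "slave-1", "slave-2")
)
// layout("master") == Seq("part-0", "part-3")
```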

Continuing the above example, after obtaining the information of data, the master control node can generate a search instruction for data that carries the identification information of data, and send it to all slave nodes in the cluster. On receiving the instruction, each slave node parses it and obtains the identification information of data it carries, which tells the slave node that data is to be searched. The slave node then searches the elastic distributed data stored in its own memory according to the identification information of data and obtains the elastic distributed data of data. The slave nodes in the cluster then return the elastic distributed data of data they obtained to the master control node.

Step 103: The master control node obtains the elastic distributed data corresponding to the data information to be searched returned by the at least one slave node, and determines, according to that data, the response data set corresponding to the data information to be searched.

Specifically, after receiving the elastic distributed data corresponding to the data information to be searched from the at least one slave node in the cluster, the master control node can take the dataset formed by all of that returned elastic distributed data as the response data set corresponding to the data information to be searched.

Continuing the above example, after receiving the elastic distributed data of data returned by the at least one slave node in the cluster, the master control node can combine all of it to form the response data set of data. Further, the master control node can obtain the response data set of data with the instruction val data = file.filter(line => line.contains("data")).
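In Spark, filter on an RDD is a transformation evaluated separately on each partition by the executors, and an action such as collect() brings the surviving records back to the driver, which plays the master control node's role here. The sketch below mirrors that flow with a hypothetical in-memory LocalRdd class so that it runs without Spark; only the filter line matches the instruction in the text, while LocalRdd, file, and the sample records are illustrative assumptions.

```scala
// Hypothetical stand-in for an RDD: a list of partitions kept in memory.
final case class LocalRdd(partitions: Vector[Vector[String]]) {
  // Like RDD.filter: the predicate runs independently on every partition.
  def filter(p: String => Boolean): LocalRdd = LocalRdd(partitions.map(_.filter(p)))

  // Like RDD.collect: concatenate every partition's surviving records on the master.
  def collect(): Vector[String] = partitions.flatten
}

val file = LocalRdd(Vector(Vector("data 1", "x"), Vector("y", "more data")))
val data = file.filter(line => line.contains("data"))
val responseSet = data.collect() // Vector("data 1", "more data")
```

Keeping filter per partition and deferring the merge to collect() is what lets the matching work stay on the nodes that hold the data.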

Step 104: The master control node outputs the response data set.

Specifically, after obtaining the response data set corresponding to the data information to be searched, the master control node can output it, so that the user obtains the required data for subsequent processing.

Continuing the above example, after determining the response data set of data, the master control node can output it to the user, so that the user obtains the required data for subsequent processing.

The embodiment of the present invention provides a data processing method applied to a data storage cluster that includes a master control node and at least one slave node. In the method, the master control node determines the data information to be searched, generates a search instruction from it, and sends the instruction to the at least one slave node; each slave node finds the elastic distributed data corresponding to the data information to be searched among its stored elastic distributed data and returns it to the master control node; the master control node then determines, from the returned data, the response data set corresponding to the data to be searched and outputs it. Thus, when a search is needed, the master control node sends the search instruction to the slave nodes that store elastic distributed data, and all slave nodes obtain the elastic distributed data corresponding to the data to be searched in parallel. In other words, in the cluster of the present invention, multiple slave nodes search their own stored elastic distributed data at the same time, which enables fast data search and improves the efficiency of searching massive data.

An embodiment of the present invention provides a data processing method applied to a data storage cluster that includes a master control node and at least one slave node. As shown in FIG. 2, the method includes:

Step 201: The slave node receives the search instruction sent by the master control node, searches the memory in which elastic distributed data is stored according to the identification information of the data information to be searched carried in the search instruction, and obtains the elastic distributed data corresponding to that identification information.

The search instruction carries identification information of the data information to be searched.

Specifically, after receiving the search instruction sent by the master control node, the slave node can parse it to obtain the identification information of the data information to be searched that it carries. The slave node then searches its memory, in which elastic distributed data is stored, according to that identification information and obtains the corresponding elastic distributed data, that is, the elastic distributed data corresponding to the data information to be searched.
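The parse-then-search behaviour of a slave node in steps 201 and 202 can be sketched as follows. The wire format "SEARCH|<identifier>", the SlaveHandler object, and the substring match are all hypothetical, since the text does not specify how the search instruction is encoded; returning Left for a malformed instruction is likewise a design choice of this sketch.

```scala
object SlaveHandler {
  // Hypothetical wire format for a search instruction: "SEARCH|<identifier>".
  def parse(message: String): Option[String] = message.split("\\|", 2) match {
    case Array("SEARCH", id) if id.nonEmpty => Some(id)
    case _                                  => None
  }

  // Parse the instruction, then scan the in-memory partitions for matching records.
  // The Right value is what the slave node would return to the master control node.
  def handle(message: String, memory: Seq[Seq[String]]): Either[String, Seq[String]] =
    parse(message) match {
      case Some(id) => Right(memory.flatMap(_.filter(_.contains(id))))
      case None     => Left(s"malformed search instruction: $message")
    }
}

val reply = SlaveHandler.handle("SEARCH|data", Seq(Seq("data a", "b"), Seq("c data")))
// reply == Right(Seq("data a", "c data"))
```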

Step 202: The slave node returns the obtained elastic distributed data corresponding to the identification information of the data information to be searched to the master control node.

Specifically, after obtaining the elastic distributed data corresponding to the identification information of the data information to be searched, the slave node can return it to the master control node; that is, it returns the elastic distributed data corresponding to the data information to be searched to the master control node.

The embodiment of the present invention provides a data processing method applied to a data storage cluster that includes a master control node and at least one slave node. In the method, the slave node receives the search instruction sent by the master control node, searches the memory in which elastic distributed data is stored according to the identification information of the data information to be searched carried in the instruction, obtains the corresponding elastic distributed data, and returns it to the master control node. Thus, when a search is needed, every slave node in the cluster that receives the search instruction searches accordingly and obtains the elastic distributed data corresponding to the data information to be searched. In other words, in the cluster of the present invention, multiple slave nodes search their own stored elastic distributed data at the same time, which enables fast data search and improves the efficiency of searching massive data.

An embodiment of the present invention provides a data processing method applied to a data storage cluster that includes a master control node and at least one slave node. As shown in FIG. 3, the method includes:

Step 301: The master control node acquires the data to be stored.

Specifically, because the data storage cluster is used to store data, a user who needs to store data can store it in this cluster. To do so, the user sends the data to the master control node of the cluster, which thereby acquires the data to be stored.

Likewise, when another device needs to store data, it can store the data in the cluster by sending it to the master control node of the cluster, which thereby acquires the data to be stored.

Step 302: The master control node divides the data to be stored into at least one piece of elastic distributed data according to a preset division rule.

Specifically, after acquiring the data to be stored, the master control node divides it into at least one piece of elastic distributed data according to the preset division rule.

It should be noted that the preset division rule is a rule, set in advance, for dividing data into multiple pieces of elastic distributed data. For example, the preset division rule may be to divide the data by a size a. In that case, the master control node cuts the data to be stored into consecutive pieces of size a, and each piece becomes one piece of elastic distributed data. The data to be stored is thus divided into at least one piece of elastic distributed data.
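A minimal sketch of the example rule "divide by size a" follows. It assumes the data to be stored is a byte string and that `a` is a byte count; the text does not fix the unit. `split_by_size` is a hypothetical helper name.

```python
from typing import List


def split_by_size(data: bytes, a: int) -> List[bytes]:
    """Cut `data` into consecutive pieces of `a` bytes; the last piece may be shorter."""
    if a <= 0:
        raise ValueError("piece size a must be positive")
    pieces = [data[i:i + a] for i in range(0, len(data), a)]
    # Empty input still yields one (empty) piece, matching "at least one" in the text.
    return pieces or [data]


if __name__ == "__main__":
    print(split_by_size(b"abcdefghij", 4))  # [b'abcd', b'efgh', b'ij']
```

Concatenating the pieces in order always reproduces the original data. The number of pieces is the length divided by a, rounded up.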

Further, the step in which the master control node divides the data to be stored into at least one piece of elastic distributed data according to the preset division rule includes the following: the master control node uses the spark.textFile function to divide the data to be stored into at least one piece of elastic distributed data, according to the preset division rule.

That is, the master control node can use the spark.textFile function to divide the data to be stored according to the preset division rule, producing at least one piece of elastic distributed data.
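In Spark's public API, a text file is read into partitions with `SparkContext.textFile(path, minPartitions)`. This is presumably what "spark.textFile" refers to, and the "elastic distributed data" it produces presumably corresponds to Spark's resilient distributed dataset (RDD) of lines. Each partition holds whole lines rather than arbitrary byte ranges.

The plain-Python sketch below imitates that idea without Spark, under a simplifying assumption. The text is cut into equal character ranges, and each partition owns every line that starts inside its range. `partition_lines` is a hypothetical name, and Spark's real split sizes depend on the underlying input format.

```python
from typing import List


def partition_lines(text: str, num_partitions: int) -> List[List[str]]:
    """Split `text` into `num_partitions` partitions without cutting any line in two.

    Simplified model: the text is divided into equal character ranges, and each
    partition owns every line whose first character falls inside its range.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    size = len(text)
    # Range boundaries: partition k covers offsets [bounds[k], bounds[k + 1]).
    bounds = [size * k // num_partitions for k in range(num_partitions + 1)]
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    k = 0
    offset = 0
    for line in text.splitlines(keepends=True):
        # Line start offsets only grow, so the owning partition index never decreases.
        while k < num_partitions - 1 and offset >= bounds[k + 1]:
            k += 1
        partitions[k].append(line.rstrip("\r\n"))  # store the line without its terminator
        offset += len(line)
    return partitions


if __name__ == "__main__":
    print(partition_lines("aa\nbb\ncc\ndd\n", 3))  # [['aa', 'bb'], ['cc'], ['dd']]
```

Unlike the fixed-size rule, this keeps records intact at partition boundaries, at the cost of partitions of slightly unequal size.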

Step 303: The master control node sends the at least one piece of elastic distributed data to the at least one slave node, and the slave node receives the elastic distributed data sent by the master control node.

Specifically, after dividing the data to be stored into at least one piece of elastic distributed data, the master control node needs to distribute these pieces across the nodes of the cluster. To do so, it can send the elastic distributed data to the nodes of the cluster one after another. Each slave node in the cluster then receives the elastic distributed data sent to it by the master control node.

It should be noted that the master control node can also store elastic distributed data itself. In this case, when sending the elastic distributed data to the at least one slave node, the master control node can distribute it evenly across all nodes in a certain order.

It should also be noted that the master control node sends different elastic distributed data to different nodes.

For example, suppose the master control node divides the data to be stored into elastic distributed data 1, 2, 3 and 4, and the cluster contains the master control node and slave nodes 1, 2 and 3. The master control node can send elastic distributed data 1 to slave node 1, elastic distributed data 2 to slave node 2 and elastic distributed data 3 to slave node 3, and keep elastic distributed data 4 itself. Slave node 1 thus receives elastic distributed data 1, slave node 2 receives elastic distributed data 2, and slave node 3 receives elastic distributed data 3. The master control node stores elastic distributed data 4 in its own memory.
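The placement in this example is what a simple round-robin rule produces when the master control node is listed after the slave nodes. The sketch below assumes round robin. The text says only that the data is sent evenly "in a certain order", so the ordering is an assumption, and `assign_partitions` is a hypothetical name.

```python
from typing import Dict, List


def assign_partitions(partition_ids: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Give each partition to the next node in turn (round robin)."""
    if not nodes:
        raise ValueError("at least one node is required")
    placement: Dict[str, List[str]] = {node: [] for node in nodes}
    for i, pid in enumerate(partition_ids):
        placement[nodes[i % len(nodes)]].append(pid)
    return placement


if __name__ == "__main__":
    nodes = ["slave1", "slave2", "slave3", "master"]
    print(assign_partitions(["rdd1", "rdd2", "rdd3", "rdd4"], nodes))
    # {'slave1': ['rdd1'], 'slave2': ['rdd2'], 'slave3': ['rdd3'], 'master': ['rdd4']}
```

With more pieces than nodes, the rule wraps around, so no node holds more than one piece more than any other. This is one way to realize the "even" distribution the text mentions, and each piece still goes to exactly one node.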

Step 304: The slave node stores the elastic distributed data in memory.

Specifically, after receiving the elastic distributed data, the slave node needs to store it. The cluster can keep all stored data in memory, which speeds up data processing. The slave node therefore stores all the elastic distributed data it receives in its own memory.

Further, the slave node can use the cache function to load the elastic distributed data into memory for storage.

Further, when the nodes of the cluster (both the slave nodes and the master control node) store elastic distributed data in memory, they can store it there as configuration data.
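In Spark, `RDD.cache()` marks an RDD to be kept in memory; it is shorthand for `persist()` with the `MEMORY_ONLY` storage level. The data is materialized on first use and then reused from memory while it stays cached.

The plain-Python sketch below mimics that behavior. It assumes that each piece of elastic distributed data is a list of text records named by a string identifier. The class and method names are hypothetical.

```python
from typing import Callable, Dict, List


class InMemoryPartitionStore:
    """Keeps each piece of data in RAM after it is first materialized (cache-like)."""

    def __init__(self) -> None:
        self._cache: Dict[str, List[str]] = {}
        self.loads = 0  # number of times a loader actually ran

    def get(self, pid: str, loader: Callable[[], List[str]]) -> List[str]:
        """Return the cached records for `pid`, running `loader` only on the first request."""
        if pid not in self._cache:
            self._cache[pid] = loader()
            self.loads += 1
        return self._cache[pid]


if __name__ == "__main__":
    store = InMemoryPartitionStore()
    # The lambda stands in for deserializing the data received from the master node.
    store.get("rdd1", lambda: ["record-a", "record-b"])
    store.get("rdd1", lambda: ["record-a", "record-b"])  # served from memory
    print(store.loads)  # 1
```

The second lookup never calls the loader, which is the speed-up the text attributes to keeping data in memory.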

Step 305: The master control node determines the data information to be searched.

For details, refer to step 101; they are not repeated here.

Step 306: The master control node generates a search instruction according to the data information to be searched and sends it to the at least one slave node. The at least one slave node uses the search instruction to find, in the elastic distributed data it stores, the elastic distributed data corresponding to the data information to be searched, and returns that data to the master control node. On its side, the slave node receives the search instruction from the master control node. It searches the memory storing the elastic distributed data, using the identification information of the data information to be searched that the search instruction carries, and obtains the elastic distributed data corresponding to that identification information.

For details, refer to steps 102 and 201; they are not repeated here.
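The text does not specify the format of the search instruction, only that it carries the identification information of the data to be searched. The sketch below assumes that each record is an `(identifier, payload)` pair and that the instruction is just a set of identifiers. `SearchInstruction`, `make_instruction` and `slave_search` are hypothetical names. A slave node scans every piece of elastic distributed data held in its memory and keeps the matching records.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple

Record = Tuple[str, str]  # assumed layout: (identifier, payload)


@dataclass(frozen=True)
class SearchInstruction:
    """Message sent by the master node; carries only the identifiers to look up."""
    ids: FrozenSet[str]


def make_instruction(wanted: Iterable[str]) -> SearchInstruction:
    """Master side: build a search instruction from the data to be searched."""
    return SearchInstruction(ids=frozenset(wanted))


def slave_search(partitions: Dict[str, List[Record]], instr: SearchInstruction) -> List[Record]:
    """Slave side: scan every in-memory partition and keep records with a requested id."""
    return [rec for part in partitions.values() for rec in part if rec[0] in instr.ids]


if __name__ == "__main__":
    local = {"rdd1": [("u1", "x"), ("u2", "y")], "rdd5": [("u7", "z")]}
    print(slave_search(local, make_instruction(["u2", "u7"])))
    # [('u2', 'y'), ('u7', 'z')]
```

Sending identifiers rather than data keeps the instruction small. Only the matching records travel back to the master control node.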

Step 307: The slave node returns the obtained elastic distributed data corresponding to the identification information of the data information to be searched to the master control node. The master control node acquires this elastic distributed data from the at least one slave node. From it, the master control node determines the response data set corresponding to the data information to be searched.

For details, refer to steps 103 and 202; they are not repeated here.
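The text does not specify how the master control node turns the returned data into the response data set. One plausible reading, sketched below as an assumption, is to take the union of the per-slave results, drop exact duplicates, and sort them (by identifier, then payload). With disjoint pieces of elastic distributed data, the de-duplication changes nothing. `build_response` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # assumed layout: (identifier, payload)


def build_response(replies: Iterable[List[Record]]) -> List[Record]:
    """Union the per-slave results, drop exact duplicates, and sort by identifier."""
    merged = {rec for reply in replies for rec in reply}
    return sorted(merged)


if __name__ == "__main__":
    replies = [[("u2", "y")], [], [("u7", "z"), ("u2", "y")]]
    print(build_response(replies))  # [('u2', 'y'), ('u7', 'z')]
```

Sorting makes the output independent of the order in which slave nodes happen to reply.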

Step 308: The master control node outputs the response data set.

For details, refer to step 104; they are not repeated here.

An embodiment of the present invention provides a master control node which, as shown in Figure 4, includes:

A determining unit 401, configured to determine the data information to be searched.

A processing unit 402, configured to generate a search instruction according to the data information to be searched and send it to the at least one slave node. The at least one slave node uses the search instruction to find, among the elastic distributed data it stores, the elastic distributed data corresponding to the data information to be searched, and returns that data to the master control node.

The processing unit 402 is further configured to acquire the elastic distributed data corresponding to the data information to be searched that the at least one slave node returns, and to determine from it the response data set corresponding to the data information to be searched.

An output unit 403, configured to output the response data set.

Further, as shown in Figure 5, the master control node also includes:

An acquisition unit 404, configured to acquire data to be stored.

A division unit 405, configured to divide the data to be stored, as acquired by the acquisition unit 404, into at least one piece of elastic distributed data according to a preset division rule.

Specifically, the division unit 405 is configured to divide the data to be stored into at least one piece of elastic distributed data by using the spark.textFile function according to the preset division rule.

A sending unit 406, configured to send the at least one piece of elastic distributed data produced by the division unit 405 to the at least one slave node.

An embodiment of the present invention provides a master control node that operates as follows. The master control node determines the data information to be searched. It then generates a search instruction according to that data information and sends the instruction to the at least one slave node. The at least one slave node uses the search instruction to find, among the elastic distributed data it stores, the elastic distributed data corresponding to the data information to be searched, and returns that data to the master control node. The master control node acquires the returned elastic distributed data, determines from it the response data set corresponding to the data to be searched, and outputs the response data set.

In this way, when a data search is required, the master control node generates a search instruction from the data to be searched and sends it to the slave nodes that store the elastic distributed data. All slave nodes then obtain, in parallel, the elastic distributed data corresponding to the data to be searched. In other words, in the cluster of the present invention, multiple slave nodes search the elastic distributed data they store at the same time and together obtain the elastic distributed data corresponding to the data to be searched. This enables fast data search and improves the efficiency of searching massive amounts of data.

An embodiment of the present invention provides a slave node which, as shown in Figure 6, includes:

A receiving unit 601, configured to receive a search instruction sent by the master control node.

A processing unit 602, configured to search the memory storing elastic distributed data using the identification information of the data information to be searched, which is carried in the search instruction received by the receiving unit 601. The processing unit 602 obtains the elastic distributed data corresponding to that identification information.

A sending unit 603, configured to return to the master control node the elastic distributed data obtained by the processing unit 602, that is, the data corresponding to the identification information of the data information to be searched.

Further, the receiving unit 601 is also configured to receive elastic distributed data sent by the master control node.

The processing unit 602 is further configured to store the elastic distributed data received by the receiving unit 601 in memory.

An embodiment of the present invention provides a slave node that operates as follows. The slave node receives a search instruction sent by the master control node. It searches the memory storing elastic distributed data, using the identification information of the data information to be searched that the search instruction carries, and obtains the elastic distributed data corresponding to that identification information. The slave node then returns the obtained elastic distributed data to the master control node. In this way, when a data search is required, every slave node in the cluster that receives the search instruction searches according to it and obtains the elastic distributed data corresponding to the data information to be searched. In other words, in the cluster of the present invention, multiple slave nodes search, at the same time, the elastic distributed data that each of them stores, and together obtain the elastic distributed data corresponding to the data information to be searched. This enables fast data search and improves the efficiency of searching massive amounts of data.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. The present invention has been described in detail with reference to the foregoing embodiments. Nevertheless, those of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, or some of their technical features can be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A data processing method, characterized in that it is applied to a data storage cluster, the cluster including a master control node and at least one slave node, and the method comprises:

The master control node determines the data information to be searched;

The master control node generates a search instruction according to the data information to be searched and sends the search instruction to the at least one slave node, so that the at least one slave node, according to the search instruction, finds among the elastic distributed data it stores the elastic distributed data corresponding to the data information to be searched, and returns that elastic distributed data to the master control node; wherein the at least one slave node stores the elastic distributed data sent by the master control node, and the search instruction carries the identification information of the data information to be searched;

The master control node acquires the elastic distributed data corresponding to the data information to be searched returned by the at least one slave node, and determines, according to that elastic distributed data, a response data set corresponding to the data information to be searched;

The master control node outputs the response data set.

2. The method according to claim 1, wherein, before the master control node determines the data information to be searched, the method further comprises:

The master control node obtains the data to be stored;

The master control node divides the data to be stored into at least one piece of elastic distributed data according to a preset division rule;

The master control node sends the at least one piece of elastic distributed data to the at least one slave node.

3. The method according to claim 2, wherein the master control node dividing the data to be stored into at least one piece of elastic distributed data according to the preset division rule comprises:

The master control node divides the data to be stored into at least one piece of elastic distributed data by using a spark.textFile function according to a preset division rule.

4. A data processing method, characterized in that it is applied to a data storage cluster, the cluster including a master control node and at least one slave node, and the method comprises:

The slave node receives the search instruction sent by the master control node, searches the memory storing the elastic distributed data according to the identification information of the data information to be searched in the search instruction, and obtains the elastic distributed data corresponding to the identification information of the data information to be searched; wherein the identification information of the data information to be searched is carried in the search instruction;

The slave node returns the acquired elastic distributed data corresponding to the identification information of the data information to be searched to the master control node.

5. The method according to claim 4, wherein, before the slave node receives the search instruction sent by the master control node, searches the memory storing the elastic distributed data according to the identification information of the data information to be searched in the search instruction, and obtains the elastic distributed data corresponding to the identification information of the data information to be searched, the method further comprises:

The slave node receives the elastic distributed data sent by the master control node;

The slave node stores the elastic distributed data in memory.

6. A master control node, characterized in that it comprises:

a determining unit, configured to determine the data information to be searched;

a processing unit, configured to generate a search instruction according to the data information to be searched and send the search instruction to the at least one slave node, so that the at least one slave node, according to the search instruction, finds among the elastic distributed data it stores the elastic distributed data corresponding to the data information to be searched, and returns that elastic distributed data to the master control node; wherein the at least one slave node stores the elastic distributed data sent by the master control node, and the search instruction carries the identification information of the data information to be searched;

The processing unit is further configured to acquire the elastic distributed data corresponding to the data information to be searched returned by the at least one slave node, and to determine, according to that elastic distributed data, a response data set corresponding to the data information to be searched;

an output unit, configured to output the response data set.

7. The master control node according to claim 6, further comprising:

an acquisition unit, configured to acquire data to be stored;

A division unit, configured to divide the data to be stored acquired by the acquisition unit into at least one piece of elastic distributed data according to a preset division rule;

A sending unit, configured to send the at least one piece of elastic distributed data produced by the division unit to the at least one slave node.

8. The master control node according to claim 7, characterized in that,

The division unit is specifically configured to divide the data to be stored into at least one piece of elastic distributed data by using a spark.textFile function according to a preset division rule.

9. A slave node, characterized in that it comprises:

a receiving unit, configured to receive a search instruction sent by the master control node;

A processing unit, configured to search the memory storing elastic distributed data according to the identification information of the data information to be searched in the search instruction received by the receiving unit, and to obtain the elastic distributed data corresponding to the identification information of the data information to be searched; wherein the search instruction carries the identification information of the data information to be searched;

A sending unit, configured to return the elastic distributed data corresponding to the identification information of the data information to be searched acquired by the processing unit to the master control node.

10. The slave node according to claim 9, characterized in that,

The receiving unit is further configured to receive elastic distributed data sent by the master control node;

The processing unit is further configured to store the elastic distributed data received by the receiving unit in a memory.