CN118069302A - Data processing method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN118069302A (application CN202211474500.XA)
- Authority
- CN
- China
- Prior art keywords
- working node
- executed
- data
- target working
- data subsets
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present application relates to the field of computer technologies, and in particular to a data processing method, apparatus, electronic device, and storage medium for improving the data processing efficiency of nodes. The method comprises the following steps: dividing a plan to be executed into at least one stage to be executed; when each stage to be executed is processed, acquiring the computing power information of each candidate working node in the service cluster, and performing the following operations: selecting at least one target working node based on the acquired computing power information, and scheduling a plurality of tasks to be executed to the at least one target working node; then performing the following operations for each task to be executed: based on the current computing power information of the target working node associated with the task to be executed, selecting a corresponding number of data subsets from the plurality of data subsets corresponding to the current stage to be executed, and distributing them to that target working node for data processing. The application distributes tasks and data subsets based on the computing power information of the working nodes, improving both the rationality of data distribution and the processing efficiency of the nodes.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, a data processing device, an electronic device, and a storage medium.
Background
With the rapid development of internet technology, more and more industries are expanding onto the internet, the number of applications serving these industries has increased greatly, and the data generated by these applications is growing rapidly. How to compute and process this data faster has therefore become an important research topic.
In the related art, data to be processed is distributed to the working nodes of a service cluster, and the distribution usually follows an equal-distribution principle: every working node receives the same amount of data. This distribution scheme is too rigid.
Further, when the central processing unit (Central Processing Unit, CPU) computing power of a working node drops too low, the management node is notified, through a call to its interface, that the working node needs to be taken offline. Upon receiving the notification, the management node tells the working node to prepare to go offline and stops distributing tasks to it; the node is not formally offline until the data already stored on it has been processed. The period from the moment the node's CPU computing power begins to fluctuate to the moment the node is formally offline is therefore coarse-grained and long.
In summary, how to distribute data more reasonably and improve the data processing efficiency of the nodes is a problem that needs to be solved.
Disclosure of Invention
The embodiment of the application provides a data processing method, a data processing device, electronic equipment and a storage medium, which are used for improving the data processing efficiency of nodes.
The data processing method provided by the embodiment of the application comprises the following steps:
dividing a plan to be executed into at least one stage to be executed, wherein each stage to be executed comprises a plurality of parallel tasks to be executed and corresponds to a plurality of data subsets;
when each stage to be executed is processed, acquiring the computing power information of each candidate working node in the service cluster, and performing the following operations:
selecting at least one target working node from the candidate working nodes based on the acquired computing power information, and scheduling the plurality of tasks to be executed to the at least one target working node;
for each task to be executed, performing the following operations respectively: based on the current computing power information of the target working node associated with the task to be executed, selecting a corresponding number of data subsets from the plurality of data subsets corresponding to the current stage to be executed, and distributing them to that target working node for data processing.
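The per-stage flow above can be sketched as follows. This is a minimal, hypothetical illustration: the names `WorkerNode` and `schedule_stage`, the round-robin task placement, and the use of idle CPU as the computing power metric are all assumptions for illustration, not details taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class WorkerNode:
    name: str
    idle_cpu: float            # current idle CPU capacity (assumed metric)
    assigned: list = field(default_factory=list)

def schedule_stage(candidates, tasks, data_subsets, threshold=0.5):
    """Select target nodes by compute power, then hand each node a share of
    the stage's data subsets proportional to its idle CPU (a sketch)."""
    # Step 1: keep only candidates whose compute power reaches the threshold.
    targets = [n for n in candidates if n.idle_cpu >= threshold]
    if not targets:
        raise RuntimeError("no working node has enough computing power")
    # Step 2: schedule each parallel task onto a target node (round-robin here).
    task_owner = {t: targets[i % len(targets)] for i, t in enumerate(tasks)}
    # Step 3: give each node a subset count proportional to its idle CPU.
    total_cpu = sum(n.idle_cpu for n in targets)
    it = iter(data_subsets)
    for n in targets:
        share = round(len(data_subsets) * n.idle_cpu / total_cpu)
        for _ in range(share):
            s = next(it, None)
            if s is None:
                break
            n.assigned.append(s)
    # Rounding can leave a few subsets unplaced; hand them to the strongest node.
    max(targets, key=lambda n: n.idle_cpu).assigned.extend(it)
    return task_owner
```

Because the shares are rounded, the sketch sends any remainder to the node with the most idle CPU; a real scheduler could instead spread the remainder or re-run the proportional split.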
The data processing device provided by the embodiment of the application comprises:
The splitting unit, configured to divide a plan to be executed into at least one stage to be executed, wherein each stage to be executed comprises a plurality of parallel tasks to be executed and corresponds to a plurality of data subsets;
the distribution unit, configured to acquire, when each stage to be executed is processed, the computing power information of each candidate working node in the service cluster, and to perform the following operations:
selecting at least one target working node from the candidate working nodes based on the acquired computing power information, and scheduling the plurality of tasks to be executed to the at least one target working node;
and, for each task to be executed, performing the following operations respectively: based on the current computing power information of the target working node associated with the task to be executed, selecting a corresponding number of data subsets from the plurality of data subsets corresponding to the current stage to be executed, and distributing them to that target working node for data processing.
Optionally, the distribution unit is specifically configured to:
select, based on the current computing power information of the target working node associated with a task to be executed, a first number of data subsets from the plurality of data subsets corresponding to the current stage to be executed, and distribute them to that target working node for data processing, where the first number and the current computing power information of the target working node satisfy a preset proportional relation.
Optionally, the apparatus further includes:
The detection unit, configured to: after the corresponding number of data subsets have been selected from the plurality of data subsets corresponding to the current stage to be executed and distributed to the target working nodes for data processing, detect the execution status of the distributed data subsets on each target working node, and obtain corresponding detection results;
and reassign the data subsets still to be processed among the distributed data subsets, together with the tasks to be executed corresponding to them, based on the obtained detection results and the current computing power information of each target working node.
Optionally, the detection unit is specifically configured to:
For each target working node, the following operations are performed:
if the detection result corresponding to a target working node indicates that the number of data subsets still to be processed on that node does not match its current computing power information, treating those data subsets as data subsets to be redistributed;
and reassigning the collected data subsets to be redistributed, together with their corresponding tasks to be executed, based on the current computing power information of each target working node.
Optionally, the detection unit is specifically configured to:
reassigning, based on the current computing power information of each target working node, a second number of the data subsets to be redistributed, together with their corresponding tasks to be executed, from the original target working node to other target working nodes, such that after reassignment:
the number of data subsets on the original target working node and its current computing power information satisfy the preset proportional relation, and the number of data subsets on each of the other target working nodes and their current computing power information also satisfy the preset proportional relation.
Optionally, the detection unit is specifically configured to determine the second number by:
determining the second number based on the current computing power information of each target working node, the multiple relation between each node's current computing power and the computing power required by one data subset, and the number of data subsets still to be processed on each target working node.
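One hedged reading of this step: the second number for an overloaded node is its pending count minus its current capacity, where capacity comes from the multiple relation between the node's idle computing power and the power one data subset requires. The function name and the integer-division interpretation below are assumptions, not the patent's definition.

```python
def subsets_to_move(idle_power, power_per_subset, pending_count):
    """How many pending data subsets must leave a node so that what
    remains matches its current computing power (illustrative sketch)."""
    # Capacity: how many subsets the node can handle right now, from the
    # multiple relation between its idle power and one subset's cost.
    capacity = int(idle_power // power_per_subset)
    return max(pending_count - capacity, 0)
```

For example, a node with 2.0 units of idle power, 0.5 units per subset, and 6 pending subsets can keep 4 of them, so 2 would be moved elsewhere.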
Optionally, the distribution unit is specifically configured to:
for each candidate working node, performing the following operations respectively: if the current computing power information of a candidate working node is not lower than a preset threshold, taking that candidate working node as a target working node; or
sorting the candidate working nodes based on their current computing power information, and taking the candidate working nodes within a specified ranking range as target working nodes.
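Both selection strategies are straightforward. A minimal sketch, assuming computing power is summarized as a single score per candidate (the function names and the dict representation are illustrative):

```python
def select_by_threshold(candidates, threshold):
    """Keep every candidate whose current computing power reaches the threshold."""
    return {name for name, power in candidates.items() if power >= threshold}

def select_top_k(candidates, k):
    """Rank candidates by computing power and keep the strongest k
    (the 'specified ranking range' strategy)."""
    return sorted(candidates, key=candidates.get, reverse=True)[:k]
```

The threshold variant guarantees a minimum per-node capability but may select a variable number of nodes; the ranking variant fixes the node count but not the capability floor.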
Optionally, the distribution unit is further configured to:
if it is determined that the current computing power information of a target working node is lower than the preset threshold, performing an offline operation on that target working node.
Optionally, the detection unit is specifically configured to:
For each target working node, the following operations are performed:
detecting, once every preset period, the execution status of the data subsets distributed to a target working node; or
detecting in real time the execution status of the data subsets distributed to a target working node.
An electronic device provided in an embodiment of the present application includes a processor and a memory, where the memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to execute any one of the steps of the data processing method described above.
An embodiment of the present application provides a computer-readable storage medium comprising a computer program which, when run on an electronic device, causes the electronic device to execute the steps of any one of the data processing methods described above.
Embodiments of the present application provide a computer program product comprising a computer program stored in a computer readable storage medium; when the processor of the electronic device reads the computer program from the computer readable storage medium, the processor executes the computer program, so that the electronic device performs the steps of any one of the data processing methods described above.
The application has the following beneficial effects:
The embodiment of the application provides a data processing method, a data processing device, an electronic device, and a storage medium. The application acquires the computing power information of each working node in real time before the tasks to be executed and the data subsets are distributed, so the computing power of each working node can be monitored and target working nodes for data processing can be selected according to the obtained computing power information; tasks to be executed are not distributed to working nodes whose computing power is too low. This improves data processing speed, and avoids the situation in the related art where a working node with too-low computing power is only detected, and notified to go offline, after data subsets have already been distributed to it. Furthermore, the application dynamically allocates the tasks to be executed according to the computing power of each target working node, and likewise dynamically decides how many data subsets each task receives, which prevents low-computing-power target working nodes from slowing down the overall data processing, makes full use of the resources of high-computing-power target working nodes, and improves the rationality of data distribution.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
Fig. 1 is a schematic diagram of a data allocation manner in the related art according to an embodiment of the present application;
Fig. 2 is a schematic diagram of an application scenario provided in an embodiment of the present application;
FIG. 3 is a flowchart illustrating a data processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of task allocation to be performed according to an embodiment of the present application;
FIG. 5 is a logic diagram of a target node according to an embodiment of the present application;
FIG. 6 is a schematic diagram of data subset allocation according to an embodiment of the present application;
FIG. 7 is a flowchart illustrating a data processing method according to an embodiment of the present application;
FIG. 8 is a schematic diagram illustrating an execution of a stage to be executed according to an embodiment of the present application;
FIG. 9 is a schematic diagram of data subset reassignment according to an embodiment of the present application;
FIG. 10 is a schematic diagram of task to be performed and data subset allocation according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a change in computing force of a working node according to an embodiment of the present application;
fig. 12 is a schematic diagram of execution of a task to be executed in a native MPP architecture according to an embodiment of the present application;
FIG. 13 is a schematic diagram of an execution situation of a task to be executed under dynamic scheduling according to an embodiment of the present application;
FIG. 14 is a schematic diagram of a scheduling situation of CPU capability of each working node under dynamic scheduling according to an embodiment of the present application;
FIG. 15 is a time-consumption comparison chart of the present application;
FIG. 16 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 17 is a schematic diagram of a hardware composition structure of an electronic device according to an embodiment of the present application;
Fig. 18 is a schematic diagram of a hardware composition structure of another electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the technical solutions of the present application, but not all embodiments. All other embodiments, based on the embodiments described in the present document, which can be obtained by a person skilled in the art without any creative effort, are within the scope of protection of the technical solutions of the present application.
Some of the concepts involved in the embodiments of the present application are described below.
Node: nodes are divided into management nodes (also called Masters) and working nodes (also called Workers). A management node is responsible for receiving, processing, and distributing the plan to be executed, returning results, and managing all working nodes in the architecture; working nodes are responsible for receiving tasks to be executed and data subsets and for processing and computing the data. In the present application, working nodes are further divided into candidate working nodes and target working nodes: before each stage to be executed starts, i.e., before the tasks to be executed are distributed, all working nodes are candidate working nodes, from which the target working nodes are then selected.
Plan to be executed: when the management node receives a statement input by an object, it parses the statement and converts it into a query execution plan composed of interrelated stages to be executed.
Stage to be executed (also referred to as a Stage): a plan to be executed may be divided into at least one stage to be executed according to the actual situation, and the stages are executed in a certain order. For two adjacent stages, the execution of the next stage may build on the execution result of the previous stage. For example, if a plan to be executed is to retrieve the objects that use both application A and application B, one optional way is to divide it into two stages: the first retrieves the objects that use application A, and the second, on the basis of the first, retrieves the objects that use both application A and application B. Alternatively, the plan can be divided into three stages: retrieving the objects that use application A; retrieving the objects that use application B; and, on the basis of the results of the first two stages, retrieving the objects that use both. The first two of these stages can be executed in parallel, i.e., the application is equally applicable to stages executed in parallel. Before each stage is processed, the target working nodes must be screened so that the tasks to be executed and the data subsets can be distributed to them.
Task to be executed (also called a Task): a stage to be executed can be divided into a plurality of parallel tasks to be executed, and the management node distributes each task to a target working node to process the data required by that stage. Each task may pull the data produced by the tasks of the previous stage, and the data it generates may be used by the tasks of the next stage.
Data subset (also known as a Split): each input dataset of a task to be executed is divided into at least one data subset; each data subset can be understood as part of the data, i.e., one data subset contains at least one piece of data.
Computing power information: computing power refers to computational capability, and the computing power information represents the computational capability of a working node, such as CPU occupancy, bandwidth occupancy, memory occupancy, and idle computing power.
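For illustration only, a node's computing power information could be carried in a small record like the following; the field names and the idle-power formula are assumptions, not definitions from the patent.

```python
from dataclasses import dataclass

@dataclass
class ComputePowerInfo:
    cpu_busy: float        # CPU occupancy, as a fraction in [0, 1]
    mem_busy: float        # memory occupancy
    bandwidth_busy: float  # bandwidth occupancy

    @property
    def idle_power(self) -> float:
        # One simple reading of "idle computing power": the unused CPU share.
        return 1.0 - self.cpu_busy
```

A scheduler would refresh such a record for every candidate node at the start of each stage and again while monitoring execution.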
Cloud technology refers to a hosting technology that unifies hardware, software, network, and other resources within a wide area network or local area network to realize the computation, storage, processing, and sharing of data.
Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and so on that are applied on the basis of the cloud computing business model; these can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites, and other portal websites, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, every article may have its own identification mark in the future, which will need to be transmitted to a background system for logical processing; data of different levels will be processed separately, and industry data of all kinds will require strong system backing, which can only be realized through cloud computing.
Cloud storage is a new concept extended and developed from the concept of cloud computing. A distributed cloud storage system (hereinafter referred to simply as a storage system) is a storage system that, through functions such as cluster applications, grid technology, and distributed storage file systems, integrates a large number of storage devices of various types in a network (storage devices are also referred to as storage nodes) to work together through application software or application interfaces, jointly providing data storage and service access functions to the outside.
The data processing method in the embodiments of the present application can be applied to cloud technology scenarios, such as processing data in cloud applications and storing data through cloud storage.
The following briefly describes the design concept of the embodiment of the present application:
With the rapid development of internet technology, more and more industries are expanding onto the internet, the number of applications serving these industries has increased greatly, and the data generated by these applications is growing rapidly. How to compute and process this data faster has therefore become an important research topic.
As shown in fig. 1, a schematic diagram of the data distribution manner in the related art provided by the embodiment of the present application, data is distributed, in an equal-distribution manner, to working nodes of differing computing power in a service cluster for processing. A working node with high computing power finishes its data processing task sooner, and must then wait for the working nodes with low computing power to finish before all data processing of the current stage is complete and the next stage can proceed; this slows down data processing and leaves the resources of high-computing-power working nodes underused. Furthermore, because a working node whose computing power is too low must be taken offline, the related art checks the state of working nodes at intervals by means such as heartbeat packets. When a working node's computing power is detected to be too low, the management node is notified, through a call to its interface, that the node needs to be taken offline; upon receiving the notification, the management node tells the node to prepare to go offline and stops assigning it tasks, and the node is not formally offline until the data already stored on it has been processed. The period from the moment a node's computing power fluctuates and becomes too low until the node is formally offline is therefore long.
Based on the above, the embodiments of the present application provide a data processing method, apparatus, electronic device, and storage medium. Because the application acquires the computing power information of each working node in real time before the tasks to be executed and the data subsets are distributed, it can monitor the computing power of each working node, select target working nodes for data processing according to the obtained computing power information, and avoid distributing tasks to working nodes with too-low computing power. This improves data processing speed, and avoids the situation in the related art where a working node is only detected, and notified to go offline, after its computing power has dropped and data subsets have already been distributed to it. Further, the tasks to be executed are dynamically allocated according to the computing power of each target working node, and the number of data subsets given to each task is likewise decided dynamically, which prevents low-computing-power target working nodes from dragging down the overall data processing speed, makes full use of the resources of high-computing-power target working nodes, and improves the rationality of data distribution.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are for illustration and explanation only, and not for limitation of the present application, and embodiments of the present application and features of the embodiments may be combined with each other without conflict.
Fig. 2 is a schematic diagram of an application scenario according to an embodiment of the present application. As the diagram shows, the scenario is a service cluster comprising a management node 210 and a plurality of working nodes 220.
For example, the service cluster is a hybrid cluster in which online applications and offline applications are deployed together. Hybrid deployment can make full use of cluster resources and improve resource utilization, and is gradually becoming a trend in application and cluster management. Online applications in a hybrid cluster have strict response-latency requirements, so they have priority in using resources, and the resources of offline applications (especially CPU computing power) can be preempted by high-priority online applications. As the load of the online applications changes, the CPU computing power available to the offline applications fluctuates considerably. In such a scenario, the related art still allocates the resources of each application in a preset manner and cannot dynamically adapt the resources to the actual situation, which may further cause a serious computation long-tail problem and affect the overall data processing speed.
It should be noted that the above hybrid clusters are only examples, and the present application is equally applicable to other clusters.
In the embodiment of the present application, the management node 210 is a node with management capability, and the working node 220 is a node with operation capability.
Optionally, the management node 210 and the working node 220 may be terminal devices, including but not limited to mobile phones, tablet computers, notebook computers, desktop computers, electronic book readers, intelligent voice interaction devices, intelligent home appliances, vehicle terminals, and the like; the terminal device may be provided with a data processing related client, which may be software (e.g. a browser, data processing software, etc.), or may be a web page, applet, etc.
Optionally, the management node 210 and the working node 220 may also be servers for data processing. A server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data, and artificial intelligence platforms.
Optionally, in the management node 210 and the working node 220, part of the nodes may be terminal devices, and part of the nodes may be servers, which is not specifically limited herein.
It should be noted that the data processing method in the embodiments of the present application may be performed by an electronic device, which may be the management node 210. For example, when the method is executed by the management node 210, one plan to be executed is to retrieve the objects that use application program A and the objects that use application program B. The management node 210 divides the plan to be executed into two stages to be executed: the first stage retrieves the objects using application program A, and the second stage retrieves, on the basis of the first stage, the objects using both application program A and application program B. Taking the first stage as an example, the management node 210 divides it into a plurality of tasks to be executed. The management node 210 then obtains the computing power information of each candidate working node 220 in the service cluster, determines the candidate working nodes 220 whose CPU computing power meets a preset computing power threshold condition as target working nodes 220, and performs the offline operation on the candidate working nodes that do not meet the condition.
Thereafter, the management node 210 schedules the plurality of tasks to be executed to at least one target working node 220. For each task to be executed, the management node 210 allocates a certain number of data subsets to the target working node 220 where the task is located based on that node's current CPU computing power, such that the number of data subsets and the current CPU computing power satisfy a preset proportional relation. After the allocation is completed, the management node 210 monitors the CPU computing power fluctuation of each target working node 220 in real time; if it detects that the CPU computing power of a target working node 220 has dropped so that the number of data subsets waiting to be processed on the node and the node's current CPU computing power no longer satisfy the preset proportional relation, the data subsets waiting to be processed on that node are redistributed.
In an alternative embodiment, the management node 210 and the working node 220 may communicate via a communication network.
In an alternative embodiment, the communication network is a wired network or a wireless network.
It should be noted that, the number of terminal devices and servers shown in fig. 2 is merely illustrative, and the number of terminal devices and servers is not limited in practice, and is not particularly limited in the embodiment of the present application.
In the embodiment of the application, when the number of the servers is multiple, the multiple servers can be formed into a blockchain, and the servers are nodes on the blockchain; the data processing method disclosed in the embodiment of the application can store the related data of the related data processing on the blockchain, such as node calculation force information, data subsets and the like.
In addition, the embodiment of the application can be applied to various scenes, including not only data processing scenes, but also cloud technology, artificial intelligence, intelligent traffic, auxiliary driving and other scenes.
The data processing method provided by the exemplary embodiments of the present application will be described below with reference to the accompanying drawings in conjunction with the application scenarios described above, and it should be noted that the application scenarios described above are only shown for the convenience of understanding the spirit and principles of the present application, and the embodiments of the present application are not limited in any way in this respect.
Referring to fig. 3, a flowchart of an implementation of a data processing method according to an embodiment of the present application is shown. Taking a management node as the execution body as an example, the specific implementation flow of the method is as follows (S301 to S3022):
S301: the management node divides the plan to be executed into at least one stage to be executed, and each stage to be executed comprises a plurality of parallel tasks to be executed.
Each stage to be executed corresponds to a plurality of data subsets, and each data subset comprises a plurality of data.
For example, a service cluster includes a plurality of nodes, each of which is a server. The service cluster architecture includes a management node and a plurality of working nodes; the management node is responsible for receiving, processing, and distributing tasks to be executed and returning their results, and manages all working nodes in the architecture.
According to the specific content of the plan to be executed, the management node divides it into at least one stage to be executed. The stages must be processed sequentially in a preset order, each stage can be divided into a plurality of tasks to be executed, and each stage corresponds to a plurality of data subsets during execution.
Taking a specific scenario as an example, assume that a plan to be executed is to retrieve, in a preconfigured data set, the male objects above 30 years old who use application program A. The management node may divide this plan into three stages to be executed: the first execution stage Stage1 retrieves the objects using application program A; the second execution stage Stage2 retrieves, based on the execution result of Stage1, the objects of application program A above 30 years old; and the third execution stage Stage3 retrieves, based on the execution result of Stage2, the male objects of application program A above 30 years old. Taking Stage1 as an example, the management node divides it into a plurality of parallel tasks to be executed Task1, Task2, Task3, and Task4. The preconfigured data set includes a plurality of data subsets Split1, Split2, ..., Splitn; the data contained in the data subsets corresponding to Stage2 are the objects selected by Stage1 as using application program A, and so on.
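For ease of understanding, the plan/stage/task decomposition described above may be sketched as follows (an illustrative sketch with assumed class and field names, not part of the patent):

```python
# Illustrative sketch (assumed names, not from the patent): a plan is split
# into sequential stages; each stage holds parallel tasks; data subsets
# (splits) are assigned to tasks later, during scheduling.
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    splits: list = field(default_factory=list)  # data subsets, assigned later


@dataclass
class Stage:
    name: str
    tasks: list  # parallel tasks within the stage


@dataclass
class Plan:
    stages: list  # processed sequentially in the preset order


plan = Plan(stages=[
    Stage("Stage1", [Task(f"Task{i}") for i in range(1, 5)]),  # users of app A
    Stage("Stage2", [Task("Task1")]),  # ... above 30 years old
    Stage("Stage3", [Task("Task1")]),  # ... male
])
print(len(plan.stages), [len(s.tasks) for s in plan.stages])  # 3 [4, 1, 1]
```
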
S302: and when each to-be-executed stage is processed, the management node acquires the respective calculation power information of each candidate working node in the service cluster.
When each stage to be executed is processed, the management node needs to acquire the computing power information of the working nodes. Specifically, all current working nodes in the service cluster may be taken as candidate working nodes; when each stage is processed, before any task to be executed is scheduled, the management node may acquire the CPU computing power of all working nodes in real time by calling a computing power detection interface.
Assume that the nodes in the current service cluster are servers, where server A is the management node and servers B, C, D, E, F, G are all working nodes, and the CPU computing powers obtained by server A for servers B, C, D, E, F, G are P1, P2, P3, P4, P5, and P6, respectively.
In addition, at each stage to be executed, the management node performs the following operations:
S3021: based on the obtained computing power information, the management node selects at least one target working node from the candidate working nodes, and schedules a plurality of tasks to be executed to the at least one target working node.
In this step, candidate working nodes with relatively high computing power may be selected as target working nodes according to each candidate working node's computing power information, so that tasks to be executed and data subsets are distributed to the target working nodes. This prevents tasks and data subsets from being distributed to candidate working nodes with excessively low computing power, which would slow down the overall data processing speed.
Optionally, the manner of selecting the target working node from the candidate working nodes is as follows:
Selection mode one: the management node performs the following operation on each candidate working node:
If the current computing power information of the candidate working node is not lower than a preset threshold, the candidate working node is taken as a target working node.
The preset threshold may be a CPU computing power threshold set in advance based on the actual application, and the management node may select the target working nodes based on the comparison between each candidate working node's CPU computing power and the preset threshold.
Selection mode two: the candidate working nodes are sorted based on their current computing power information, and those within a specified rank range are taken as target working nodes.
For example, the candidate working nodes may be sorted from high to low by current CPU computing power, and the first m candidate working nodes selected as target working nodes.
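The two selection modes above may be sketched as follows (assumed function names and example computing power values, not from the patent):

```python
# Illustrative sketch (assumed names): the two target-node selection modes.
def select_by_threshold(powers, threshold):
    """Mode one: keep nodes whose CPU power is not lower than the threshold."""
    return [node for node, p in powers.items() if p >= threshold]


def select_top_m(powers, m):
    """Mode two: sort nodes by CPU power (high to low), keep the first m."""
    ranked = sorted(powers, key=powers.get, reverse=True)
    return ranked[:m]


# Example values: node E's power is below the threshold, so it is left out
# (and would be taken offline); node G, equal to the threshold, is kept.
powers = {"B": 8, "C": 7, "D": 3, "E": 1, "F": 5, "G": 2}
print(select_by_threshold(powers, 2))  # ['B', 'C', 'D', 'F', 'G']
print(select_top_m(powers, 4))         # ['B', 'C', 'F', 'D']
```
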
Then, the management node performs the offline operation on the remaining candidate working nodes other than the at least one target working node, and distributes tasks to be executed to each target working node. As shown in fig. 4, a schematic diagram of task allocation provided by an embodiment of the present application, the management node detects the computing power of each working node; it does not allocate tasks to working nodes whose computing power is lower than a preset threshold, and allocates tasks to working nodes whose computing power is higher than the preset threshold.
Following the assumptions in S301 and S302, one optional implementation presets a computing power threshold. As shown in fig. 5, a logic diagram for selecting target working nodes provided by an embodiment of the present application, assume that the CPU computing power of candidate working nodes B, C, D, F is higher than the threshold, that of candidate working node G is equal to the threshold, and that of candidate working node E is lower than the threshold. Management node A then selects candidate working nodes B, C, D, F, G, whose CPU computing power is not lower than the preset threshold, as target working nodes, and performs the offline operation on candidate working node E, whose CPU computing power is lower than the threshold.
In another optional implementation, all candidate working nodes B, C, D, E, F, G are sorted according to the obtained CPU computing powers P1, P2, P3, P4, P5, P6. Assume the sorting result, from largest to smallest, is P1, P2, P5, P3, P6, P4, and that the preset rank range is the first four positions; management node A then selects candidate working nodes B, C, D, F as target working nodes and performs the offline operation on candidate working nodes E and G.
Then, management node A schedules the plurality of parallel tasks Task1, Task2, Task3, Task4 of Stage1 to the target working nodes. Assuming the target working nodes are B, C, D, F, one optional implementation allocates one task to each target working node: for example, management node A allocates Task1 to target working node B, Task2 to target working node C, Task3 to target working node D, and Task4 to target working node F.
In addition, in the specific implementation of distributing tasks to be executed, the tasks may be adaptively scheduled according to the actual CPU computing power of the target working nodes: for example, more tasks are allocated to target working nodes with high computing power and fewer to those with low computing power, so that tasks are dynamically distributed according to CPU computing power. The present application is not particularly limited in this respect.
S3022: the management node performs the following operation for each task to be executed: based on the current computing power information of the target working node associated with the task, select a corresponding number of data subsets from the plurality of data subsets corresponding to the current stage to be executed, and allocate them to the target working node for data processing.
Specifically, the management node selects a first number of data subsets from the plurality of data subsets corresponding to the current stage based on the current computing power information of the target working node associated with the task to be executed, and allocates them to that target working node for data processing, where the first number and the node's current computing power information satisfy a preset proportional relation.
Because a stage to be executed is divided into a plurality of parallel tasks distributed across the target working nodes, and the data subsets are the input of each task, the stage ends only when all data subsets corresponding to it have been processed; the execution time of a stage is therefore determined by the slowest target working node. In the related art, data subsets are often distributed evenly, so they cannot be distributed dynamically according to the CPU computing power of the target working nodes (e.g., giving more data subsets to target working nodes with high CPU computing power and fewer to those with lower CPU computing power). As a result, target working nodes with lower CPU computing power finish their tasks slowly and drag down the completion speed of the whole plan to be executed.
To overcome these defects, the management node can sense the CPU computing power of all target working nodes in real time and allocate data subsets on demand accordingly, thereby avoiding being limited by low-computing-power target working nodes, fully utilizing the CPU computing power of each target working node, and improving resource utilization. Each target working node dispatches its data subsets to specific tasks for computation; the data subsets are distributed in proportion to CPU computing power, each task being allocated a first number of data subsets. A specific calculation may be: number of data subsets allocated per task = CPU computing power of the target working node where the task is located / computing power required for a single data subset. This enables adaptive scheduling of the data subsets.
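The proportional allocation rule above may be sketched as follows (assumed function name and example values; rounding down via integer division is an assumption):

```python
# Illustrative sketch (assumed names): allocate data subsets in proportion to
# each node's CPU power, using the rule
#   subsets per task = node CPU power / power required per subset.
def allocate_subsets(node_powers, power_per_subset):
    """Return how many data subsets each node's task should receive."""
    return {node: int(p // power_per_subset) for node, p in node_powers.items()}


# Example powers proportional to B > C > F > D, as in the text's scenario.
quota = allocate_subsets({"B": 8, "C": 6, "D": 2, "F": 4}, power_per_subset=2)
print(quota)  # {'B': 4, 'C': 3, 'D': 1, 'F': 2}
```
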
As shown in fig. 6, a schematic diagram of data subset allocation provided by an embodiment of the present application, the management node detects the computing power of each working node: working nodes with high computing power are allocated more data subsets, working nodes with ordinary computing power are allocated fewer, and working nodes that were previously taken offline are allocated none.
Following the assumption in S3021, an alternative implementation is:
The number of data subsets allocated to Task1 = CPU computing power P1 of target working node B / computing power required for a single data subset; the number allocated to Task2 = CPU computing power P2 of target working node C / computing power required for a single data subset; the number allocated to Task3 = CPU computing power P3 of target working node D / computing power required for a single data subset; the number allocated to Task4 = CPU computing power P5 of target working node F / computing power required for a single data subset.
For illustration, the present disclosure assumes the computing power required by each data subset is the same. Because the CPU computing powers of the target working nodes satisfy B > C > F > D, Task1 on target working node B is allocated the most data subsets, followed by C and then F, and Task3 on target working node D is allocated the fewest.
After the data subsets are allocated, the CPU computing power may fluctuate due to the external environment, traffic peaks, and the like. For example, the computing power of a target working node that previously had high CPU computing power may decay, so that its CPU computing power no longer matches the number of data subsets waiting to be processed; this reduces the node's task completion speed and slows the completion of the whole plan to be executed.
To solve the above problem, after the data subset allocation is completed, the management node in the embodiment of the present application performs the following operations for each target working node:
Detect, at every preset period, the target working node's execution of its allocated data subsets, and obtain a corresponding detection result.
Alternatively, detect the target working node's execution of its allocated data subsets in real time, and obtain a corresponding detection result.
If the detection result corresponding to a target working node indicates that the number of data subsets to be processed on it does not match its current computing power information, the data subsets to be processed on that node are taken as data subsets to be allocated. Then, based on the current computing power information of each target working node, a second number of these data subsets, together with the tasks to be executed corresponding to them, are reassigned from the original target working node to other target working nodes, so that after reassignment the number of data subsets on the original target working node and its current computing power information satisfy the preset proportional relation, and the number of data subsets on the other target working nodes and their current computing power information also satisfy the preset proportional relation. The rescheduling pseudocode may be:
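The pseudocode itself is not reproduced in this text; a minimal sketch of such rescheduling logic, with assumed names and a simple greedy reassignment strategy, might be:

```python
# Illustrative sketch (assumed names; the patent's own pseudocode is not
# reproduced in this text). A node whose pending-subset count exceeds what its
# current power supports hands the excess back for reassignment:
#   second number = pending subsets - node power // power per subset.
def rebalance(pending, powers, power_per_subset):
    """pending/powers: dicts keyed by node name. Returns updated pending counts."""
    capacity = {n: powers[n] // power_per_subset for n in pending}
    overloaded = {n: pending[n] - capacity[n]
                  for n in pending if pending[n] > capacity[n]}
    for node, excess in overloaded.items():
        pending[node] -= excess      # take the excess off the slowed node
        for other in pending:        # greedily give it to nodes with spare room
            if excess == 0:
                break
            spare = capacity[other] - pending[other]
            if spare > 0:
                moved = min(spare, excess)
                pending[other] += moved
                excess -= moved
    return pending


# Node C's power has dropped to 2; its excess subsets move to B and F.
print(rebalance({"B": 2, "C": 4, "F": 1}, {"B": 8, "C": 2, "F": 6}, 2))
# {'B': 4, 'C': 1, 'F': 2}
```
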
it should be noted that the above pseudo code is only illustrative, and any manner that can implement rescheduling in the present application is applicable to the present application, and the present application is not limited in particular.
In the above, the specific process for determining the second number is: determining the second number based on the current computing power information of each target working node, the multiple relation between that computing power information and the computing power required for one data subset, and the number of data subsets to be processed on each target working node.
For example, the second number may be: the number of data subsets to be processed on the target working node − the node's current CPU computing power / the computing power required for each data subset.
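This formula may be sketched as follows (assumed function name; integer division and clamping at zero are assumptions):

```python
# Illustrative sketch (assumed name): the "second number" of data subsets to
# move off a node whose power dropped, per the formula above.
def second_number(pending_subsets, node_power, power_per_subset):
    """pending subsets minus what the node's current power can support."""
    return max(0, pending_subsets - node_power // power_per_subset)


print(second_number(4, 2, 2))  # 4 pending, capacity 1 -> move 3
print(second_number(1, 8, 2))  # under capacity -> move 0
```
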
In addition, the data subset to be processed and the task to be executed where the data subset to be processed is located can be distributed to other target working nodes together.
Following S3022, after the data subset allocation is completed, management node A detects the execution status of each target working node in real time and obtains corresponding detection results. If management node A detects that the current CPU computing power P2 of target working node C has fluctuated and dropped significantly, no longer matching the data subsets currently waiting to be processed on working node C, it takes those data subsets as data subsets to be allocated, computes the second number to be reassigned from working node C as: the number of data subsets to be allocated on working node C − the current CPU computing power of working node C / the computing power required for one data subset, and then reassigns that second number of data subsets from working node C to other target working nodes.
For example, if the current working nodes B and F can accommodate the extra data subsets, management node A may allocate the data subsets to be reallocated to working node B, to working node F, or to both. After the reallocation, the number of data subsets remaining on working node C and its current CPU computing power satisfy the preset proportional relation, and the numbers of data subsets on working nodes B and F and their respective current CPU computing powers also satisfy the preset proportional relation.
An optional preset proportional relation is: after allocation is completed, the current number of data subsets on a target working node ≤ the node's current CPU computing power / the computing power required for a single data subset.
It should be noted that the specific content of the above-mentioned preset proportional relation is only one possible scheme, and the present application is not limited specifically.
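For illustration only, this optional relation may be checked as follows (assumed function name, not from the patent):

```python
# Illustrative sketch (assumed name): check the optional proportional relation
# "subset count <= node CPU power / power required per subset" described above.
def satisfies_relation(subset_count, node_power, power_per_subset):
    return subset_count <= node_power / power_per_subset


print(satisfies_relation(3, 8, 2))  # True: 3 <= 4
print(satisfies_relation(5, 8, 2))  # False: 5 > 4
```
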
In addition, if it is determined that the current computing power information of a target working node is lower than the preset threshold, the offline operation may be performed on that node. In subsequent detection, if the management node detects that the node's computing power has recovered to the preset threshold or above, the node can be brought online again.
Referring to fig. 7, a specific flowchart of data processing provided by an embodiment of the present application is shown, and the following steps are executed by a management node:
Step 701: the management node divides the plan to be executed into at least one phase to be executed.
Step 702: the management node divides each stage to be executed into a plurality of tasks to be executed.
As shown in fig. 8, an execution schematic diagram of a stage to be executed provided by an embodiment of the present application: the management node divides the plan to be executed into at least one stage to be executed according to its specific content, each stage may be divided into a plurality of tasks to be executed, and the management node subsequently executes a stage by allocating its tasks to working nodes.
Step 703: and the management node requests to acquire the respective calculation power information of each candidate working node in the service cluster in each stage to be executed.
Step 704: in each stage to be executed, the management node determines the target working nodes among the candidate working nodes based on the computing power information.
Step 705: in each stage to be executed, the management node schedules the stage's plurality of tasks to be executed to at least one target working node.
Step 706: and the management node distributes a first number of data subsets for each task to be executed based on the current computing power information of the target working node server where each task to be executed is located in each stage to be executed.
Step 707: in each stage to be executed, the management node detects the computing power information of each target working node; if it detects that a target working node's computing power has dropped so that the number of data subsets waiting to be processed on the node no longer matches its computing power information, the data subsets waiting on that node are redistributed.
As shown in fig. 9, a schematic diagram of data subset reassignment is provided for an embodiment of the present application, for the data subsets already assigned to the working node 1 and the working node 2, if the management node detects that the computing power of the working node 1 and the working node 2 is reduced subsequently, the management node may reassign the data subsets of the working node 1 and the working node 2, for example, in the figure, the data subsets are reassigned to the working node 3.
In summary, as shown in fig. 10, a schematic diagram of task and data subset allocation provided by an embodiment of the present application, the management node may invoke the computing power detection interface to obtain the computing power of each working node in real time or periodically, allocate tasks to be executed and data subsets according to the working nodes' computing power information, and adaptively adjust each working node's data subset allocation. As shown in fig. 11, a schematic diagram of working node computing power changes provided by an embodiment of the present application: when a working node with originally high computing power fluctuates down to ordinary computing power, the management node detects the change and adaptively reduces that node's data subset allocation; when a working node with originally ordinary computing power fluctuates up to high computing power, the management node detects the change and adaptively increases that node's data subset allocation.
The method solves the problem that executing tasks on working nodes with large CPU computing power fluctuations causes an obvious computation long tail and drags down the whole plan to be executed. Fig. 12 is a schematic diagram of task execution under a native MPP architecture according to an embodiment of the present application, where each working node is a server. Fig. 12 shows the predicted total number of data rows each working node needs to query and the row query speed (queries per second); it also shows the number of bytes each working node actually queried, the byte query speed, and the time taken to complete the task. It can be seen that the time each working node needs to complete its task varies greatly.
In the present application, tasks to be executed avoid being scheduled to nodes with poor CPU computing power, effectively eliminating the long-tail problem. As shown in fig. 13, a schematic diagram of task execution under dynamic scheduling according to an embodiment of the present application, the time each working node spends completing its task is more balanced, preventing the long-tail problem from affecting the performance of the whole plan to be executed.
In addition, in the data processing manner provided by the present application, data subsets are distributed in proportion to the working nodes' CPU computing power. As shown in fig. 14, a schematic diagram of each working node's CPU scheduling under dynamic scheduling provided by an embodiment of the present application, the management node allocates different numbers of data subsets to each working node based on its CPU computing power, so that each node's computing power is fully exploited.
As shown in fig. 15, a time-consumption comparison graph provided by an embodiment of the present application compares, at different time points under the same cluster load, the data processing performance of the computing-power-aware adaptive computing framework with dynamic scheduling provided by the present application against that of the native computing framework. It can be seen that the data processing approach provided by the present application improves computing performance (P90 time consumption) by a factor of 2. Here, single-request response times are arranged from smallest to largest, and the value at the 90% position in the sequence is the P90 value.
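One common convention for the P90 value described above may be sketched as follows (assumed function name; percentile conventions vary, and this is only one of them):

```python
# Illustrative sketch (assumed name): compute a P90 latency value as described
# above - sort response times ascending, take the value at the 90% position.
def p90(latencies_ms):
    ordered = sorted(latencies_ms)
    idx = max(0, int(len(ordered) * 0.9) - 1)  # index of the 90% position
    return ordered[idx]


print(p90(list(range(1, 101))))  # 100 samples 1..100 -> 90
```
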
Based on the same inventive concept, the embodiment of the application also provides a data processing device. As shown in fig. 16, which is a schematic structural diagram of the data processing apparatus, may include:
A splitting unit 1601, configured to divide a plan to be executed into at least one phase to be executed, where each phase to be executed includes a plurality of parallel tasks to be executed; wherein each phase to be executed corresponds to a plurality of data subsets;
The allocation unit 1602 is configured to obtain, when each stage to be executed is processed, respective computing power information of each candidate working node in the service cluster, and perform the following operations:
Selecting at least one target working node from the candidate working nodes based on the obtained calculation force information, and scheduling a plurality of tasks to be executed to the at least one target working node;
For each task to be executed, the following operations are executed respectively: based on the current computing power information of a target working node associated with a task to be executed, selecting a corresponding number of data subsets from a plurality of data subsets corresponding to the current stage to be executed, and distributing the data subsets to the target working node for data processing.
Optionally, the allocation unit 1602 is specifically configured to:
Based on the current calculation power information of a target working node associated with a task to be executed, selecting a first number of data subsets from a plurality of data subsets corresponding to a current stage to be executed, and distributing the first number of data subsets to the target working node for data processing, wherein the first number and the current calculation power information of the target working node meet a preset proportional relation.
Optionally, the apparatus further comprises:
The detecting unit 1603 is configured to, after the corresponding number of data subsets are selected from the plurality of data subsets corresponding to the current stage to be executed and allocated to the target working nodes for data processing, respectively detect each target working node's execution of its allocated data subsets and obtain corresponding detection results;
and reassigning the data subsets to be processed among the assigned data subsets and the tasks to be executed corresponding to the data subsets to be processed based on the obtained detection results and the current computing power information of each target working node.
Optionally, the detecting unit 1603 is specifically configured to:
For each target working node, the following operations are performed:
If the detection result corresponding to one target working node represents that the number of the data subsets to be processed in the one target working node does not match the current computing power information of the one target working node, the data subsets to be processed in the one target working node are taken as the data subsets to be distributed;
and reassigning the acquired data subsets to be assigned and the tasks to be executed corresponding to the data subsets to be assigned based on the current computing power information of each target working node.
Optionally, the detecting unit 1603 is specifically configured to:
Based on the current computing power information of each target working node, reassigning a second number of data subsets among the data subsets to be assigned, together with the tasks to be executed corresponding to the second number of data subsets, from the preceding target working nodes to other target working nodes, so that after reassignment:
the number of the data subsets of the preceding target working nodes meets the preset proportional relation with their current computing power information, and the number of the data subsets of the other target working nodes meets the preset proportional relation with their current computing power information.
Optionally, the detecting unit 1603 is specifically configured to determine the second number by:
And determining a second number based on the current computing power information of each target working node, the multiple relation between that computing power information and the computing power required by one data subset, and the number of the data subsets to be processed in each target working node.
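One concrete reading of this rule is that a node may keep at most as many pending subsets as the multiple of its current power over the power one subset needs, and the remainder is the second number to migrate. The function below is a minimal sketch under assumed integer-valued power scores; the names are illustrative, not from the specification.

```python
def subsets_to_migrate(current_power, power_per_subset, pending_subsets):
    """Return the 'second number': how many of a node's pending data
    subsets should be reassigned to other target working nodes."""
    can_keep = current_power // power_per_subset  # the multiple relation
    return max(0, pending_subsets - can_keep)     # surplus to migrate
```

For example, a node whose power covers five subsets but holds eight pending subsets would hand three of them, and their tasks, to other target working nodes.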
Optionally, the allocation unit 1602 is specifically configured to:
for each candidate working node, the following operations are respectively executed: if the current computing power information of one candidate working node is not lower than a preset threshold, taking the one candidate working node as a target working node; or
ranking the candidate working nodes based on their current computing power information, and taking the candidate working nodes within a specified ranking range as target working nodes.
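The two selection strategies just described — an absolute power threshold, or ranking with a cutoff — might be sketched as follows. Candidate nodes are modeled here as hypothetical `(name, power)` pairs; this is an assumption for illustration only.

```python
def select_targets(candidates, threshold=None, top_k=None):
    """Select target working nodes from (name, power) candidate pairs,
    either by a preset computing-power threshold or by keeping the
    top-k nodes after sorting on current computing power."""
    if threshold is not None:
        # Strategy 1: keep every candidate at or above the threshold.
        return [name for name, power in candidates if power >= threshold]
    # Strategy 2: rank by power and keep the specified range.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]
```

Either strategy yields the set of target working nodes to which the parallel tasks of the current stage are then scheduled.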
Optionally, the allocation unit 1602 is further configured to:
And if the current computing power information of one target working node is determined to be lower than the preset threshold, executing the offline operation on the one target working node.
Optionally, the detecting unit 1603 is specifically configured to:
For each target working node, the following operations are performed:
Detecting, at preset intervals, the execution condition of the allocated data subsets by a target working node; or
And detecting the execution condition of the distributed data subsets by a target working node in real time.
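The choice between periodic and real-time detection can be sketched as a polling loop whose period is a parameter, where a period of zero approximates continuous checking. The `get_pending` callback, the capacity comparison, and the return labels are illustrative assumptions, not part of the claimed method.

```python
import time

def watch_worker(get_pending, capacity, period=5.0, max_checks=None):
    """Poll a worker's count of still-pending data subsets every `period`
    seconds; a period of 0 approximates real-time detection. Returns
    'mismatch' when pending work exceeds what the node's current
    computing power can cover, signalling that reassignment is needed."""
    checks = 0
    while max_checks is None or checks < max_checks:
        pending = get_pending()
        if pending == 0:
            return "done"
        if pending > capacity:
            return "mismatch"  # trigger reassignment of pending subsets
        time.sleep(period)
        checks += 1
    return "still-running"
```

A "mismatch" result corresponds to the detection result described above: the number of pending data subsets no longer matches the node's current computing power, so the pending subsets become data subsets to be distributed.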
For convenience of description, the above parts are divided into modules (or units) by function and described separately. Of course, when implementing the present application, the functions of the modules (or units) may be implemented in one or more pieces of software or hardware.
Having described the data processing method and apparatus of an exemplary embodiment of the present application, next, an electronic device according to another exemplary embodiment of the present application is described.
Those skilled in the art will appreciate that the various aspects of the application may be implemented as a system, method, or program product. Accordingly, aspects of the application may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system."
Based on the same inventive concept as the method embodiments above, an embodiment of the application further provides an electronic device. In one embodiment, the electronic device may be a server. In this embodiment, the electronic device may be configured as shown in fig. 17, including a memory 1701, a communication module 1703, and one or more processors 1702.
A memory 1701 for storing computer programs for execution by the processor 1702. The memory 1701 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, a program required for running an instant messaging function, and the like; the storage data area can store various instant messaging information, operation instruction sets and the like.
The memory 1701 may be a volatile memory, such as a random-access memory (RAM); the memory 1701 may also be a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or any other medium that can be used to carry or store a desired computer program in the form of instructions or data structures and that can be accessed by a computer, without limitation. The memory 1701 may also be a combination of the above.
The processor 1702 may include one or more central processing units (central processing unit, CPUs) or digital processing units, or the like. Processor 1702 is configured to implement the data processing method described above when calling the computer program stored in memory 1701.
The communication module 1703 is used for communicating with a terminal device and other servers.
The specific connection medium among the memory 1701, the communication module 1703, and the processor 1702 is not limited in the embodiments of the present application. In fig. 17, the memory 1701 and the processor 1702 are connected by a bus 1704, which is depicted with a bold line; the connection manner between other components is merely illustrative and not limiting. The bus 1704 may be classified into an address bus, a data bus, a control bus, and the like. For ease of description, only one thick line is depicted in fig. 17, but this does not mean that there is only one bus or only one type of bus.
The memory 1701 stores a computer storage medium, in which computer-executable instructions for implementing the data processing method of the embodiments of the present application are stored. The processor 1702 is configured to perform the data processing method described above, as shown in fig. 3.
In another embodiment, the electronic device may also be other electronic devices. In this embodiment, the structure of the electronic device may include, as shown in fig. 18: communication component 1810, memory 1820, display unit 1830, camera 1840, sensor 1850, audio circuitry 1860, bluetooth module 1870, processor 1880, and the like.
The communication component 1810 is used for communicating with a server. In some embodiments, it may include a wireless fidelity (WiFi) module; the WiFi module uses a short-range wireless transmission technology, through which the electronic device may help an object (such as a user) send and receive information.
Memory 1820 may be used for storing software programs and data. The processor 1880 performs various functions of the electronic device and data processing by running software programs or data stored in the memory 1820. Memory 1820 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. The memory 1820 stores an operating system that enables the electronic device to operate. The memory 1820 in the present application may store an operating system and various application programs, and may also store computer programs for performing the data processing method of the embodiments of the present application.
The display unit 1830 may be used to display information input by an object or information provided to an object, and a graphical user interface (GUI) of various menus of the electronic device. In particular, the display unit 1830 may include a display screen 1832 disposed on the front of the electronic device. The display screen 1832 may be configured in the form of a liquid crystal display, light-emitting diodes, or the like. The display unit 1830 may be used to display a data processing interface and the like in the embodiments of the present application.
The display unit 1830 may also be used to receive input numeric or character information, generate signal inputs related to object settings and function control of the electronic device, and in particular, the display unit 1830 may include a touch screen 1831 disposed on the front of the electronic device, and may collect touch operations on or near the object, such as clicking buttons, dragging scroll boxes, and the like.
The touch screen 1831 may cover the display screen 1832, or the touch screen 1831 may be integrated with the display screen 1832 to implement input and output functions of the electronic device, and after integration, the touch screen may be simply referred to as a touch screen. The display unit 1830 may display an application program and corresponding operation steps in the present application.
The camera 1840 may be used to capture still images and the subject may post the images captured by the camera 1840 through the application. The camera 1840 may be one or more. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive elements convert the optical signals to electrical signals, which are then passed to a processor 1880 for conversion to digital image signals.
The electronic device may also include at least one sensor 1850, such as an acceleration sensor 1851, a distance sensor 1852, a fingerprint sensor 1853, a temperature sensor 1854. The electronic device may also be configured with other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, light sensors, motion sensors, and the like.
Audio circuitry 1860, speaker 1861, microphone 1862 may provide an audio interface between the object and the electronic device. The audio circuit 1860 may transmit the received electrical signal converted from audio data to the speaker 1861, and may be converted into a sound signal by the speaker 1861 for output. The electronic device may also be configured with a volume button for adjusting the volume of the sound signal. On the other hand, microphone 1862 converts the collected sound signals into electrical signals, which are received by audio circuitry 1860 and converted into audio data, which are output to communication component 1810 for transmission to, for example, another electronic device, or to memory 1820 for further processing.
The bluetooth module 1870 is used for exchanging information with other bluetooth devices having a bluetooth module through a bluetooth protocol. For example, the electronic device may establish a bluetooth connection with a wearable electronic device (e.g., a smart watch) that also has a bluetooth module through bluetooth module 1870, thereby performing data interaction.
The processor 1880 is a control center of the electronic device, connects various parts of the overall electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs stored in the memory 1820, and invoking data stored in the memory 1820. In some embodiments, the processor 1880 may include one or more processing units; the processor 1880 may also integrate an application processor that primarily handles operating systems, object interfaces, applications, etc., and a baseband processor that primarily handles wireless communications. It will be appreciated that the baseband processor described above may not be integrated into the processor 1880. The processor 1880 of the present application may run an operating system, application programs, object interface displays and touch responses, as well as data processing methods of embodiments of the present application. In addition, the processor 1880 is coupled to a display unit 1830.
In some possible embodiments, aspects of the data processing method provided by the present application may also be implemented in the form of a program product comprising a computer program for causing an electronic device to carry out the steps of the data processing method according to the various exemplary embodiments of the application described in the present specification when the program product is run on the electronic device, for example, the electronic device may carry out the steps as shown in fig. 3.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may take the form of a portable compact disc read only memory (CD-ROM) and comprise a computer program and may be run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a command execution system, apparatus, or device.
The readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave in which a readable computer program is embodied. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
A computer program embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer programs for performing the operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer program may execute entirely on the subject electronic device, partly on the subject electronic device, as a stand-alone software package, partly on the subject electronic device and partly on a remote electronic device or entirely on the remote electronic device or server. In the case of remote electronic devices, the remote electronic device may be connected to the subject electronic device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external electronic device (e.g., connected through the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus produce means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (15)
1. A method of data processing, the method comprising:
Dividing a plan to be executed into at least one stage to be executed, wherein each stage to be executed comprises a plurality of parallel tasks to be executed; wherein each phase to be executed corresponds to a plurality of data subsets;
When each to-be-executed stage is processed, the computing power information of each candidate working node in the service cluster is acquired, and the following operations are executed:
selecting at least one target working node from the candidate working nodes based on the obtained computing power information, and scheduling the plurality of tasks to be executed to the at least one target working node;
For each task to be executed, the following operations are executed respectively: based on the current computing power information of a target working node associated with a task to be executed, selecting a corresponding number of data subsets from a plurality of data subsets corresponding to a current stage to be executed, and distributing the data subsets to the target working node for data processing.
2. The method of claim 1, wherein selecting a corresponding number of data subsets from the plurality of data subsets corresponding to the current stage to be performed based on current computational power information of a target working node associated with a task to be performed, and distributing the data subsets to the target working node for data processing, comprises:
Based on current computing power information of a target working node associated with a task to be executed, selecting a first number of data subsets from a plurality of data subsets corresponding to a current stage to be executed, and distributing the first number of data subsets to the target working node for data processing, wherein the first number and the current computing power information of the target working node meet a preset proportional relation.
3. The method of claim 1, wherein selecting a corresponding number of data subsets from the plurality of data subsets corresponding to the current phase to be performed and distributing to the target working node for data processing, further comprises:
detecting the execution condition of each target working node on the allocated data subset respectively, and obtaining a corresponding detection result;
And reassigning the data subset to be processed among the assigned data subsets and the task to be executed corresponding to the data subset to be processed based on the obtained detection results and the current computing power information of each target working node.
4. The method of claim 3, wherein reassigning the subset of data to be processed among the assigned subsets of data and the task to be performed corresponding to the subset of data to be processed based on the obtained respective detection results and the respective current computing power information of the respective target working nodes comprises:
For each target working node, the following operations are performed:
If the detection result corresponding to one target working node represents that the number of the data subsets to be processed in the one target working node does not match the current computing power information of the one target working node, taking the data subsets to be processed in the one target working node as the data subsets to be distributed;
And reassigning the acquired data subsets to be assigned and tasks to be executed corresponding to the data subsets to be assigned based on the current computing power information of each target working node.
5. The method of claim 4, wherein reassigning the acquired subset of data to be assigned and the task to be performed corresponding to the subset of data based on the respective current computing power information of the target working nodes comprises:
and based on the current computing power information of each target working node, reassigning a second number of data subsets among the data subsets to be assigned, together with the tasks to be executed corresponding to the second number of data subsets, from the preceding target working nodes to other target working nodes, so that after reassignment:
the number of the data subsets of the preceding target working node meets the preset proportional relation with its current computing power information, and the number of the data subsets of the other target working nodes meets the preset proportional relation with their current computing power information.
6. The method of claim 5, wherein the second number is determined by:
And determining the second number based on the current computing power information of each target working node, the multiple relation between the current computing power information of each target working node and the computing power required by one data subset, and the number of the data subsets to be processed in each target working node.
7. The method of any one of claims 1-6, wherein selecting at least one target working node from the candidate working nodes based on the obtained computing power information comprises:
for each candidate working node, the following operations are respectively executed: if the current computing power information of one candidate working node is not lower than a preset threshold, taking the one candidate working node as a target working node; or
ranking the candidate working nodes based on their current computing power information, and taking the candidate working nodes within a specified ranking range as target working nodes.
8. The method of claim 7, wherein the method further comprises:
And executing the offline operation on the rest candidate working nodes except for the at least one target working node in the candidate working nodes.
9. The method of claim 4, wherein the reassigning the subset of data to be processed and the task to be performed corresponding to the subset of data to be processed based on the respective current computing power information of the target working nodes further comprises:
And if the current computing power information of the target working node is determined to be lower than the preset threshold, executing the offline operation on the target working node.
10. A method according to claim 3, wherein said separately detecting execution of the assigned subset of data by each target worker node comprises:
For each target working node, the following operations are performed:
Detecting, at preset intervals, the execution condition of the allocated data subsets by a target working node; or
And detecting the execution condition of the distributed data subsets by a target working node in real time.
11. A data processing apparatus, comprising:
The splitting unit is used for dividing the plan to be executed into at least one stage to be executed, and each stage to be executed comprises a plurality of parallel tasks to be executed; wherein each phase to be executed corresponds to a plurality of data subsets;
The distribution unit is used for acquiring the computing power information of each candidate working node in the service cluster when each stage to be executed is processed, and executing the following operations:
selecting at least one target working node from the candidate working nodes based on the obtained computing power information, and scheduling the tasks to be executed to the at least one target working node;
For each task to be executed, the following operations are executed respectively: based on the current computing power information of a target working node associated with a task to be executed, selecting a corresponding number of data subsets from a plurality of data subsets corresponding to a current stage to be executed, and distributing the data subsets to the target working node for data processing.
12. The apparatus of claim 11, wherein the apparatus further comprises:
The detection unit is configured to, after a corresponding number of data subsets are selected from the plurality of data subsets corresponding to the current stage to be executed and distributed to the target working nodes for data processing, respectively detect the execution condition of each target working node on the distributed data subsets, and obtain corresponding detection results;
and reassigning the data subset to be processed among the assigned data subsets and the task to be executed corresponding to the data subset to be processed based on the obtained detection results and the current computing power information of each target working node.
13. An electronic device comprising a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 10.
14. A computer readable storage medium, characterized in that it comprises a computer program for causing an electronic device to perform the steps of the method according to any one of claims 1-10 when said computer program is run on the electronic device.
15. A computer program product comprising a computer program, the computer program being stored on a computer readable storage medium; when the computer program is read from the computer readable storage medium by a processor of an electronic device, the processor executes the computer program, causing the electronic device to perform the steps of the method of any one of claims 1-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211474500.XA CN118069302A (en) | 2022-11-23 | 2022-11-23 | Data processing method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211474500.XA CN118069302A (en) | 2022-11-23 | 2022-11-23 | Data processing method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118069302A true CN118069302A (en) | 2024-05-24 |
Family
ID=91096068
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211474500.XA Pending CN118069302A (en) | 2022-11-23 | 2022-11-23 | Data processing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118069302A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118890367A (en) * | 2024-07-09 | 2024-11-01 | 纬创软件(北京)有限公司 | A communication network intelligent scheduling system based on big data integrated analysis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3799390A1 (en) | Preemptive scheduling based resource sharing use method, system and | |
US20200137151A1 (en) | Load balancing engine, client, distributed computing system, and load balancing method | |
JP2021128757A (en) | Realization method of task parallel processing, equipment, equipment and medium | |
EP3357006B1 (en) | Workflow service using state transfer | |
CN103309946B (en) | Multimedia file processing method, Apparatus and system | |
CN112130963A (en) | Virtual machine task scheduling method and device, computer equipment and storage medium | |
JP2017050001A (en) | System and method for effective neural network deployment | |
US11311722B2 (en) | Cross-platform workload processing | |
WO2024016596A1 (en) | Container cluster scheduling method and apparatus, device, and storage medium | |
CN114416352A (en) | Computing resource allocation method and device, electronic equipment and storage medium | |
CN117971499B (en) | Resource allocation method, device, electronic equipment and storage medium | |
CN119149221A (en) | Online reasoning method and device, electronic equipment and storage medium | |
CN114546587A (en) | A method for expanding and shrinking capacity of online image recognition service and related device | |
US20160210171A1 (en) | Scheduling in job execution | |
US20130219386A1 (en) | Dynamic allocation of compute resources | |
CN110247978A (en) | It is adapted to the job execution method and device of different resource scheduling system | |
CN110968422A (en) | Load distribution for integration scenarios | |
CN114721829B (en) | A method, device, equipment and storage medium for configuring coroutine stack resources | |
CN118069302A (en) | Data processing method and device, electronic equipment and storage medium | |
CN115629853A (en) | Task scheduling method and device | |
CN114205420A (en) | Task scheduling method and device, storage medium and electronic equipment | |
US9628401B2 (en) | Software product instance placement | |
CN111694672B (en) | Resource allocation method, task submission method, device, electronic equipment and medium | |
CN116402318B (en) | Multi-stage computing power resource distribution method and device for power distribution network and network architecture | |
CN113742067A (en) | Remaining resource reporting and image analysis task scheduling method, device and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||