CN109144731B - Data processing method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN109144731B (application CN201811010232.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- target
- basic
- node server
- task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multi Processors (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data processing method, a device, computer equipment and a storage medium. In the method, the central server acquires basic task information sent by a client, splits the basic task data by means of a modulo operation, and sends the resulting fragments together with their processing instructions to node servers as target task information for execution, so that tasks with large data volumes are reasonably distributed across the distributed node servers and data processing efficiency is improved. After receiving the target task information, a node server calculates a threshold for the number of tasks it can process according to its current running state, splits any task that exceeds that threshold, executes the split task data using a thread pool, and returns the execution result to the central server. The node server thus sets its task-count threshold dynamically according to its own running state and processes tasks with multiple threads, improving both stability and efficiency during data processing.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method, a data processing device, a computer device, and a storage medium.
Background
With the development of society and rising living standards, more and more people are paying attention to insurance services. The business scope of the insurance field keeps widening, and as the records in a policy system accumulate, the data volume in the system becomes extremely large.
In a policy system, because the policy data are huge, various instructions often have to be run over millions of records or more, and such processing is generally completed by calling a stored procedure to execute batch tasks, for example the batch processing tasks in the billing module for renewal payments.
During execution, a batch task mainly queries data in large database tables and then iterates over the large result set, and computing a single record may itself query some large table. Even in a high-performance stored procedure such a computation takes a long time; a stored-procedure computation over millions of records typically takes about 3 hours, so processing efficiency is low for batch tasks with large data volumes.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data processing method, apparatus, computer device, and storage medium that solve the problem of low processing efficiency in current processing of large data volumes.
A data processing method, comprising the following steps performed by a central server:
acquiring basic task information sent by a client, wherein the basic task information comprises basic task data and processing instructions corresponding to the basic task data;
splitting the basic task data according to a preset fragment count N by means of a modulo operation to obtain N pieces of basic fragment data, wherein N is a positive integer;
taking the basic fragment data and the processing instructions as target task information;
selecting a target node server from a node server set according to a preset load-balancing distribution mode, and distributing the target task information to the target node server for execution, wherein the node server set comprises a preset number of node servers;
receiving the execution results sent by the target node server and summarizing the execution results to obtain a target result;
and sending the target result to the client.
A data processing method, comprising the following steps performed by a node server:
receiving target task information sent by a central server, wherein the target task information comprises basic fragment data and a processing instruction;
calculating the number of data records in the basic fragment data;
obtaining the central processor model I₁, the memory model I₂, the current usage rate P₁ of the central processor and the current usage rate P₂ of the memory, and calculating a processable number threshold S from these values,
wherein J₁ is a preset weight corresponding to the central processor model I₁ and J₂ is a preset weight corresponding to the memory model I₂;
if the number of data records is greater than the processable number threshold S, splitting the basic fragment data according to a preset dimension to obtain K pieces of target fragment data, wherein K is a positive integer;
executing the processing instruction on the K pieces of target fragment data using a thread pool to obtain an execution result;
and sending the execution result to the central server.
A data processing apparatus comprising a central server, the central server comprising:
a data acquisition module, configured to acquire basic task information sent by a client, wherein the basic task information comprises basic task data and processing instructions corresponding to the basic task data;
a data slicing module, configured to split the basic task data according to a preset fragment count N by means of a modulo operation to obtain N pieces of basic fragment data, wherein N is a positive integer;
a task generation module, configured to take the basic fragment data and the processing instructions as target task information;
a task allocation module, configured to select a target node server from a node server set according to a preset load-balancing distribution mode and to allocate the target task information to the target node server for execution, wherein the node server set comprises a preset number of node servers;
a result acquisition module, configured to receive the execution results sent by the target node server and to summarize the execution results to obtain a target result;
and a result sending module, configured to send the target result to the client.
A data processing apparatus comprising a node server, the node server comprising:
a task receiving module, configured to receive target task information sent by the central server, wherein the target task information comprises basic fragment data and a processing instruction;
a number counting module, configured to calculate the number of data records in the basic fragment data;
a threshold calculating module, configured to obtain the central processor model I₁, the memory model I₂, the current usage rate P₁ of the central processor and the current usage rate P₂ of the memory, and to calculate a processable number threshold S from these values,
wherein J₁ is a preset weight corresponding to the central processor model I₁ and J₂ is a preset weight corresponding to the memory model I₂;
a data segmentation module, configured to split the basic fragment data according to a preset dimension to obtain K pieces of target fragment data if the number of data records is greater than the processable number threshold S, wherein K is a positive integer;
a data processing module, configured to execute the processing instruction on the K pieces of target fragment data using a thread pool to obtain an execution result;
and a result transmission module, configured to transmit the execution result to the central server.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the data processing method described above when executing the computer program.
A computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the data processing method described above.
According to the data processing method, the apparatus, the computer device and the storage medium, the central server acquires the basic task information sent by the client and splits the basic task data by means of a modulo operation to obtain the basic fragment data, so that consecutive data records are dispersed across different pieces of basic fragment data; in subsequent processing this helps the data records to be executed in their order within the basic task information and improves the efficiency of merging the execution results. The basic fragment data and the processing instructions are sent to the node servers as target task information for execution. After receiving the target task information, a node server calculates, according to its current running state, a threshold for the number of data records it can process at the same time, splits any task that exceeds this threshold, executes the split task data using a thread pool, and returns the execution result to the central server. The node server thus dynamically sets its concurrent-task threshold according to its own running state and processes the records with multiple threads, which guarantees stability during data processing while improving its efficiency. Finally, the central server summarizes all received execution results into a target result and sends it to the client, so that tasks with large data volumes are reasonably distributed across the distributed node servers.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an application environment of a data processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of an implementation of a data processing method provided by an embodiment of the present invention;
FIG. 3 is a flowchart illustrating the implementation of step S12 in the data processing method according to the embodiment of the present invention;
fig. 4 is a flowchart illustrating implementation of step S18 in the data processing method according to the embodiment of the present invention;
fig. 5 is a flowchart of implementation of step S19 in the data processing method according to the embodiment of the present invention;
Fig. 6 is a flowchart illustrating implementation of step S21 in the data processing method according to the embodiment of the present invention;
FIG. 7 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, fig. 1 illustrates an application environment of the data processing method according to an embodiment of the present invention. The data processing method is applied to scenarios that process large volumes of data. The scenario comprises a central server, node servers and a client connected over a network: the client transmits data to the central server, the central server splits the data and transmits it to the node servers, and each node server processes the data it receives and returns the processing result to the central server. The client may specifically, but not exclusively, be an intelligent terminal device such as a mobile phone, a tablet computer or a personal computer (PC), and the central server and the node servers may each be implemented as an independent server or as a server cluster composed of several servers. The data processing method in the embodiment of the invention specifically comprises the following steps executed by the central server:
acquiring basic task information sent by a client, wherein the basic task information comprises basic task data and processing instructions corresponding to the basic task data;
splitting the basic task data according to a preset fragment count N by means of a modulo operation to obtain N pieces of basic fragment data, wherein N is a positive integer;
taking the basic fragment data and the processing instructions as target task information;
selecting a target node server from a node server set according to a preset load-balancing distribution mode, and distributing the target task information to the target node server for execution, wherein the node server set comprises a preset number of node servers;
receiving the execution results sent by the target node server and summarizing the execution results to obtain a target result;
and sending the target result to the client.
The data processing method in the embodiment of the invention specifically comprises the following steps executed by the node server:
receiving target task information sent by a central server, wherein the target task information comprises basic fragment data and a processing instruction;
calculating the number of data records in the basic fragment data;
obtaining the central processor model I₁, the memory model I₂, the current usage rate P₁ of the central processor and the current usage rate P₂ of the memory, and calculating a processable number threshold S from these values,
wherein J₁ is a preset weight corresponding to the central processor model I₁ and J₂ is a preset weight corresponding to the memory model I₂;
if the number of data records is greater than the processable number threshold S, splitting the basic fragment data according to a preset dimension to obtain K pieces of target fragment data, wherein K is a positive integer;
executing the processing instruction on the K pieces of target fragment data using a thread pool to obtain an execution result;
and sending the execution result to the central server.
Referring to fig. 2, fig. 2 shows a data processing method according to an embodiment of the present invention. The method is illustrated as applied to the central server and the node servers in fig. 1, and is described in detail as follows:
S11: the central server acquires basic task information sent by the client, wherein the basic task information comprises basic task data and processing instructions corresponding to the basic task data.
Specifically, the central server obtains basic task information sent by the client through a network transmission protocol, wherein the basic task information comprises, but is not limited to, basic task data and processing instructions.
The basic task data is data of a large data volume, for example, data of millions or more in the policy system.
The processing instruction refers to one or more batch instructions that drive the processing of the basic task data while the basic task information is executed.
For example, in one embodiment, the processing instructions in the basic task information comprise a first query instruction, a calculation instruction and a second query instruction, and the basic task data are three million records in a policy system database. It can be understood that when the processing instructions are executed, the three million records are first queried to obtain a first query result, the calculation is then performed on the first query result to obtain a calculation result, and the second query instruction is finally used to query the calculation result to obtain the final execution result.
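The chained execution in this example can be sketched as follows. This is a minimal illustration only: the record fields (`status`, `base`), the premium factor and the filter threshold are hypothetical stand-ins, not values taken from the patent.

```python
def run_processing_instructions(records):
    """Sketch of the example's instruction chain: first query, calculation, second query."""
    first_result = [r for r in records if r["status"] == "due"]              # first query instruction
    calculated = [{**r, "premium": r["base"] * 1.1} for r in first_result]   # calculation instruction
    return [r for r in calculated if r["premium"] > 100]                     # second query instruction

policies = [
    {"id": 1, "status": "due", "base": 100},
    {"id": 2, "status": "paid", "base": 200},
    {"id": 3, "status": "due", "base": 50},
]
final_result = run_processing_instructions(policies)
```

Each stage consumes the previous stage's output, mirroring the query, calculate, query-again flow described above.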
Network transport protocols include, but are not limited to: the Internet Control Message Protocol (ICMP), the Address Resolution Protocol (ARP) and the File Transfer Protocol (FTP).
S12: the central server adopts a mode of modular operation to segment the basic task data according to the preset segmentation number N to obtain N basic segmentation data, wherein N is a positive integer.
Specifically, the central server segments the basic task data to obtain N pieces of basic segment data, and each piece of basic segment data comprises a plurality of data records, so that the data records in the basic task data can be sent to a plurality of node servers for data processing, and the data processing efficiency is improved. In the embodiment of the invention, the central server fragments the basic task data through a fragmentation strategy based on modular Operation (modular Operation). The modulo operation is a mathematical operation, and the basic form of the modulo operation is a% b, or a mod b, which represents the remainder of dividing a by b, and the specific implementation method can be as follows: and performing modular operation on the preset number N of fragments by using the identification of the data record in the basic task data, and putting the obtained data record with the same module into the same fragment.
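As a minimal sketch of this modulo-based fragmentation strategy (the record structure and the `id` key used as the identifier are illustrative assumptions):

```python
def fragment_by_modulo(records, n, key=lambda r: r["id"]):
    """Split records into n fragments; the record with identifier k goes to fragment k % n."""
    fragments = [[] for _ in range(n)]
    for record in records:
        fragments[key(record) % n].append(record)
    return fragments

records = [{"id": i} for i in range(1, 11)]
fragments = fragment_by_modulo(records, 4)
```

Consecutive identifiers land in different fragments, which is the dispersal property the summary above attributes to the modulo strategy.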
Optionally, the slicing policy may be an average allocation algorithm policy, a job name hash value odd-even algorithm policy, a round-robin slicing policy, etc., which may be specifically selected according to the actual situation, and is not limited herein.
S13: the central server takes the basic fragment data and the processing instruction as target task information.
Specifically, based on the N pieces of basic slice data obtained in step S12, the processing instruction and each piece of basic slice data are used as one piece of target task information, and N pieces of target task information are obtained.
For example, in a specific embodiment, the number N of pieces of basic fragment data obtained is 4, and each of the 4 pieces, together with the processing instruction from step S11, is taken as one piece of target task information, giving 4 pieces of target task information.
S14: the central server selects a target node server from a node server set according to a preset load balancing distribution mode, and distributes target task information to the target node server for execution, wherein the node server set comprises a preset number of node servers.
Specifically, the central server and the node server set form a cluster, and according to the current state of each node server in the node server set, the central server sends target task information to each node server in a preset Load Balancing (Load Balancing) distribution mode.
From the geographic structure of the application, load balancing divides into local load balancing (Local Load Balance) and global load balancing (Global Load Balance, also called regional load balancing). The distribution mode adopted in this embodiment may specifically be local load balancing, which uses flexible and diverse balancing strategies to distribute access requests reasonably across the node servers in the cluster so that they share the load. Even when the existing node servers are expanded or upgraded, a new node server is simply added to the cluster without changing the existing network structure or stopping the existing services. This effectively solves the problems of excessive access requests and network overload without having to purchase expensive high-performance servers, makes full use of the existing equipment, and avoids losing access requests to single-point failures of a node server.
It should be noted that the target task information sent to the node server may be one or more, and may be specifically determined according to the distribution situation after load balancing performed by the central server, which is not limited herein.
Preferably, the embodiment of the invention uses a Quartz-based ZooKeeper framework (ZooKeeper being a reliable coordination system for distributed applications) to implement load-balanced task scheduling for the cluster; node additions, removals and abnormal conditions across the multiple nodes are managed through this framework, which improves the reliability and computing capacity of node tasks. At the same time, following the idea of decentralization, the centralized data are dispersed to multiple nodes for computation, improving computing performance and efficiency.
Quartz is an open-source job scheduling framework whose core is the scheduler, which is responsible for managing the Quartz application's runtime environment. The scheduler does not do all the work by itself; it relies on several very important components within the framework. Quartz is more than just threads and thread management: to ensure scalability, it adopts a multithread-based architecture, and at start-up the framework initializes a set of worker threads that the scheduler uses to execute scheduled jobs.
ZooKeeper is an open-source coordination service for distributed applications, an open-source implementation of Google's Chubby, and an important component of Hadoop and HBase. It is software that provides consistency services for distributed applications, including configuration maintenance, domain name service, distributed synchronization and group services.
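The patent does not spell out the internals of the balancing strategy. As one hedged illustration of local load balancing, a central server might route each new piece of target task information to the node currently reporting the lightest load (the node list and its `active_tasks` field are assumptions for the sketch):

```python
def pick_target_node(nodes):
    """Local load-balancing sketch: choose the node reporting the fewest active tasks."""
    return min(nodes, key=lambda node: node["active_tasks"])

nodes = [
    {"name": "node-a", "active_tasks": 3},
    {"name": "node-b", "active_tasks": 1},
    {"name": "node-c", "active_tasks": 2},
]
target = pick_target_node(nodes)
```

Adding a node to the cluster here is just appending to the list, which matches the text's point that expansion requires no change to the existing structure.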
S15: the node server receives target task information sent by the center server, wherein the target task information comprises basic fragment data and processing instructions.
Specifically, the node server acquires the target task information sent by the central server through a network transport protocol, the target task information comprising basic fragment data and a processing instruction.
S16: the node server calculates the number of data records of the underlying shard data.
Specifically, the node server counts the number of data records in the basic fragment data and judges the complexity of the processing from this count: the greater the number of records, the more complex the processing and the more system resources it consumes.
S17: the node server obtains the CPU model I 1, the memory model I 2, the current utilization rate P 1 of the CPU and the current utilization rate P 2 of the memory, and calculates a processable number threshold S according to the following formula:
Wherein, J 1 is a preset weight corresponding to the cpu model I 1, and J 2 is a preset weight corresponding to the memory model I 2.
Specifically, different node servers have different data processing capacities because their configurations and current running states differ. To avoid server faults or anomalies caused by a node server running overloaded during data processing, the threshold for the amount of data the node server can currently process is calculated from its current running state, where S is the processable number threshold, J₁ is the preset weight corresponding to the central processor model I₁, J₂ is the preset weight corresponding to the memory model I₂, P₁ is the current usage rate of the central processor, and P₂ is the current usage rate of the memory.
Preferably, in the embodiment of the present invention, the value of J₁ is set to 0.7 and the value of J₂ is set to 0.63; both may be set according to the actual situation and are not specifically limited here.
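The exact formula for S is reproduced only as an image in the original publication, so the sketch below is an assumed form, not the patent's formula: it weights the idle fractions of CPU and memory by J₁ and J₂ and scales a base record capacity (itself a stand-in for whatever the models I₁ and I₂ contribute).

```python
def processable_threshold(base_capacity, j1, p1, j2, p2):
    """Assumed threshold: weight the idle CPU and memory fractions, scale a base capacity.

    base_capacity is a hypothetical per-hardware-model record capacity; j1/j2 are the
    preset weights and p1/p2 the current usage rates from the text above.
    """
    return int(base_capacity * (j1 * (1 - p1) + j2 * (1 - p2)))

# With the preferred weights J1 = 0.7 and J2 = 0.63, 40% CPU use and 50% memory use:
threshold = processable_threshold(1000, 0.7, 0.4, 0.63, 0.5)
```

The key behaviour is the one the text requires: as usage rates P₁ and P₂ rise, the threshold falls, so a busy node accepts fewer simultaneous records.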
It should be noted that steps S16 and S17 need not be executed sequentially; they may also be executed in parallel, which is not limited here.
S18: if the number of the target fragments is larger than the threshold number of the target fragments, the node server divides the basic fragment data according to a preset dimension to obtain K target fragment data, wherein K is a positive integer.
Specifically, when the number of slices calculated in step S16 is greater than the threshold number of slices calculated in step S17, it is indicated that the node server cannot process all the data in the basic slice data at the same time, and therefore, the basic slice data needs to be segmented according to a preset dimension to obtain K pieces of target slice data, so that each piece of target slice data is within the processing range of the node server.
The preset dimensions include, but are not limited to, geographic area, administrative area, time dimension, etc., and may be set according to actual needs, which is not limited herein.
S19: and executing processing instructions on the K target fragment data by the node server in a thread pool mode to obtain an execution result.
Specifically, a Thread Pool (Thread Pool) is adopted to execute processing instructions on K target fragment data to obtain an execution result, if the processing instructions include a plurality of associated instructions, the processing instructions are sequentially executed, and for specific description, please refer to an example of step S11, and for avoiding repetition, details are not repeated here.
A thread pool is a form of multithreaded processing in which tasks are added to a queue and started automatically once threads have been created. Thread pool threads are background threads; each runs at default priority with the default stack size. If a processor is idle, the thread pool inserts another helper thread to keep all processors busy. If all pool threads stay busy while the queue still contains pending work, the pool creates another helper thread after a period of time, but the number of threads never exceeds a maximum; tasks beyond that maximum are queued and do not start until other threads complete.
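A minimal thread-pool sketch of step S19 (the summing function is a placeholder for the patent's processing instruction, and the fragment contents are invented):

```python
from concurrent.futures import ThreadPoolExecutor

def execute_instruction(fragment):
    """Stand-in for the processing instruction: reduce one target fragment to a result."""
    return sum(fragment)

target_fragments = [[1, 2, 3], [4, 5], [6]]  # K = 3 pieces of target fragment data
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() queues one task per fragment and runs them on the pool's worker threads
    results = list(pool.map(execute_instruction, target_fragments))
```

`ThreadPoolExecutor` exhibits the queueing behaviour described above: tasks beyond `max_workers` wait in the queue until a worker thread frees up.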
S20: and the node server sends the execution result to the central server.
Specifically, the node server transmits the execution result obtained in step S19 to the central server through a network transport protocol.
S21: and the central server receives the execution results sent by the target node server and gathers the execution results to obtain target results.
Specifically, the central server accumulates the execution results each time it receives one from a node server, and once all execution results have been gathered, takes the summarized result as the target result.
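A minimal sketch of this summarizing step, assuming each node returns a list of processed records (the list-of-lists shape is an assumption for illustration):

```python
def summarize(partial_results):
    """Merge per-node execution results, in arrival order, into one target result."""
    target_result = []
    for partial in partial_results:
        target_result.extend(partial)
    return target_result

target = summarize([[1, 2], [3], [4, 5]])
```

Because the modulo fragmentation preserves record numbering, order-preserving concatenation like this is enough to rebuild the combined result.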
S22: the central server sends the target result to the client.
Specifically, the central server sends the target result to the client through a network transmission protocol.
In this embodiment, the central server obtains the basic task information sent by the client and splits the basic task data by means of a modulo operation to obtain the basic fragment data, so that consecutive data records are dispersed across different pieces of basic fragment data; in subsequent processing this helps the data records to be executed in their order within the basic task information and improves the efficiency of merging the execution results. The basic fragment data and the processing instructions are sent to the node servers as target task information for execution. After receiving the target task information, a node server calculates, according to its current running state, a threshold for the number of data records it can process at the same time, splits any task exceeding that threshold, executes the split task data using a thread pool, and returns the execution result to the central server. The node server thus dynamically sets the threshold for the number of tasks processed simultaneously according to its own running state and processes them with multiple threads, guaranteeing stability during data processing while improving its efficiency. The central server summarizes all received execution results into a target result and sends it to the client, thereby distributing tasks with large data volumes reasonably across the distributed node servers.
In an embodiment, as shown in fig. 3, step S12, in which the central server slices the basic task data into a preset number N of slices by means of a modulo operation to obtain N basic slice data, specifically includes the following steps:
S121: and acquiring the number of the data records of the basic task data, and numbering each data record sequentially.
Specifically, the number of data records of the basic task data is obtained, a slicing list is generated in the cache records, the data records in the basic task data are written into the slicing list, and each data record is numbered in sequence.
For example, in a specific embodiment, the number of the obtained data records of the basic task data is 50, and each data record is numbered to obtain 50 numbers from 1 to 50.
It should be noted that, during the modulo process of steps S121 to S123 described in this embodiment, all operations are performed in the cache, and the resulting basic slice data are finally also stored in the slicing list.
S122: for each data record, perform a modulo operation of the record's number with the slice count N to obtain the number modulo of the data record.
Specifically, the numbers obtained in step S121 are used to perform a modulo operation on the preset number N of slices, so as to obtain a number modulo corresponding to the data record for each number.
The modulo operation is a mathematical operation, which basically has the form a% b, or a mod b, and represents the remainder of dividing a by b.
Taking the 50 numbers in step S121 as an example, in a specific embodiment the preset number of slices N is 4, and each of the numbers 1 to 50 is taken modulo 4: numbers 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45 and 49 have modulo 1; numbers 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46 and 50 have modulo 2; numbers 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43 and 47 have modulo 3; and numbers 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44 and 48 have modulo 0.
S123: and dividing the data records with the same number module into the same sliced set, and taking the data record in each sliced set as basic sliced data to obtain N basic sliced data.
Specifically, the data records with the same number mode in the slicing list are put into the same slicing set, and each slicing set is used as basic slicing data to obtain N basic slicing data.
Taking the result of the modulo operation on the 50 numbers in step S122 as an example, the data records whose number modulo is 1 (numbers 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45 and 49) form the 1st basic slice data; the data records whose number modulo is 2 (numbers 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46 and 50) form the 2nd basic slice data; the data records whose number modulo is 3 (numbers 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43 and 47) form the 3rd basic slice data; and the data records whose number modulo is 0 (numbers 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44 and 48) form the 4th basic slice data. In this way, adjacent data records are allocated to different basic slice data, so that consecutive records are dispersed across the slices.
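Steps S121 to S123 can be sketched as follows. This is a minimal illustration only; the class and method names (`ModuloSharder`, `shard`) are hypothetical and not taken from the patent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of steps S121-S123: records are numbered from 1 and
// grouped by (number mod N), so consecutive records land in different shards.
public class ModuloSharder {
    // Splits 'records' into N basic slice lists; the slice index of each
    // record is its 1-based number taken modulo N.
    public static <T> List<List<T>> shard(List<T> records, int n) {
        List<List<T>> shards = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            shards.add(new ArrayList<>());
        }
        for (int number = 1; number <= records.size(); number++) {
            // Records whose numbers share the same modulo go to the same shard.
            shards.get(number % n).add(records.get(number - 1));
        }
        return shards;
    }
}
```

With 50 records and N = 4, this reproduces the example above: records 1, 5, ..., 49 fall into the shard for modulo 1, records 2, 6, ..., 50 into the shard for modulo 2, and so on.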
In this embodiment, the number of data records of the basic task data is obtained and each data record is numbered sequentially; for each data record, a modulo operation of its number with the slice count N yields the record's number modulo; records with the same number modulo are put into the same slice set, and the records of each slice set form one basic slice data, giving N basic slice data. In this way, basic task data with a large data volume is divided into multiple basic slice data, each containing approximately the same number of records. When the basic slice data are distributed to different node servers for computation, data processing efficiency is improved, the computation times of the different node servers do not differ too much, and the time needed to summarize the results is reduced.
In an embodiment, the node server divides basic slice data whose record count exceeds the record-count threshold. As shown in fig. 4, step S18, in which the node server divides the basic slice data according to a preset dimension to obtain K pieces of target slice data, specifically includes the following steps:
S181: calculate the ratio of the record count to the record-count threshold.
Specifically, the ratio between the number of data records of the basic slice data obtained in step S16 and the record-count threshold obtained in step S17 is calculated.
For example, in one embodiment, the number of data records of the basic slice data is 20000 and the record-count threshold is 600, so the calculated ratio is 33.33.
S182: perform an upward rounding operation on the ratio, and take the result of the upward rounding operation as the number of divided pieces K, where K is a positive integer.
Specifically, the ratio obtained in step S181 is subjected to an upward rounding operation by the upward rounding function ceil (), and the result of the upward rounding operation is taken as the number of divided pieces K.
The upward rounding (ceiling) operation is a mathematical operation, written ⌈x⌉, that returns the smallest integer greater than or equal to x. Taking the ratio 33.33 obtained in step S181 as an example, the ratio is rounded up to ⌈33.33⌉ = 34.
S183: divide the basic slice data according to the number of divided pieces to obtain K pieces of target slice data.
Specifically, the basic piece data is divided according to the number K of divided pieces obtained in step S182, to obtain K pieces of target piece data.
Taking the basic slice data in step S181 and the number of divided pieces obtained in step S182 as an example, the basic slice data has 20000 data records and the number of divided pieces is 34. Therefore, going from front to back in record-number order, a cut is made every 600 data records; after 33 cuts, 33 pieces of target slice data are obtained, and the remaining 200 data records are used as the 34th target slice data.
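Steps S181 to S183 amount to a ceiling division followed by fixed-size chunking, and can be sketched as follows. The class and method names (`ShardSplitter`, `split`) are illustrative assumptions, not names from the patent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of steps S181-S183: the basic slice is cut into
// K = ceil(count / threshold) target slices of at most 'threshold' records.
public class ShardSplitter {
    public static <T> List<List<T>> split(List<T> records, int threshold) {
        // Upward rounding: K = ceil(records.size() / threshold).
        int k = (int) Math.ceil((double) records.size() / threshold);
        List<List<T>> slices = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            int from = i * threshold;
            int to = Math.min(from + threshold, records.size());
            // Copy the view so each target slice is an independent list.
            slices.add(new ArrayList<>(records.subList(from, to)));
        }
        return slices;
    }
}
```

For 20000 records with a threshold of 600 this yields 34 target slices, the last holding the remaining 200 records, matching the worked example above.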
In this embodiment, the ratio of the record count to the record-count threshold is calculated, an upward rounding operation is performed on the ratio, and the result is used as the number of divided pieces K; the basic slice data is then divided according to this number to obtain K pieces of target slice data. The record count of each target slice data therefore does not exceed the record-count threshold, which ensures the stability of the node server during data processing.
In an embodiment, the node server performs data processing on the target slice data. As shown in fig. 5, step S19, in which the node server executes the processing instruction on the K pieces of target slice data using a thread pool to obtain an execution result, specifically includes the following steps:
S191: establishing a fixed-length thread pool according to the preset thread quantity Q, wherein Q is a positive integer;
Specifically, in the embodiment of the present invention, the target slice data is processed in a multithreaded manner provided by a thread pool. Before data processing, the target slice data is put into a waiting queue, and a thread pool with a preset thread count Q is created, where Q can be set according to actual needs, for example Q = 6.
Among them, thread pools include, but are not limited to, cacheable thread pools (CachedThreadPool), fixed-length thread pools (FixedThreadPool), single-thread executors (SingleThreadExecutor), and scheduled thread pools (ScheduledThreadPool).
Preferably, the thread pool adopted in the embodiment of the invention is a fixed-length thread pool, which performs multithreaded operation by controlling the maximum number of concurrent threads; excess tasks wait in a queue.
For example, in one embodiment, a fixed-length thread pool is created by Executors.newFixedThreadPool(6); the pool limits the maximum number of concurrent threads to 6, and excess tasks wait in the queue. When data processing is executed, the thread pool can execute processing instructions on 6 pieces of target slice data at the same time; this multithreaded mode saves computation time and improves data processing efficiency.
S192: if the number K of the target sliced data is larger than the number Q of the threads, Q target sliced data are selected from the K target sliced data and put into a fixed-length thread pool, and a processing instruction is executed.
Specifically, when the number K of the target sliced data is smaller than or equal to the number Q of threads, directly placing the K target sliced data into a thread pool for data processing, and when the number K of the target sliced data is larger than the number Q of threads, selecting Q target sliced data from the K target sliced data to execute a processing instruction.
For example, in one embodiment, the number K of target slice data is 11 and the thread count Q is 6; then 6 of the 11 target slice data are selected and put into the thread pool for data processing, and the remaining 5 target slice data are put into the waiting queue.
It should be noted that, if the processing instruction includes a plurality of batches of instructions, the processing instruction is sequentially executed according to the order of the instructions and the logic relationship between the instructions during the data processing, and the specific description may refer to the example in step S11, so that repetition is avoided and will not be repeated here.
S193: if any target fragment data in the fixed-length thread pool is monitored to be processed, a processing result of the target fragment data is obtained, one target fragment data is selected from unselected target fragment data, and the selected target fragment data is re-added into the fixed-length thread pool to execute a processing instruction until all the K target fragment data are processed.
Specifically, when the data processing task of one target slice data in the fixed-length thread pool is monitored as complete, the processing result of that target slice data is obtained, one unselected target slice data is taken from the waiting queue, and it is added to the fixed-length thread pool to execute the processing instruction, so that the fixed-length thread pool is utilized for data processing to the maximum extent. When all K target slice data are known to be processed, the node server's current data processing task is confirmed as complete.
Taking the 11 pieces of target slice data in step S192 as an example, when one target slice data in the thread pool is detected as finished, its execution result is obtained; at this point only 5 threads in the pool are still executing processing instructions, i.e. 5 threads are busy and 1 thread is idle. To maximize processing efficiency, all threads in the pool should be kept busy as far as possible, so 1 of the 5 unselected target slice data in the waiting queue is selected and put into the thread pool for processing, until the execution results of all 11 target slice data are obtained and the data processing task is confirmed as complete.
S194: summarizing the processing results of the K target fragment data to obtain an execution result.
Specifically, the K processing results obtained in step S193 are summarized, and the final execution result of the node server is obtained.
In the embodiment, the thread pool is created, and the target fragment data is subjected to data processing in a multithreading mode, so that processing instructions can be executed on a plurality of data records at the same time, the data processing time is saved, and the data processing efficiency is improved.
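Steps S191 to S194 can be sketched with a fixed-length pool as follows. The class name `SliceExecutor` and the per-slice work (summing integer records) are illustrative assumptions; the patent does not specify the processing instruction. Note that submitting more tasks than there are threads already gives the queue-then-run behavior of steps S192 and S193, because `Executors.newFixedThreadPool` queues excess tasks until a worker is idle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of steps S191-S194: a fixed-length pool of Q threads
// processes K target slices; excess tasks wait in the pool's queue, and the
// per-slice results are summarized into one execution result.
public class SliceExecutor {
    public static long process(List<List<Integer>> slices, int q) {
        ExecutorService pool = Executors.newFixedThreadPool(q); // step S191
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (List<Integer> slice : slices) {
                // Steps S192-S193: submit every slice; at most q run at once,
                // the rest queue until a worker thread becomes idle.
                futures.add(pool.submit(() -> {
                    long sum = 0;
                    for (int v : slice) sum += v;
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // step S194: summarize per-slice results
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Using `Future.get` to collect results also covers the monitoring in step S193: the summarizing loop naturally waits for each slice to finish.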
In an embodiment, the central server aggregates the received execution results to obtain a target result. As shown in fig. 6, step S21, in which the central server receives the execution results sent by the target node servers and aggregates them to obtain the target result, specifically includes the following steps:
S211: and receiving each execution result sent by each target node server.
Specifically, the target node server sends the execution result to the center server after obtaining the execution result.
The number of target node servers may be one or more. It can be understood that if there is one target node server, only one execution result is received; if there are multiple target node servers, one execution result is received from each of them.
S212: and storing the execution result into a summary table.
Specifically, the center server stores the execution result received in step S211 in a summary table for temporarily storing the execution result.
S213: and receiving a message of completion of the sending of the execution results sent by each target node server, and sequencing the execution results in the summary table to obtain a target result if all the target node servers are detected to send the message.
Specifically, there may be one or more target node servers. After sending its execution results, each target node server sends the central server a message indicating that its execution results have been sent in full. Once the central server has received this message from all target node servers, it has received the execution results of all of them, and it sorts the data in the summary table according to a preset ordering to obtain the target result.
The preset sorting mode may be in the order of the generation time, or may be set according to actual needs, which is not limited herein.
In this embodiment, the execution results sent by the target node servers are received and stored in the summary table; after the messages indicating that all execution results have been sent are received, the execution results in the summary table are sorted to obtain the target result, which ensures the integrity of the finally obtained target result data.
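Steps S211 to S213 can be sketched as follows. The class name `ResultAggregator`, the string node identifiers, and the ascending sort (standing in for the unspecified "preset sorting mode") are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of steps S211-S213: the central server stores each
// execution result in a summary table and, once every target node server has
// reported completion, sorts the table to produce the target result.
public class ResultAggregator {
    private final Set<String> pendingNodes;
    private final List<Long> summaryTable = new ArrayList<>();

    public ResultAggregator(Set<String> targetNodes) {
        this.pendingNodes = new HashSet<>(targetNodes);
    }

    // Steps S211-S212: receive one batch of results and append it to the
    // summary table (nodeId is kept for symmetry with the protocol).
    public void receive(String nodeId, List<Long> results) {
        summaryTable.addAll(results);
    }

    // Step S213: a node reports that all its results were sent; once every
    // node has reported, sort the table (ascending here, as a stand-in for
    // the preset ordering) and return the target result, otherwise null.
    public List<Long> complete(String nodeId) {
        pendingNodes.remove(nodeId);
        if (!pendingNodes.isEmpty()) return null;
        List<Long> target = new ArrayList<>(summaryTable);
        Collections.sort(target);
        return target;
    }
}
```

Sorting only after the last completion message mirrors the integrity guarantee described above: no target result is produced while any node's results are still outstanding.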
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present invention.
In one embodiment, a data processing apparatus is provided, where the data processing apparatus corresponds to the data processing method in the above embodiment one by one. As shown in fig. 7, the data processing apparatus includes a center server and a node server, and for convenience of explanation, only the portions related to the present embodiment are shown:
referring to fig. 7, the central server of the data processing apparatus includes: a data acquisition module 110, a data slicing module 120, a task generation module 130, a task allocation module 140, a result acquisition module 150, and a result transmission module 160. The functional modules are described in detail as follows:
The data acquisition module 110 is configured to acquire basic task information sent by the client, where the basic task information includes basic task data and a processing instruction corresponding to the basic task data;
The data slicing module 120 is configured to slice the basic task data according to a preset slicing number N by adopting a mode of modulo arithmetic, so as to obtain N basic slicing data, where N is a positive integer;
a task generating module 130, configured to take the basic fragment data and the processing instruction as target task information;
the task allocation module 140 is configured to select a target node server from a node server set according to a preset load balancing allocation manner, and allocate target task information to the target node server for execution, where the node server set includes a preset number of node servers;
the result obtaining module 150 is configured to receive an execution result sent by the target node server, and aggregate the execution result to obtain a target result;
And the result sending module 160 is configured to send the target result to the client.
Further, the data slicing module 120 includes:
a numbering unit 121, configured to obtain the number of data records of the basic task data, and sequentially number each data record;
An operation unit 122, configured to perform a modulo operation on the number of fragments N by using the number of the data record for each data record, to obtain a number modulo of the data record;
And the slicing unit 123 is configured to divide the data records with the same number module into the same slicing set, and use the data record in each slicing set as a basic slicing data to obtain N basic slicing data.
Further, the result obtaining module 150 includes:
A receiving unit 151, configured to receive an execution result sent by each target node server;
A storage unit 152 for storing each execution result into a summary table;
and the sorting unit 153 is configured to receive a message that the execution result sent by each target node server is sent, and sort the execution results in the summary table to obtain a target result if it is detected that all the target node servers have sent the message.
With continued reference to fig. 7, the node server of the data processing apparatus includes: a task receiving module 210, a number counting module 220, a threshold calculating module 230, a data dividing module 240, a data processing module 250 and a result transmitting module 260. The functional modules are described in detail as follows:
The task receiving module 210 is configured to receive target task information sent by the central server, where the target task information includes basic fragment data and a processing instruction;
A stripe count module 220, configured to calculate a stripe count of a data record of the base fragment data;
The threshold calculation module 230 is configured to obtain the central processor model I 1, the memory model I 2, the current central processor usage P 1 and the current memory usage P 2, and to calculate a processable record-count threshold value S according to the following formula:
Wherein, J 1 is a preset weight corresponding to the central processor model I 1, and J 2 is a preset weight corresponding to the memory model I 2;
The data segmentation module 240 is configured to segment the basic fragment data according to a preset dimension if the number is greater than a threshold number, to obtain K target fragment data, where K is a positive integer;
the data processing module 250 is configured to execute processing instructions on the K target fragment data by using a thread pool manner, so as to obtain an execution result;
And the result transmission module 260 is configured to send the execution result to the central server.
Further, the data segmentation module 240 includes:
a calculating unit 241 for calculating a ratio of the number of bars to a threshold value of the number of bars;
a rounding unit 242, configured to perform an upward rounding operation on the comparison value, and take a result of the upward rounding operation as a number of segments K, where K is a positive integer;
The dividing unit 243 is configured to divide the basic slice data according to the number of the divided slices, to obtain K pieces of target slice data.
Further, the data processing module 250 includes:
A creating unit 251, configured to create a fixed-length thread pool according to a preset thread number Q, where Q is a positive integer;
the execution unit 252 is configured to select Q pieces of target sliced data from the K pieces of target sliced data, and put the Q pieces of target sliced data into the fixed-length thread pool to execute the processing instruction if the number K of target sliced data is greater than the number Q of threads;
The circulation unit 253 is configured to acquire a processing result of any one of the target sliced data in the fixed-length thread pool if it is detected that the processing of the target sliced data is completed, select one of the target sliced data from the unselected target sliced data, and re-add the selected target sliced data to the fixed-length thread pool to execute a processing instruction until all of the K target sliced data are processed;
and the summarizing unit 254 is configured to summarize the processing results of the K pieces of target fragment data, so as to obtain an execution result.
For specific limitations of the data processing apparatus, reference may be made to the above limitations of the data processing method, and no further description is given here. Each of the modules in the above-described data processing apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a central server or a node server, and the internal structure of the computer device may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data records in the data processing method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data processing method.
In one embodiment, a computer device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the steps of the data processing method of the above embodiment, such as steps S11 to S22 shown in fig. 2. Or the processor, when executing the computer program, implements the functions of the modules/units of the data processing apparatus of the above embodiment, such as the functions of the modules 110 to 160 of the central server and the functions of the modules 210 to 260 of the node server shown in fig. 7. In order to avoid repetition, a description thereof is omitted.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the data processing method of the foregoing embodiment, or where the computer program is executed by the processor to implement the functions of each module/unit of the data processing apparatus of the foregoing embodiment, which are not described herein again for avoiding repetition.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.
Claims (10)
1. A data processing method, characterized in that the data processing method comprises the following steps performed by a central server:
acquiring basic task information sent by a client, wherein the basic task information comprises basic task data and processing instructions corresponding to the basic task data;
Dividing the basic task data into N pieces according to a preset dividing number N by adopting a mode of modular operation to obtain N pieces of basic dividing data, wherein N is a positive integer;
taking the basic fragment data and the processing instruction as target task information;
Selecting a target node server from a node server set according to a preset load balancing distribution mode, and distributing the target task information to the target node server for execution, wherein the node server set comprises a preset number of node servers;
receiving an execution result sent by the target node server, and summarizing the execution result to obtain a target result;
sending the target result to the client;
Wherein, the execution result is obtained by the target node server through the following steps:
receiving target task information sent by a central server, wherein the target task information comprises basic fragment data and a processing instruction;
calculating the number of data records of the basic fragment data;
obtaining a central processor model I 1, a memory model I 2, a current utilization rate P 1 of the central processor and a current utilization rate P 2 of the memory, and calculating a processable number threshold S according to the following formula:
Wherein J 1 is a preset weight corresponding to the cpu model I 1, and J 2 is a preset weight corresponding to the memory model I 2;
If the number of the target fragments is larger than the threshold value of the number of the target fragments, dividing the basic fragment data according to a preset dimension to obtain K target fragment data, wherein K is a positive integer;
And executing the processing instruction on the K target fragment data by using a thread pool mode to obtain an execution result.
2. The data processing method as claimed in claim 1, wherein the performing the slicing on the basic task data according to the preset number N of slices by using a modulo operation method, to obtain N basic sliced data includes:
Acquiring the number of data records of the basic task data, and numbering each data record sequentially;
Performing modular operation on the number N of the fragments by using the number of the data record aiming at each data record to obtain a number module of the data record;
dividing the data records with the same number module into the same slicing set, and taking the data record in each slicing set as basic slicing data to obtain N basic slicing data.
3. The data processing method as claimed in claim 1, wherein said receiving the execution result sent by the target node server and summarizing the execution result, and obtaining the target result includes:
receiving an execution result sent by each target node server;
Storing each execution result into a summary table;
and receiving a message that the sending of the execution results sent by each target node server is completed, and if all the target node servers are detected to send the message, sequencing the execution results in the summary table to obtain a target result.
4. A data processing method, characterized in that the data processing method comprises the following steps performed by a node server:
receiving target task information sent by a central server, wherein the target task information comprises basic fragment data and a processing instruction;
calculating the number of data records of the basic fragment data;
obtaining a central processor model I 1, a memory model I 2, a current utilization rate P 1 of the central processor and a current utilization rate P 2 of the memory, and calculating a processable number threshold S according to the following formula:
Wherein J 1 is a preset weight corresponding to the cpu model I 1, and J 2 is a preset weight corresponding to the memory model I 2;
If the number of the target fragments is larger than the threshold value of the number of the target fragments, dividing the basic fragment data according to a preset dimension to obtain K target fragment data, wherein K is a positive integer;
Executing the processing instruction on the K target fragment data by using a thread pool mode to obtain an execution result;
And sending the execution result to the central server.
5. The data processing method as claimed in claim 4, wherein the dividing the basic slice data according to a preset dimension to obtain K target slice data includes:
calculating the ratio of the number of data records to the processable number threshold;
performing a round-up (ceiling) operation on the ratio, and taking the result as the number of divisions K, wherein K is a positive integer;
and dividing the basic fragment data according to the number of divisions to obtain K pieces of target fragment data.
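Claim 5's division step, sketched below; the round-robin split is an assumed concrete choice for "dividing according to the number of divisions," since the patent text does not fix the partitioning rule:

```python
import math

def split_count(num_records, threshold):
    """K = ceil(number of records / processable number threshold)."""
    return math.ceil(num_records / threshold)

def split_fragment(records, threshold):
    """Divide one basic fragment into K target fragments, each holding
    at most `threshold` records (round-robin keeps sizes balanced)."""
    k = split_count(len(records), threshold)
    return [records[i::k] for i in range(k)]

# 10 records with threshold 4 -> K = ceil(10/4) = 3 target fragments
```

Because K = ceil(n / S), every target fragment holds ceil(n / K) ≤ S records, so each piece fits within the node's computed capacity.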
6. The data processing method as claimed in claim 4, wherein the executing the processing instruction on the K pieces of target fragment data in a thread pool manner to obtain the execution result comprises:
establishing a fixed-length thread pool according to a preset thread quantity Q, wherein Q is a positive integer;
if the number K of pieces of target fragment data is greater than the thread quantity Q, selecting Q pieces of target fragment data from the K pieces, putting them into the fixed-length thread pool, and executing the processing instruction;
if it is monitored that any piece of target fragment data in the fixed-length thread pool has been processed, obtaining the processing result of that piece, selecting one piece from the unselected target fragment data, and adding it into the fixed-length thread pool to execute the processing instruction, until all the K pieces of target fragment data have been processed;
and summarizing the processing results of the K pieces of target fragment data to obtain the execution result.
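A sketch of claim 6 using Python's standard `ThreadPoolExecutor`. Its internal work queue reproduces the claimed behavior — when K > Q, only Q fragments run at once, and each finishing thread picks up an unselected fragment — without manual monitoring. The `process` callable stands in for the processing instruction and is an assumption for illustration:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_fragments(target_fragments, process, q=4):
    """Execute `process` on K target fragments with a fixed pool of Q threads,
    then summarize the per-fragment results into the execution result."""
    results = []
    with ThreadPoolExecutor(max_workers=q) as pool:
        # All K fragments are submitted; the pool runs at most Q at a time
        # and feeds a waiting fragment to each thread as it frees up.
        futures = [pool.submit(process, fragment) for fragment in target_fragments]
        for fut in as_completed(futures):
            results.append(fut.result())  # collect each fragment's processing result
    return results  # summarized execution result sent back to the central server
```

For example, `process_fragments([[1, 2], [3], [4, 5, 6]], sum, q=2)` processes three fragments on two threads; completion order may vary, so the summary is order-independent.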
7. A data processing apparatus, the data processing apparatus comprising a central server, the central server comprising:
The data acquisition module is used for acquiring basic task information sent by a client, wherein the basic task information comprises basic task data and a processing instruction corresponding to the basic task data;
the data slicing module is used for slicing the basic task data according to a preset slicing number N in a mode of modular arithmetic to obtain N basic slicing data, wherein N is a positive integer;
The task generation module is used for taking the basic fragment data and the processing instruction as target task information;
The task allocation module is used for selecting a target node server from a node server set according to a preset load balancing allocation mode, and allocating the target task information to the target node server for execution, wherein the node server set comprises a preset number of node servers;
The result acquisition module is used for receiving the execution result sent by the target node server and summarizing the execution result to obtain a target result;
The result sending module is used for sending the target result to the client;
Wherein, the execution result is obtained by the target node server through the following steps:
receiving target task information sent by a central server, wherein the target task information comprises basic fragment data and a processing instruction;
calculating the number of data records of the basic fragment data;
obtaining a central processor model I1, a memory model I2, a current utilization rate P1 of the central processor and a current utilization rate P2 of the memory, and calculating a processable number threshold S according to the following formula:
wherein J1 is a preset weight corresponding to the central processor model I1, and J2 is a preset weight corresponding to the memory model I2;
if the number of data records is greater than the processable number threshold, dividing the basic fragment data according to a preset dimension to obtain K pieces of target fragment data, wherein K is a positive integer;
And executing the processing instruction on the K target fragment data by using a thread pool mode to obtain an execution result.
8. A data processing apparatus, the data processing apparatus comprising a node server, the node server comprising:
the task receiving module is used for receiving target task information sent by the central server, wherein the target task information comprises basic fragment data and a processing instruction;
The number counting module is used for calculating the number of the data records of the basic fragment data;
The threshold calculating module is used for obtaining a central processor model I1, a memory model I2, a current utilization rate P1 of the central processor, and a current utilization rate P2 of the memory, and calculating a processable number threshold S according to the following formula:
wherein J1 is a preset weight corresponding to the central processor model I1, and J2 is a preset weight corresponding to the memory model I2;
the data segmentation module is used for dividing the basic fragment data according to a preset dimension to obtain K pieces of target fragment data if the number of data records is greater than the processable number threshold, wherein K is a positive integer;
The data processing module is used for executing the processing instructions on the K pieces of target fragment data in a thread pool mode to obtain an execution result;
and the result transmission module is used for transmitting the execution result to the central server.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the data processing method according to claim 1 or 3 when executing the computer program or the steps of the data processing method according to any of claims 4 to 6 when executing the computer program.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the data processing method according to claim 1 or 3, or the computer program when executed by a processor implements the steps of the data processing method according to any one of claims 4 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811010232.XA CN109144731B (en) | 2018-08-31 | 2018-08-31 | Data processing method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109144731A CN109144731A (en) | 2019-01-04 |
CN109144731B true CN109144731B (en) | 2024-08-09 |
Family
ID=64825903
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811010232.XA Active CN109144731B (en) | 2018-08-31 | 2018-08-31 | Data processing method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109144731B (en) |
Families Citing this family (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109587278A (en) * | 2019-01-16 | 2019-04-05 | 平安普惠企业管理有限公司 | Data transmission method and relevant apparatus |
CN109857565A (en) * | 2019-01-18 | 2019-06-07 | 深圳壹账通智能科技有限公司 | Data processing method, device, computer equipment and storage medium |
CN110362401A (en) * | 2019-06-20 | 2019-10-22 | 深圳壹账通智能科技有限公司 | Data run the member host in batch method, apparatus, storage medium and cluster |
CN110515989B (en) * | 2019-09-02 | 2022-03-01 | 四川长虹电器股份有限公司 | Data real-time statistical method based on financial data management platform |
CN110597879B (en) * | 2019-09-17 | 2022-01-14 | 第四范式(北京)技术有限公司 | Method and device for processing time series data |
CN110704183B (en) * | 2019-09-18 | 2021-01-08 | 深圳前海大数金融服务有限公司 | Data processing method, system and computer readable storage medium |
CN110618880B (en) * | 2019-09-19 | 2022-05-27 | 中国银行股份有限公司 | Cross-system data transmission system and method |
CN110489242B (en) * | 2019-09-24 | 2024-01-26 | 深圳前海微众银行股份有限公司 | Distributed data computing method, device, terminal equipment and storage medium |
CN112632569B (en) * | 2019-10-09 | 2024-05-24 | 华控清交信息科技(北京)有限公司 | Data processing method, system, device, electronic equipment and readable storage medium |
CN111030983B (en) * | 2019-10-15 | 2023-05-26 | 深圳壹账通智能科技有限公司 | Data processing method and device based on distributed distribution and related equipment |
CN113051103B (en) * | 2019-12-27 | 2023-09-05 | 中国移动通信集团湖南有限公司 | A data processing method, device and electronic equipment |
CN113127694B (en) * | 2019-12-31 | 2024-08-23 | 深圳云天励飞技术有限公司 | Data storage method and device, electronic equipment and storage medium |
CN111259045B (en) * | 2020-01-17 | 2023-11-10 | 金证财富南京科技有限公司 | Data processing method, device, server and medium |
CN111443999A (en) * | 2020-02-17 | 2020-07-24 | 深圳壹账通智能科技有限公司 | Data parallel processing method, executor, computer equipment and storage medium |
CN111338778B (en) * | 2020-02-27 | 2022-12-23 | 苏宁云计算有限公司 | Task scheduling method and device, storage medium and computer equipment |
CN111352948B (en) * | 2020-03-31 | 2023-12-26 | 中国建设银行股份有限公司 | Data processing method, device, equipment and storage medium |
CN111556126B (en) * | 2020-04-24 | 2023-04-18 | 杭州浮云网络科技有限公司 | Model management method, system, computer device and storage medium |
CN111651267A (en) * | 2020-05-06 | 2020-09-11 | 京东数字科技控股有限公司 | Method and device for performance consumption optimization analysis of parallel operations |
CN111813513B (en) * | 2020-06-24 | 2024-05-14 | 中国平安人寿保险股份有限公司 | Method, device, equipment and medium for scheduling real-time tasks based on distribution |
CN112468548B (en) * | 2020-11-13 | 2023-05-30 | 苏州智加科技有限公司 | Data processing method, device, system, server and readable storage medium |
CN112416552B (en) * | 2020-11-19 | 2025-01-07 | 中国建设银行股份有限公司 | Main task processing method, device, server and storage medium |
CN112416562B (en) * | 2020-12-11 | 2024-06-04 | 深圳市思迪信息技术股份有限公司 | Method and device for distributed task scheduling engine |
CN112598529B (en) * | 2020-12-15 | 2023-08-29 | 泰康保险集团股份有限公司 | Data processing method and device, computer readable storage medium and electronic equipment |
CN112685128B (en) * | 2021-02-03 | 2023-05-02 | 湖南映客互娱网络信息有限公司 | Live image pornography detection and image filtering method |
CN112988343A (en) * | 2021-02-05 | 2021-06-18 | 开店宝科技集团有限公司 | Batch data fragmentation method, system, computer equipment and storage medium |
CN113010278B (en) * | 2021-02-19 | 2023-03-28 | 建信金融科技有限责任公司 | Batch processing method and system for financial insurance core system |
CN113242302A (en) * | 2021-05-11 | 2021-08-10 | 鸬鹚科技(深圳)有限公司 | Data access request processing method and device, computer equipment and medium |
CN114282133A (en) * | 2021-05-11 | 2022-04-05 | 鸬鹚科技(深圳)有限公司 | Data access request processing method and device, computer equipment and medium |
CN113407429B (en) * | 2021-06-23 | 2024-07-19 | 中国建设银行股份有限公司 | Task processing method and device |
CN113342839A (en) * | 2021-08-06 | 2021-09-03 | 北京开科唯识技术股份有限公司 | Data processing method and device, terminal equipment and storage medium |
CN113778518B (en) * | 2021-08-31 | 2024-03-26 | 中科曙光国际信息产业有限公司 | Data processing method, device, computer equipment and storage medium |
CN114022093B (en) * | 2021-09-22 | 2023-03-24 | 医渡云(北京)技术有限公司 | Data collaborative computing method, device and equipment based on multi-party security |
CN113807710B (en) * | 2021-09-22 | 2023-06-20 | 四川新网银行股份有限公司 | System batch task segmentation parallel and dynamic scheduling method and storage medium |
CN114092206A (en) * | 2021-10-28 | 2022-02-25 | 青岛海尔科技有限公司 | Wage calculation method, device, electronic device and storage medium |
CN114398246A (en) * | 2022-01-04 | 2022-04-26 | 北京金山云网络技术有限公司 | Data consistency check method, apparatus, computer equipment and medium |
CN114443658A (en) * | 2022-01-25 | 2022-05-06 | 北京沃东天骏信息技术有限公司 | Data processing method and device, computer storage medium and electronic equipment |
CN114490300A (en) * | 2022-02-15 | 2022-05-13 | 平安证券股份有限公司 | Performance testing method, device, equipment and medium of distributed file system |
CN114490003A (en) * | 2022-03-03 | 2022-05-13 | 平安普惠企业管理有限公司 | Distributed job scheduling method of large-scale data and related equipment |
CN116467266A (en) * | 2023-04-19 | 2023-07-21 | 维恩贝特科技有限公司 | A batch file intelligent online processing method, device and storage medium |
CN116628068B (en) * | 2023-07-25 | 2024-07-05 | 杭州衡泰技术股份有限公司 | Data handling method, system and readable storage medium based on dynamic window |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105740063A (en) * | 2014-12-08 | 2016-07-06 | 杭州华为数字技术有限公司 | Data processing method and apparatus |
CN108256118A (en) * | 2018-02-13 | 2018-07-06 | 腾讯科技(深圳)有限公司 | Data processing method, device, system, computing device and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7376693B2 (en) * | 2002-02-08 | 2008-05-20 | Jp Morgan Chase & Company | System architecture for distributed computing and method of using the system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109144731B (en) | Data processing method, device, computer equipment and storage medium | |
CN112162865B (en) | Scheduling method and device of server and server | |
CN109218355B (en) | Load balancing engine, client, distributed computing system and load balancing method | |
CN108683720B (en) | Container cluster service configuration method and device | |
CN108776934B (en) | Distributed data calculation method and device, computer equipment and readable storage medium | |
CN107688492B (en) | Resource control method and device and cluster resource management system | |
CN107800768B (en) | Open platform control method and system | |
CN109564528B (en) | System and method for computing resource allocation in distributed computing | |
WO2017166803A1 (en) | Resource scheduling method and device | |
CN103412786B (en) | High performance server architecture system and data processing method thereof | |
Boutaba et al. | On cloud computational models and the heterogeneity challenge | |
US10554737B2 (en) | Method and apparatus for leveling loads of distributed databases | |
CN112860387A (en) | Distributed task scheduling method and device, computer equipment and storage medium | |
CN111858055B (en) | Task processing method, server and storage medium | |
CN103502944A (en) | Virtual machine memory adjustment method and device | |
CN111338787B (en) | Data processing method and device, storage medium and electronic device | |
CN117271142B (en) | Load balancing method and task scheduler for analyzing probability security analysis model | |
Vashistha et al. | Comparative study of load balancing algorithms | |
CN109388501B (en) | Communication matching method, device, equipment and medium based on face recognition request | |
Meskar et al. | Fair multi-resource allocation in mobile edge computing with multiple access points | |
CN116401024A (en) | Cluster capacity expansion and contraction method, device, equipment and medium based on cloud computing | |
CN109086128B (en) | Task scheduling method and device | |
Sundar et al. | Communication augmented latest possible scheduling for cloud computing with delay constraint and task dependency | |
US10992517B1 (en) | Dynamic distributed execution budget management system | |
CN115509749B (en) | Task execution method and device, storage medium and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||