Disclosure of Invention
In order to overcome the problems in the related art, the specification provides a task concurrent processing method, a task concurrent processing device and a computing device.
A method of task concurrency processing, the method comprising:
determining a plurality of tasks to be subjected to concurrent processing, wherein the tasks correspond to data blocks to be processed when the tasks are executed;
distributing the tasks to one or more task sets, wherein the data blocks corresponding to the tasks in the same task set are different;
and carrying out concurrent processing on the tasks in the same task set.
Optionally, the allocating the tasks to one or more task sets, where data blocks corresponding to the tasks in the same task set are different includes:
and determining data block identifications of data blocks corresponding to the tasks, and distributing the tasks to one or more task sets by using the data block identifications, wherein the tasks corresponding to the same data block identification are distributed to different task sets.
Optionally, the determining the data block identifier of the data block corresponding to the task includes:
and reading the row address stored in the row directory area in the data block, and determining the data block identifier according to the data object number, the data file number and the data block number in the row address.
Optionally, the processing states of the multiple tasks to be concurrently processed are recorded in a state data table;
the method further comprises the following steps:
after concurrent processing is carried out on tasks in the same task set, the processing state of the tasks is obtained;
deleting the processing state of the task recorded in the state data table before concurrent processing;
and newly adding the processing state of the task after concurrent processing in the state data table.
Optionally, the data block includes a data block in an Oracle database management system.
A method of task concurrency processing, the method comprising:
one or more processing tasks to be processed are obtained, and the processing tasks correspond to a data table which needs to be subjected to data processing;
splitting the processing task into a plurality of subtasks, and determining a plurality of subtasks to be concurrently processed, wherein the subtasks correspond to data blocks in a data table to be processed when the subtasks are executed;
distributing the plurality of subtasks to one or more task sets, wherein the data blocks corresponding to the subtasks in the same task set are different;
and submitting the one or more task sets to a thread pool, and carrying out concurrent processing on subtasks in the same task set.
A task concurrency processing device, the device comprising:
a task determination module to: determining a plurality of tasks to be subjected to concurrent processing, wherein the tasks correspond to data blocks to be processed when the tasks are executed;
a task allocation module to: distributing the tasks to one or more task sets, wherein the data blocks corresponding to the tasks in the same task set are different;
a concurrency processing module to: and carrying out concurrent processing on the tasks in the same task set.
Optionally, the task allocation module is further configured to:
and determining data block identifications of data blocks corresponding to the tasks, and distributing the tasks to one or more task sets by using the data block identifications, wherein the tasks corresponding to the same data block identification are distributed to different task sets.
Optionally, the task allocation module is further configured to:
and reading the row address stored in the row directory area in the data block, and determining the data block identifier according to the data object number, the data file number and the data block number in the row address.
Optionally, the processing states of the multiple tasks to be concurrently processed are recorded in a state data table;
the apparatus further comprises a state data table update module configured to:
after concurrent processing is carried out on tasks in the same task set, the processing state of the tasks is obtained;
deleting the processing state of the task recorded in the state data table before concurrent processing;
and newly adding the processing state of the task after concurrent processing in the state data table.
Optionally, the data block includes a data block in an Oracle database management system.
A task concurrency processing device, the device comprising:
a task retrieval module for: one or more processing tasks to be processed are obtained, and the processing tasks correspond to a data table which needs to be subjected to data processing;
a task splitting module to: splitting the processing task into a plurality of subtasks, and determining a plurality of subtasks to be concurrently processed, wherein the subtasks correspond to data blocks in a data table to be processed when the subtasks are executed;
a task allocation module to: distributing the plurality of subtasks to one or more task sets, wherein the data blocks corresponding to the subtasks in the same task set are different;
a concurrency processing module to: and submitting the one or more task sets to a thread pool so as to carry out concurrent processing on the subtasks in the same task set.
A computing device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
determining a plurality of tasks to be subjected to concurrent processing, wherein the tasks correspond to data blocks to be processed when the tasks are executed;
distributing the tasks to one or more task sets, wherein the data blocks corresponding to the tasks in the same task set are different;
and carrying out concurrent processing on the tasks in the same task set.
A computing device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
one or more processing tasks to be processed are obtained, and the processing tasks correspond to a data table which needs to be subjected to data processing;
splitting the processing task into a plurality of subtasks, and determining a plurality of subtasks to be concurrently processed, wherein the subtasks correspond to data blocks in a data table to be processed when the subtasks are executed;
distributing the plurality of subtasks to one or more task sets, wherein the data blocks corresponding to the subtasks in the same task set are different;
and submitting the one or more task sets to a thread pool, and carrying out concurrent processing on subtasks in the same task set.
The technical scheme provided by the embodiment of the specification can have the following beneficial effects:
In the embodiments of the present specification, tasks may be distinguished with data blocks as the dimension: tasks corresponding to different data blocks are allocated to the same task set, so that the tasks within any one task set correspond to different data blocks. During concurrent processing, the tasks in the same task set are processed concurrently, so that each task in a single round of concurrent processing operates on a different data block. Row lock conflicts caused by fully occupied transaction slots can thus be prevented, task processing failures can be reduced, and task processing efficiency can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the present specification. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
Some terms referred to in the embodiments of the present specification are explained first.
Data Blocks: a data block is the smallest unit of storage in a database management system (e.g., an Oracle database); data is stored in "blocks", and each data block occupies a certain amount of disk space. Whenever the database management system requests data, it does so in units of data blocks, i.e., it reads an integer number of data blocks at a time. Even if the amount of data requested is less than one data block, the database management system still reads the entire block. In other words, the "data block" is the smallest, most basic unit of data read and written by the database management system.
A data block stores both data of the data table and data of indexes, and the data block format (Data Block Format) is the same regardless of the type of data stored. Fig. 1 is a schematic diagram of a data block: a data block includes a block header (Common and Variable), a table directory area (Table Directory), a row directory area (Row Directory), an available space area (Free Space), and a row data area (Row Data).
Block header (Common and Variable): stores basic information about the block, comprising standard contents and variable contents, such as the physical address of the data block and the type of segment the block belongs to (data segment or index segment).
Table Directory area (Table Directory): stores information about the data table, namely: if rows of a data table are stored in this data block, information about that table is recorded in the table directory.
Row Directory area (Row Directory): if the data block contains row data, information about those rows is recorded in the row directory, including the addresses of the rows.
Row Data area (Row Data): the area where table data and index data are actually stored; this portion of the block is the space already occupied by data rows.
Available Space area (Free Space): the unused area in a data block, reserved for the insertion of new rows and the updating of existing rows.
The data block header may include one or more ITL (Interested Transaction List) transaction slots. The ITL is an internal component of the data block that records all transactions occurring in the block, and each ITL slot can be associated with one task (also referred to as a transaction) record. Once a transaction has committed, its ITL slot can be reused; a task that has not yet committed keeps occupying its ITL slot. If all transaction slots in the data block are occupied, the data block can also construct additional ITL slots out of its free space for tasks to use; once the free space is exhausted as well, subsequent task requests have to wait.
The locking mechanism of a database such as Oracle is lightweight: rather than managing locks through a lock list, the lock is stored directly in the data block header as an attribute of the block. When a transaction needs to update data in a data block, it must first obtain an ITL slot, then write into that slot information such as the current transaction ID, the data block address used by the transaction, the SCN number, and whether the current transaction has committed. An ITL slot can be overwritten by other transactions after its transaction commits or rolls back, and when a new transaction finds the ITL slots insufficient, a new ITL slot is created dynamically.
In a scenario where multiple tasks are executed concurrently, a single data block may face processing by several concurrent tasks. The data block provides a number of transaction slots; the concurrent tasks targeting the same data block occupy its transaction slots and are then executed in the order in which the slots were acquired.
Take as an example a task that processes a data table containing user information, where the user information in the data table reaches the tens-of-millions level and the task modifies a certain attribute for a subset of the users. Because the amount of data involved in the modification task is huge, the task can be processed concurrently by splitting it into multiple subtasks, thereby improving data processing efficiency. The concurrent processing may proceed as follows: split the task into different subtasks along a concurrency dimension (such as user account number or business document number), import the split subtasks into a thread pool, and at processing time fetch a batch of subtasks from the thread pool for concurrent processing. For the tasks in the thread pool, a task data table can be generated that records the processing state of each subtask to be processed. On each round of concurrent processing, the task data table is locked in pessimistic-lock mode, the processing state of each subtask is checked, the subtask is processed if it is still unprocessed, and its state is updated to completed after processing finishes.
When the concurrency is high, many of the split subtasks operate on the same data block. Excessive concurrency increases ITL slot occupation, and when ITL slots are not released in time, row lock conflicts eventually result.
In the embodiments of the present specification, the tasks may be distinguished with data blocks as the dimension: tasks corresponding to different data blocks are allocated to the same task set, so that the tasks in one task set correspond to different data blocks. During concurrent processing, the tasks in the same task set are processed concurrently, so each task in one round of concurrent processing operates on a different data block. This prevents row lock conflicts caused by fully occupied ITL slots, reduces task processing failures, and improves task processing efficiency.
Fig. 2A is a flowchart of a task concurrent processing method according to an exemplary embodiment of this specification; the method includes:
in step 202, a plurality of tasks to be concurrently processed is determined, the tasks corresponding to data blocks to be processed when the task is executed.
In step 204, the tasks are allocated to one or more task sets, where the data blocks corresponding to the tasks in the same task set are different.
In step 206, the tasks in the same task set are processed concurrently.
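The three steps above can be expressed as a minimal sketch (an illustration, not the claimed implementation); `block_of` is a hypothetical accessor returning the identifier of the data block a task will touch:

```python
from collections import defaultdict

def allocate_to_sets(tasks, block_of):
    """Distribute tasks into task sets so that two tasks sharing a
    data block never land in the same set (steps 202-206)."""
    by_block = defaultdict(list)
    for task in tasks:
        by_block[block_of(task)].append(task)
    # The i-th task of each block goes to set i, so any one set
    # holds at most one task per data block.
    task_sets = []
    for block_tasks in by_block.values():
        for i, task in enumerate(block_tasks):
            if i >= len(task_sets):
                task_sets.append([])
            task_sets[i].append(task)
    return task_sets
```

Each returned set can then be processed concurrently in full, since its tasks all target different data blocks.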
Fig. 2B is an application scenario diagram of a task concurrent processing method according to an exemplary embodiment of this specification. Fig. 2B includes a plurality of task providers, a computing cluster comprising a plurality of computing devices, and a database; the solution of the embodiments of this specification may be applied to any computing cluster that needs to perform concurrent task processing. A task provider submits to the computing cluster a processing task that operates on certain data tables in the database, and the computing cluster determines, from the database, the data in the data tables corresponding to the processing task according to the task's processing object.
In some examples, a computing device may face a large number of processing tasks, some of which require manipulating large amounts of data in one or more data tables. The computing cluster may therefore execute multiple processing tasks concurrently, or break some large processing tasks into multiple subtasks and execute the subtasks concurrently. Optionally, the computing cluster may be configured with a computing device dedicated to acquiring tasks, a thread pool for managing tasks awaiting concurrent processing, computing devices dedicated to processing tasks, and the like; tasks requiring concurrent processing are placed in the thread pool, and the computing cluster can fetch batches of tasks from the thread pool for concurrent processing according to its actual processing capacity.
In the embodiment of the present specification, for a task to be concurrently processed, related information of the task (for example, a task identifier, an object for generating the task, a task generation time, or a processing object of the task, etc.) may be obtained.
The method for distributing the tasks to one or more task sets, wherein data blocks corresponding to the tasks in the same task set are different, comprises the following steps:
and determining data block identifications of data blocks corresponding to the tasks, and distributing the tasks to one or more task sets by using the data block identifications, wherein the tasks corresponding to the same data block identification are distributed to different task sets.
In this embodiment, the data block identifier may be obtained in a manner determined flexibly according to the database management system used in the actual scenario. Taking the Oracle database as an example, the row address stored in the row directory area of the data block may be read, and the data block identifier determined from the data object number, the data file number, and the data block number in the row address.
In the Oracle database, the rowid is recorded in the row directory area of a data block. The rowid is the globally unique address of a row in the Oracle database; for each row of data, the rowid pseudo column returns that row's address. The rowid value mainly encodes the data object number of the row, the data file in which the row resides, the data block within that data file, and the position of the row within the block.
The rowid generally consists of 18 characters. Taking a rowid of the form OOOOOOFFFBBBBBBRRR as an example:
OOOOOO: the data object number (occupying 6 characters);
FFF: the relative data file number (occupying 3 characters);
BBBBBB: the data block number (occupying 6 characters);
RRR: the row number within the data block (occupying 3 characters).
Therefore, as can be seen from the rowid format, the first 15 characters of each rowid distinguish different data blocks, so the data block identifier can be determined from the data object number, data file number, and data block number in the row address, thereby accurately distinguishing data blocks.
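The identifier extraction described above can be sketched as follows, assuming only the 18-character fixed-width rowid layout given in the text:

```python
def block_identifier(rowid: str) -> str:
    """Return the data block identifier portion of an 18-character
    Oracle-style rowid OOOOOOFFFBBBBBBRRR: the data object number (6),
    relative file number (3) and block number (6) jointly identify the
    data block; only the trailing row number (3) varies between rows
    of the same block."""
    if len(rowid) != 18:
        raise ValueError("expected an 18-character rowid")
    return rowid[:15]  # object no. + file no. + block no.
```

Two rows of the same block thus map to the same identifier and can be kept out of the same task set.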
In practical applications, the computing cluster may be configured with a state data table that records the processing states of the tasks awaiting concurrent processing, where the processing states include unprocessed, processing completed, processing failed, and the like. According to the processing state of each task recorded in the state data table, the computing cluster can fetch a batch of unprocessed tasks for concurrent processing, locking the state data table in pessimistic-lock mode during processing; after the concurrent processing finishes, the processing state of each task is updated in the state data table according to whether the task completed successfully.
It can be understood that updating the processing state of a task in the state data table itself involves a processing operation on the state data table. Since data in that table is being updated, it follows from the foregoing analysis that an ITL transaction slot of the corresponding data block must be occupied during the update. To reduce possible row lock conflicts, this embodiment may, after concurrently processing the tasks in a task set, acquire the processing state of each task, delete the pre-processing state of the task recorded in the state data table, and newly add the post-processing state of the task to the state data table.
Next, the embodiments of the present specification are described again with reference to fig. 3A and 3B, each of which shows another task concurrency processing method according to an exemplary embodiment of this specification. In practical applications, the following scenario may exist: a server maintains tens of millions or even hundreds of millions of user data records, and the server may need to update certain data for some or all of the users during some period. In this scenario, the user data is stored in data tables, and because the data volume is huge, the server can divide the update task into multiple subtasks to achieve rapid data updating. The server may be configured with a computing cluster as shown in fig. 2B and apply the method shown in fig. 3A, comprising:
in step 302, one or more processing tasks to be processed are obtained, wherein the processing tasks correspond to a data table which needs to be subjected to data processing;
in step 304, the processing task is divided into a plurality of subtasks, and a plurality of subtasks to be concurrently processed are determined, where the subtasks correspond to data blocks in a data table to be processed when the subtask is executed;
in step 306, the multiple subtasks are allocated to one or multiple task sets, where the data blocks corresponding to the subtasks in the same task set are different;
in step 308, the one or more task sets are submitted to a thread pool to concurrently process the subtasks in the same task set.
In this embodiment, a processing task with a huge data volume may be split into multiple subtasks according to the processing performance of the actual device, the size of the data volume, or the needs of the relevant application scenario. For example, a processing task involving ten thousand users may be split into one hundred subtasks, each involving one hundred user data updates; after the split, the data blocks in the data table that each subtask needs to process can be determined correspondingly.
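The splitting step can be sketched as a simple chunking helper; the chunk size of 100 follows the ten-thousand-user example above and is otherwise arbitrary:

```python
def split_into_subtasks(user_ids, chunk_size=100):
    """Split one large update task into fixed-size subtasks, e.g.
    10,000 users -> 100 subtasks of 100 users each."""
    return [user_ids[i:i + chunk_size]
            for i in range(0, len(user_ids), chunk_size)]
```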
For all the split subtasks, each subtask can be traversed to obtain the rowid of the data block it corresponds to. The first 15 characters of each rowid (the data object number, data file number, and data block number in the row address) serve as the data block identifier, and the split subtasks are grouped with data blocks as the dimension, thereby distributing the subtasks into a number of task sets. To prevent row lock conflicts, the data blocks corresponding to the subtasks within one task set are all different; the number of task sets can be determined according to actual needs, device processing capacity, and the like. Finally, the task sets are dispatched to a thread pool; the computing cluster fetches a task set from the thread pool and concurrently processes the subtasks in that set.
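Concurrent processing of the task sets might be sketched as follows, assuming a `handle` callable that executes one subtask; this illustrates the thread pool idea, not the cluster's actual scheduler:

```python
from concurrent.futures import ThreadPoolExecutor

def process_task_sets(task_sets, handle):
    """Process task sets one set at a time: the subtasks inside a set
    touch distinct data blocks, so they can run concurrently without
    competing for the same block's ITL slots; the next set starts only
    after the current one has finished."""
    results = {}
    with ThreadPoolExecutor(max_workers=8) as pool:
        for task_set in task_sets:
            # All subtasks of one set run concurrently...
            futures = {task: pool.submit(handle, task) for task in task_set}
            # ...and we wait for the whole set before moving on.
            for task, fut in futures.items():
                results[task] = fut.result()
    return results
```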
In practical applications, a state data table may be configured in the computing cluster to record the processing states of the subtasks awaiting concurrent processing, where the processing states include unprocessed, processing completed, and processing failed. After each subtask is processed concurrently, its state in the state data table is updated to completed or failed according to whether it succeeded.
On the other hand, after a subtask completes, the state data table is not updated in place. Instead, a delete-then-insert approach is adopted: the old state record is deleted and a new record is added, so the newly added data is written into a new data block and the ITL slot of the original data block is freed for other business, effectively reducing row lock conflicts.
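The delete-then-insert state update might be sketched as below. SQLite is used here purely to illustrate the pattern, and the table and column names are hypothetical; the ITL-freeing effect described in the text is specific to Oracle's block-level storage:

```python
import sqlite3

def mark_state(conn, task_id, new_state):
    """Record a task's post-processing state by deleting the old state
    row and inserting a fresh one, instead of an in-place UPDATE; in
    Oracle the inserted row can land in a new data block, freeing the
    original block's ITL slot for other transactions."""
    with conn:  # delete + insert in a single transaction
        conn.execute("DELETE FROM task_state WHERE task_id = ?", (task_id,))
        conn.execute(
            "INSERT INTO task_state (task_id, state) VALUES (?, ?)",
            (task_id, new_state),
        )
```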
Corresponding to the embodiment of the task concurrent processing method, the specification also provides an embodiment of the task concurrent processing device and a computing device applied by the task concurrent processing device.
The embodiments of the task concurrent processing apparatus can be applied to a computing device, such as a server or a terminal device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, as a logical apparatus, it is formed by the processor of the computing device reading the corresponding computer program instructions from non-volatile storage into memory and running them. From the hardware perspective, fig. 4 shows the hardware structure of the computing device in which the task concurrent processing apparatus of this specification is located; besides the processor 410, the memory 430, the network interface 420, and the non-volatile storage 440 shown in fig. 4, the computing device in which the apparatus 431 is located may also include other hardware according to its actual functions, which is not described again.
As shown in fig. 5, fig. 5 is a block diagram of a task concurrency processing device shown in the present specification according to an exemplary embodiment, the device including:
a task determination module 51 for: determining a plurality of tasks to be subjected to concurrent processing, wherein the tasks correspond to data blocks to be processed when the tasks are executed;
a task assignment module 52 for: distributing the tasks to one or more task sets, wherein the data blocks corresponding to the tasks in the same task set are different;
a concurrency processing module 53, configured to: and carrying out concurrent processing on the tasks in the same task set.
Optionally, the task allocation module is further configured to:
and determining data block identifications of data blocks corresponding to the tasks, and distributing the tasks to one or more task sets by using the data block identifications, wherein the tasks corresponding to the same data block identification are distributed to different task sets.
Optionally, the task allocation module is further configured to:
and reading the row address stored in the row directory area in the data block, and determining the data block identifier according to the data object number, the data file number and the data block number in the row address.
Optionally, the processing states of the multiple tasks to be concurrently processed are recorded in a state data table;
the apparatus further comprises a state data table update module configured to:
after concurrent processing is carried out on tasks in the same task set, the processing state of the tasks is obtained;
deleting the processing state of the task recorded in the state data table before concurrent processing;
and newly adding the processing state of the task after concurrent processing in the state data table.
Optionally, the data block is a data block of an Oracle database management system.
As shown in fig. 6, fig. 6 is a block diagram of another task concurrent processing apparatus shown in the present specification according to an exemplary embodiment, the apparatus including:
a task retrieval module 61 for: one or more processing tasks to be processed are obtained, and the processing tasks correspond to a data table which needs to be subjected to data processing;
a task splitting module 62 configured to: splitting the processing task into a plurality of subtasks, and determining a plurality of subtasks to be concurrently processed, wherein the subtasks correspond to data blocks in a data table to be processed when the subtasks are executed;
a task assignment module 63 configured to: distributing the plurality of subtasks to one or more task sets, wherein the data blocks corresponding to the subtasks in the same task set are different;
a concurrency processing module 64 configured to: and submitting the one or more task sets to a thread pool so as to carry out concurrent processing on the subtasks in the same task set.
Accordingly, this specification also provides a computing device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
determining a plurality of tasks to be subjected to concurrent processing, wherein the tasks correspond to data blocks to be processed when the tasks are executed;
distributing the tasks to one or more task sets, wherein the data blocks corresponding to the tasks in the same task set are different;
and carrying out concurrent processing on the tasks in the same task set.
Accordingly, this specification also provides a computing device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
one or more processing tasks to be processed are obtained, and the processing tasks correspond to a data table which needs to be subjected to data processing;
splitting the processing task into a plurality of subtasks, and determining a plurality of subtasks to be concurrently processed, wherein the subtasks correspond to data blocks in a data table to be processed when the subtasks are executed;
distributing the plurality of subtasks to one or more task sets, wherein the data blocks corresponding to the subtasks in the same task set are different;
and submitting the one or more task sets to a thread pool, and carrying out concurrent processing on subtasks in the same task set.
The implementation process of the functions and actions of each module in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Other embodiments of the present description will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.
It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.