WO2017070900A1 - Method and apparatus for task processing in a multi-core digital signal processing system - Google Patents
- Publication number: WO2017070900A1 (application PCT/CN2015/093248)
- Authority: WIPO (PCT)
- Legal status: Ceased
Classifications
- G06F 9/38 — Concurrent instruction execution, e.g. pipeline or look ahead
- G06F 9/50 — Allocation of resources, e.g. of the central processing unit [CPU]
- Embodiments of the present invention relate to the field of digital signal processors and, more particularly, to methods and apparatus for processing tasks in a multi-core digital signal processing system.
- In the static task scheduling method of the related art, the software designer derives the basic performance of each functional module from the software task graph and from performance simulation of each functional algorithm module, and maps software functions onto hardware resources according to functional granularity and resource consumption, deploying different software functions to different hardware resources. However, the static method supports only limited application scenarios, has high scheduling complexity, and yields low memory resource utilization.
- The dynamic task scheduling scheme of the related art adopts a resource pool with master-slave distributed scheduling. Each processor carries a tailored operating system (OS) that can support different priorities, and tasks can respond to external interrupts. The master core divides work into tasks of appropriate granularity and places them into a task cache pool; when a slave core is idle, it automatically fetches a task from the master core and executes it. However, each slave core must carry an operating system, and task switching and data loading occupy a large share of the slave core's load, so computing and memory resource utilization are low.
- Embodiments of the present invention provide a method and apparatus for processing tasks in a multi-core digital signal processing system, which can determine the scheduling process at run time, dynamically allocate computing resources, improve the utilization of computing resources, and reduce system scheduling overhead.
- According to a first aspect, a method for processing a task in a multi-core digital signal processing system includes: determining a ready task in a task queue; determining a target computing unit to execute the ready task; and executing the ready task on the target computing unit while, at the same time, preparing data for a task to be executed through the target computing unit.
- With this method, while a task is executed by one computing unit, data is prepared for other tasks through the same unit, so that data loading and algorithm execution proceed in parallel. This reduces the waiting overhead of data loading, increases the degree of parallelism between tasks, and reduces system scheduling overhead.
- Optionally, when the computing unit that executed the dependent task of the ready task is idle, that unit is determined as the target computing unit.
- Because the ready task then runs on the same computing unit as its dependent task, there is no need to load data again when the ready task executes, which alleviates congestion on the loading path.
- Optionally, before the ready task is executed by the target computing unit, the method further includes: determining, in the near-end memory of the target computing unit, a memory block for storing the input data corresponding to the ready task; and moving the input data corresponding to the ready task into that memory block.
- Optionally, determining the memory block in the near-end memory of the target computing unit includes: determining the memory block according to a fixed resource pool algorithm, wherein data stored in the near-end memory of the target computing unit remains resident until the user releases it, or is swapped out to the far-end memory when near-end memory is insufficient.
- Optionally, the method further includes: when the ready task is executed by the target computing unit, saving the output data of the ready task in the near-end memory.
- In this way, the memory read and written while executing a task is the near-end memory, so the task does not wait for data-arrival latency during execution. The fixed resource pool algorithm also reduces memory fragmentation when applying for memory, improves memory turnover efficiency, and saves memory.
- Optionally, determining the memory block with the fixed resource pool algorithm includes: determining the number of memory blocks to apply for according to the ratio of the total size of the data blocks corresponding to all parameters required by the ready task to the size of a single memory block in the near-end memory.
- This further improves memory use efficiency and reduces memory waste.
- Optionally, before determining a ready task in the task queue, the method further includes: abstracting the service containing the ready task to obtain abstraction information, the abstraction information including at least one of the following: task dependency information, data dependency information, and task execution order information.
- Optionally, the task queue is a plurality of parallel task queues, and determining the ready task includes: polling the plurality of parallel task queues in priority order to determine the ready task.
- Optionally, abstracting the service containing the ready task includes: creating buffers according to the requirements of the service, and determining the data dependency information according to the buffer IDs of the created buffers.
- According to a second aspect, an apparatus for processing a task in a multi-core digital signal processing system is provided, configured to perform the method of the first aspect or any possible implementation thereof, and in particular comprising modules for performing that method.
- According to a third aspect, a computer readable medium is provided for storing a computer program comprising instructions for performing the method of the first aspect or any possible implementation thereof.
- According to a fourth aspect, a computer program product is provided, comprising computer program code which, when executed by an apparatus for processing tasks in a multi-core digital signal processing system, causes the apparatus to perform the method of the first aspect or any possible implementation thereof.
- FIG. 1 is a schematic structural diagram of an application system to which an embodiment of the present invention is applied;
- FIG. 2 is a schematic diagram of the management modules included in the scheduler of the application system shown in FIG. 1;
- FIG. 3 is a schematic diagram of data dependency relationships in an application system to which an embodiment of the present invention is applied;
- FIG. 4 is a schematic diagram of scheduling results when only one core is scheduled in an application system to which an embodiment of the present invention is applied;
- FIG. 5 is a schematic diagram of scheduling results when three cores are scheduled in an application system to which an embodiment of the present invention is applied;
- FIG. 6 is a schematic flowchart of a method for processing a task in a multi-core digital signal processing system according to an embodiment of the present invention
- FIG. 7 is another schematic flowchart of a method for processing a task in a multi-core digital signal processing system according to an embodiment of the present invention.
- FIG. 8 is still another schematic flowchart of a method for processing a task in a multi-core digital signal processing system according to an embodiment of the present invention.
- FIG. 9 is a schematic flowchart of a method for abstracting a service according to an embodiment of the present invention.
- FIG. 10 is a schematic flowchart of a method for implementing a processing task in a specific case according to an embodiment of the present invention.
- FIG. 11 is a schematic flowchart of a method of determining a ready task and an idle operation unit according to an embodiment of the present invention
- FIG. 12 is a schematic block diagram of an apparatus for processing a task in a multi-core digital signal processing system according to an embodiment of the present invention
- FIG. 13 is another schematic block diagram of an apparatus for processing a task in a multi-core digital signal processing system according to an embodiment of the present invention
- FIG. 14 is still another schematic block diagram of an apparatus for processing a task in a multi-core digital signal processing system according to an embodiment of the present invention.
- FIG. 15 is a schematic block diagram of an apparatus for processing a task in a multi-core digital signal processing system in accordance with another embodiment of the present invention.
- The technical solution of the embodiments of the present invention mainly applies to digital signal processing systems that require multi-core processing and have a large number of parallel computing scenarios, such as macro base station baseband chips and terminal chips.
- The multi-core characteristic is embodied by the number of computing modules integrated on a single chip, including but not limited to multiple general-purpose processors, multiple IP cores, multiple dedicated processors, and the like. A system with more than one computing module is multi-core.
- FIG. 1 is a schematic structural diagram of an application system (multi-core digital signal processing system) to which an embodiment of the present invention is applied.
- the application system is composed of three parts: a main control layer, an execution layer, and an operation layer.
- the main control layer carries user software, and completes high-level information interaction, process control, task decomposition, and task dependency definition.
- the execution layer consists of three parts, the master core execution layer, the scheduler, and the slave core execution layer.
- the main control core execution layer provides a software programming interface, submits commands to the scheduler and receives command feedback or callback notifications;
- The scheduler is a hardware component responsible for task scheduling; its functions include dependency processing between tasks, memory management, task assignment, and data movement.
- The scheduler is internally managed by multiple management modules: command management, command queue management, event management, buffer descriptor management, shared memory management, computation memory management, computing resource state management, and a scheduling master control module.
- The slave core execution layer is a software part, mainly responsible for receiving task messages, calling the algorithm function library to execute operations, and sending a task-end feedback message after the operation completes.
- the computing layer can be hardware or software, and is mainly responsible for processing tasks.
- Kernels 0 to 2 are high priority processing
- Kernels 3 to 5 are medium priority processing
- Kernels 7 to 9 are low priority processing.
- In FIG. 3, the arrows indicate the direction of data flow. Data flowing between the Host and the device is labeled as buffer input/output (Buff_In/Buff_Out) according to its direction.
- Data flowing between Kernel processing stages is labeled as intermediate buffers (Buff_M).
- Data dependencies between different cores can be described as:
- The inputs of Kernel_0/3/7 are Buff_In0/1/2, which are prepared by the Host (in a real application they may come from an external interface or from a Hardware Accelerator Controller (HAC)).
- The processed data is output to the Host (in real applications, the Host usually needs to send data processed by Digital Signal Processing (DSP) off-chip, or pass it to the HAC for further processing).
- Kernel_2 depends on the output of Kernel_1 and also depends on the output of Kernel_5.
- Kernel_4 depends on the output of Kernel_9 and also depends on the output of Kernel_3.
- Kernel_8 depends on the output of Kernel_7 and also depends on the output of Kernel_3.
- The output of Kernel_3 is used by Kernel_8 in addition to Kernel_4 (Kernel_8 may use only a part of it).
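The dependency relations listed above form a small directed graph, and a kernel is ready exactly when every kernel it depends on has finished. A minimal Python sketch, encoding only the dependencies stated above (the dict layout is illustrative, not the patent's internal format):

```python
# Only the explicitly stated dependencies from Figure 3 are encoded here.
# Kernel_0/3/7 read Buff_In0/1/2 prepared by the Host, so they have no
# kernel-level dependencies.
deps = {
    "Kernel_0": set(), "Kernel_3": set(), "Kernel_7": set(),
    "Kernel_2": {"Kernel_1", "Kernel_5"},
    "Kernel_4": {"Kernel_9", "Kernel_3"},
    "Kernel_8": {"Kernel_7", "Kernel_3"},
}

def ready_kernels(deps, finished):
    """Kernels whose producers have all finished and which have not run yet."""
    return {k for k, d in deps.items() if k not in finished and d <= finished}
```

With nothing finished, only the Host-fed kernels are ready; once Kernel_3 and Kernel_7 complete, Kernel_8 becomes ready.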
- The scheduler schedules by priority according to the number of execution cores actually available, while ensuring that data dependencies remain correct.
- Suppose the Host submits all Kernels to the command queue (CommandQueue) and the input data Buff_In0/1/2 are all ready, but only one core can be scheduled.
- the scheduling result is shown in Figure 4.
- When three cores can be scheduled, the scheduling result is shown in Figure 5.
- In Figure 5, whether the dotted-line Kernel_2 and Kernel_4 are scheduled on DSP2 depends on the amount of input data.
- Depending on that amount, Kernel_4 may be dispatched to DSP2; otherwise it should be dispatched to the core holding the output of Kernel_3.
- a service refers to a program that processes data independently of hardware, and is a concept different from an operating system and a driver.
- The service may be, for example, data channel estimation, Fast Fourier Transform (FFT), decoding, or other operations.
- A task is a piece of program that implements a function and usually needs to run on a processor core.
- FIG. 6 is a schematic flow diagram of a method 100 of processing a task in accordance with an embodiment of the present invention.
- the method 100 can be performed by the multi-core digital signal processing system shown in FIG. 1, as shown in FIG. 6, the method 100 includes:
- The multi-core digital signal processing system determines a target computing unit capable of executing the ready task, executes the ready task on that unit, and at the same time prepares data for a task to be executed through the target computing unit.
- This makes data loading and computation parallel, reducing the waiting overhead of data loading, improving parallelism between tasks, and reducing system scheduling overhead.
- A ready task is a task whose preparation has been completed and which can start running.
- the task to be executed can be understood as the task that needs to be executed after the ready task.
- An arithmetic unit can be understood as a core.
- the task queue is a plurality of parallel task queues
- S110 is specifically: polling the plurality of parallel task queues in a priority order to determine the ready task.
- The multi-core digital signal processing system can create parallel task queues with different priorities; after tasks are sent into a queue, the tasks within each queue are executed serially, in first-in, first-out order.
- Polling may proceed in descending order of queue priority: if a higher-priority queue has no ready task, polling continues with the next priority, and it ends when a ready task is found or the lowest-priority queue has been polled.
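The polling rule above (descending priority across queues, FIFO order within each queue, stop at the first ready task) can be sketched as follows; the dict-based task representation is an assumption for illustration:

```python
def poll_ready(queues):
    """queues: list of FIFO lists, ordered from highest to lowest priority.
    Scans each queue in FIFO order and removes and returns the name of the
    first ready task found; returns None if no queue holds a ready task."""
    for q in queues:           # descending priority order
        for task in q:         # first-in, first-out within a queue
            if task["ready"]:
                q.remove(task)
                return task["name"]
    return None                # polled down to the lowest priority, nothing ready
```

A task that is not yet ready in a high-priority queue does not block a ready task in a lower-priority queue.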
- Executing the ready task on the target computing unit while preparing data for the next task can be understood as virtualizing one computing unit into two "ping-pong" logical resources: while one logical resource is running a task, the other can be assigned a new task, so that one logical resource is always computing while the other's data is being prepared, reducing data waiting and improving computing resource utilization.
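A minimal bookkeeping sketch of this ping-pong virtualization (the class and slot names are invented for illustration; the patent does not prescribe this API):

```python
class PingPongUnit:
    """One computing unit virtualized as two logical resources ('ping'/'pong').
    While one slot runs a task, the other slot can be assigned the next task
    so its data can be prepared, letting loading overlap with computation."""

    def __init__(self):
        self.slots = {"ping": None, "pong": None}  # None means idle

    def idle_slot(self):
        for name, task in self.slots.items():
            if task is None:
                return name
        return None  # both logical resources are occupied

    def assign(self, task):
        slot = self.idle_slot()
        if slot is None:
            return None
        self.slots[slot] = task
        return slot

    def finish(self, slot):
        self.slots[slot] = None
```

At most two tasks occupy the unit at once: one executing, one being prepared.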
- Optionally, S120 is specifically: when the computing unit that executed the dependent task of the ready task is determined to be idle, determining that unit as the target computing unit.
- A dependent task of the ready task is a task whose output data is input data of the ready task.
- The multi-core digital signal processing system may select an idle resource according to the data location to execute the ready task. Preferably, the system records the computing unit that processed the dependent task of the ready task, determines that unit, and allocates the ready task to it; that is, the unit that executed the dependent task becomes the target computing unit. Since the ready task then runs on the same unit that processed its dependent task, data need not be loaded for it again, which alleviates congestion on the loading path.
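This locality preference can be sketched as a small helper: prefer the unit whose near-end memory already holds the dependent task's output, otherwise fall back to any idle unit. The names `last_unit_of` and `"dep"` are invented for illustration:

```python
def pick_target(ready_task, last_unit_of, idle_units):
    """ready_task: dict with a 'dep' key naming its dependent task (or None).
    last_unit_of: map from task name to the unit that executed it.
    idle_units: set of currently idle unit names.
    Prefers the unit that ran the dependent task (data already local);
    otherwise returns an arbitrary idle unit, or None if none is idle."""
    preferred = last_unit_of.get(ready_task.get("dep"))
    if preferred in idle_units:
        return preferred         # data locality: no reload needed
    return min(idle_units) if idle_units else None
```

Keeping producer and consumer on the same unit avoids a second load over the (possibly congested) loading path.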
- Otherwise, an idle computing unit may be randomly selected from the remaining idle computing units as the target computing unit.
- the method 100 before performing the ready task by the target computing unit, the method 100 further includes:
- S140 Determine a memory block in the near-end memory of the target operation unit for storing input data corresponding to the ready task.
- The data required to execute the ready task may reside on other computing units or in external memory (for example, Double Data Rate (DDR) memory or L3 cache).
- Optionally, the memory block is determined according to a fixed resource pool algorithm, wherein data stored in the near-end memory of the target computing unit remains resident until the user releases it or the data is swapped out to the far-end memory.
- Memory spaces can be ranked by their distance from the computing unit and then handled by memory level. The allocation algorithm for the near-end memory uses a fixed resource pool, thereby reducing memory fragmentation and improving allocation and release efficiency.
- The near-end memory of the target computing unit may also be allocated with other algorithms, for example a linked-list allocation algorithm, a buddy algorithm, a memory-pool-based buddy algorithm, or a working-set algorithm, but the invention is not limited thereto.
- Optionally, S140 is specifically: determining the number of memory blocks to apply for in the near-end memory according to the ratio of the total size of the data blocks corresponding to all parameters required by the ready task to the size of a single memory block in the near-end memory.
- A task is equivalent to a function with parameters, and each parameter may be a block of data or a numeric value; once assembled, the parameters can be packed into the same memory block or into several blocks. For example, suppose task A has 10 parameters, each of data-block type. If the total size of the data blocks corresponding to the 10 parameters is 31 KB and a single memory block in the near-end memory is 4 KB, then eight memory blocks need to be applied for. This improves memory use efficiency and reduces memory waste.
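The block count is simply the ceiling of the total parameter size divided by the fixed block size, which reproduces the 31 KB / 4 KB example above:

```python
def blocks_needed(total_param_bytes, block_bytes):
    """Number of fixed-size near-end memory blocks to apply for:
    the ceiling of total parameter size over single block size."""
    return -(-total_param_bytes // block_bytes)  # ceiling division
```

Fixed-size blocks mean the only waste is the slack in the final block, which keeps fragmentation bounded.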
- the method 100 further includes:
- S160 Perform abstract processing on the service including the ready task, and obtain abstract processing information, where the abstract processing information includes at least one of the following information: task dependency information, data dependency information, and sequence information of task execution.
- A service can be split into multiple tasks in order to abstract the service.
- A buffer may be created according to the needs of the service, and the data dependency information is determined according to the buffer's ID.
- A buffer is a data storage space into which data is loaded before a task starts and which is destroyed when no task needs it.
- Each buffer has an ID, and data relationships between tasks are associated through this ID: if the output data of task A is the input data of task B, then the output buffer of task A is buffer2 and the input buffer of task B is also buffer2.
- the creation of the buffer may be determined by the programmer according to the business needs, and the number of buffers actually created is dynamically determined according to the actual task execution process.
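Deriving the data dependency information from shared buffer IDs can be sketched as follows (the task description format is an assumption for illustration): a task that reads a buffer depends on the task that wrote it.

```python
def data_dependencies(tasks):
    """tasks: map from task name to {'in': [buffer IDs], 'out': [buffer IDs]}.
    If task B's input buffer ID matches task A's output buffer ID,
    then B depends on A. Returns a map from task name to its producers."""
    producers = {}
    for name, io in tasks.items():
        for buf in io["out"]:
            producers[buf] = name          # who writes each buffer
    return {
        name: {producers[buf] for buf in io["in"] if buf in producers}
        for name, io in tasks.items()
    }
```

This matches the buffer2 example: A writes buffer 2, B reads buffer 2, so B depends on A.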
- The task dependency information indicates the dependencies between tasks; these dependencies may be associated through events.
- For example, task A may choose to publish an event ID, and task B, which must wait for task A to complete, fills the event ID published by A into its waiting-event list. The waiting-event description includes the number of waiting events and the IDs of the waited-for events.
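A small sketch of such a waiting-event list (class name and methods are illustrative, not the patent's data structure): the task becomes runnable once every event it waits for has been published.

```python
class WaitList:
    """A task's waiting-event description: the set of event IDs it still
    waits for (the count is the size of this set)."""

    def __init__(self, event_ids):
        self.waiting = set(event_ids)

    def on_event(self, event_id):
        """Called when some task publishes event_id on completion."""
        self.waiting.discard(event_id)

    def ready(self):
        """True once all waited-for events have been published."""
        return not self.waiting
```

In the example above, task B's wait list would contain the single event ID published by task A.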
- task input and output data features may be described, and a limited number of input and output parameters are supported.
- the parameters support different features: input buffer, output buffer, external input pointer, incoming value, global pointer, and the like.
- Optionally, the output data of the ready task may be saved in the near-end memory while the ready task is executed by the target computing unit. When the next task is loaded onto the same computing unit, the data need not be loaded again, which alleviates congestion on the loading path.
- S160 may be executed by the master core execution layer in the architecture shown in FIG. 1; the master core execution layer may be a software programming interface that generates commands and submits them to the scheduler for execution.
- the process of abstracting the service including the ready task, and obtaining the abstract processing information may include the following steps:
- The execution function library is invoked by the slave core execution layer, and the master core execution layer registers the function pointers or indexes into a function list;
- The functional resource set includes the resources for executing tasks, that is, which computing units run tasks and which Direct Memory Access (DMA) channels are used.
- Queues have different priorities and are parallel to one another; tasks sent into a queue are executed serially within it, following the first-in, first-out principle.
- A buffer is a piece of data storage space; data is loaded into it before a task starts, and it is destroyed when no task needs its data.
- Each buffer has an ID that associates the data relationships between tasks: if the output data of task A is the input data of task B, then the output buffer of task A is buffer2 and the input buffer of task B is also buffer2.
- The dependencies between tasks can be associated through events: task A can choose to publish an event ID, and task B, which needs to wait for task A to complete, fills the event ID published by A into its waiting-event list; the waiting-event description includes the number of waiting events and the IDs of the waited-for events.
- The multi-core digital signal processing system of the embodiment supports a limited number of input and output parameters with different features: input buffer, output buffer, external input pointer, incoming value, and global pointer.
- a method 200 for processing a task includes:
- the scheduler creates a response process of the function resource set.
- the response processing for creating a set of functional resources includes initialization of the arithmetic unit, initialization of the storage manager in the arithmetic unit, and initialization of the shared memory.
- the scheduler waits for a command sent by the main core execution layer, and performs command processing.
- the scheduler polls the parallel queue according to the priority of high to low, and finds the ready task.
- the scheduler selects an idle operation unit in the operation unit set, and prepares data required for the task;
- The idle computing unit can be virtualized into two "ping-pong" logical resources: while one logical resource runs a task, the other can be assigned a new task and have its data prepared, reducing data waiting and improving computing resource utilization.
- Memory can be handled according to its level in the memory hierarchy; the allocation algorithm for the near-end memory uses a fixed resource pool, reducing memory fragmentation and improving allocation and release efficiency.
- Data in the near-end memory remains resident until the user releases it, or until memory runs short and the data is swapped out to the remote memory (replacement follows the user-set replacement level). Because memory is applied for in fixed-size blocks, memory waste is reduced and memory use efficiency improves.
- A memory lock can be set to ensure the consistency of data reads and writes, automatically solving the consistency problem of simultaneous reads and writes by multiple cores (computing units). Specifically, it may be set that data being rewritten cannot be read by a task, and data being read by a task cannot be overwritten; multiple tasks may, however, read the same data simultaneously.
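The rule just described is a single-writer / multiple-reader policy. A bookkeeping sketch of its state machine (illustrative only; a real multi-core implementation needs atomic hardware primitives rather than plain Python fields):

```python
class RWLockState:
    """Single-writer / multiple-reader rule: data being rewritten cannot be
    read, data being read cannot be overwritten, but any number of tasks
    may read simultaneously."""

    def __init__(self):
        self.readers = 0
        self.writing = False

    def try_read(self):
        if self.writing:
            return False       # block is being rewritten
        self.readers += 1
        return True

    def end_read(self):
        self.readers -= 1

    def try_write(self):
        if self.writing or self.readers > 0:
            return False       # someone is reading or writing
        self.writing = True
        return True

    def end_write(self):
        self.writing = False
```

The `try_*` methods fail instead of blocking, which keeps the sketch free of real synchronization concerns.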
- The determination of the ready task and the idle computing unit in steps S204 and S205 is described below with reference to FIG. 11.
- the method of Figure 11 is performed by a scheduler in a digital signal processing system.
- the parallel queue is polled, and it is determined whether the lowest priority queue is polled;
- After a task is sent to a queue, it is executed serially, in first-in, first-out order.
- S303: if there is no ready task in the queue of the current priority, the queue of the next priority is queried; that is, S301 and its subsequent steps are re-executed.
- An idle computing resource should be understood as a logical resource of an arithmetic unit.
- The buddy resource of the idle computing resource in S307 refers to the other logical resource of the same computing unit.
- When a task has already been deployed on the buddy resource acquired in S308, data is prepared on the idle computing resource for a task that depends on the task deployed on the buddy resource; afterwards, S306 and its subsequent steps can continue, preparing data for other tasks.
- This method prepares data for a task to be executed through the target computing unit while the unit executes a ready task, making data loading parallel to computation, reducing the waiting overhead of data loading, increasing parallelism between tasks, and reducing system scheduling overhead.
- the apparatus 10 includes:
- a determining module 11 for determining a ready task in the task queue
- the determining module 11 is further configured to determine a target computing unit that performs the ready task
- the task execution module 12 is configured to execute the ready task by the target operation unit, and simultaneously prepare data for the task to be executed by the target operation unit.
- The apparatus for processing a task in the multi-core digital signal processing system of the embodiment prepares data for the task to be executed through the target computing unit while the unit executes the ready task, making data loading and computation parallel, reducing the waiting overhead of data loading, increasing parallelism between tasks, and reducing system scheduling overhead.
- Optionally, the determining module 11 is specifically configured to: when the computing unit that executed the dependent task of the ready task is determined to be idle, determine that unit as the target computing unit.
- the device further includes a memory application module 13;
- The memory application module 13 is specifically configured to: before the task execution module 12 executes the ready task on the target computing unit, determine, in the near-end memory of the target computing unit, a memory block for storing the input data corresponding to the ready task, and move the input data corresponding to the ready task into that memory block.
- Optionally, the memory application module 13 is specifically configured to determine the memory block according to a fixed resource pool algorithm, wherein the data stored in the near-end memory of the target computing unit remains resident until the user releases it or the data is swapped out to the far-end memory when near-end memory is insufficient.
- Optionally, the memory application module 13 is specifically configured to determine the number of memory blocks to apply for in the near-end memory according to the ratio of the total size of the data blocks corresponding to all parameters required by the ready task to the size of a single memory block in the near-end memory.
- the device further includes:
- The service abstraction module 14 is configured to abstract the service containing the ready task before the determining module 11 determines the ready task in the task queue, obtaining abstraction information that includes at least one of the following: task dependency information, data dependency information, and task execution order information.
- the task queue is a plurality of parallel task queues
- the determining module 11 is specifically configured to: poll the plurality of parallel task queues in order of priority to determine the ready task.
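Polling parallel queues in priority order means scanning from the highest-priority queue downward and taking the first task whose dependencies are satisfied. A hedged sketch of that loop (the `is_ready` predicate and the list-of-lists queue layout are assumptions for illustration):

```python
def poll_ready_task(queues_by_priority, is_ready):
    """Poll parallel task queues from highest to lowest priority and
    return the first ready task found, removing it from its queue.
    Returns None when no task is currently ready."""
    for q in queues_by_priority:  # index 0 = highest priority
        for task in q:
            if is_ready(task):
                q.remove(task)
                return task
    return None
```

A lower-priority ready task is only dispatched once no higher-priority queue holds a ready task, which matches priority-ordered polling.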
- the service abstraction module 14 is specifically configured to: create a cache according to the requirements of the service, and determine the data dependency information according to the cache ID of the created cache.
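Deriving data dependencies from cache IDs can be read as: a task that consumes a cache ID depends on the task that produced that ID. The sketch below shows this mapping under that assumption; the `(name, produces, consumes)` tuple layout is invented for illustration:

```python
def data_dependencies(tasks):
    """Derive task-level data dependencies from cache IDs.
    tasks: list of (name, produced_cache_ids, consumed_cache_ids)."""
    producer = {}
    for name, produces, _ in tasks:
        for cid in produces:
            producer[cid] = name  # the task that writes this cache
    deps = {}
    for name, _, consumes in tasks:
        # A reader depends on the writer of each cache ID it consumes.
        deps[name] = {producer[cid] for cid in consumes if cid in producer}
    return deps
```

The resulting mapping is exactly the data dependency information the abstraction step feeds to the scheduler.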
- the task execution module 12 is further configured to: when the ready task is executed by the target operation unit, save the output data of the ready task in the near-end memory.
- the apparatus for processing a task in the multi-core digital signal processing system of this embodiment of the present invention prepares data for the next task to be executed by the target operation unit while the target operation unit is executing the ready task, so that data loading and computation proceed in parallel. This reduces the waiting overhead of data loading, increases the degree of parallelism between tasks, and reduces system scheduling overhead.
- Figure 15 shows a schematic block diagram of an apparatus 100 for processing tasks in a multi-core digital signal processing system in accordance with another embodiment of the present invention.
- the hardware structure of the apparatus 100 for processing tasks in the multi-core digital signal processing system may include three parts: a transceiver device 101, a software device 102, and a hardware device 103.
- the transceiver device 101 is a hardware circuit for completing packet transmission and reception;
- the hardware device 103 may also be referred to as a "hardware processing module" or, more simply, as "hardware".
- the hardware device 103 mainly includes hardware circuits based on an FPGA or an ASIC (and other supporting devices, such as a memory). A hardware circuit that implements a given function is often much faster than a general-purpose processor, but once customized it is difficult to change; it is therefore less flexible and is usually used to handle fixed functions. Note that, in practical applications, the hardware device 103 may also include an MCU (a microprocessor such as a single-chip microcomputer) or a CPU, but the main function of these processors is not to process large volumes of data but to perform control; in this application scenario, a system equipped with these devices is a hardware device.
- the software device 102 (or simply "software") mainly includes a general-purpose processor (such as a CPU) and supporting devices (such as a memory or a hard disk), and can be programmed to provide the corresponding processing function. When a function is implemented in software, it can be flexibly configured according to business needs, but it is often slower than a hardware device.
- data processed by the hardware device 103 may be transmitted through the transceiver device 101, or the processed data may be sent to the transceiver device 101 through an interface connected to it.
- the hardware device 103 is configured to: determine a ready task in the task queue; determine a target operation unit to execute the ready task; and execute the ready task through the target operation unit while preparing data, through the target operation unit, for the next task to be executed.
- in determining a target operation unit that executes the ready task, the hardware device 103 is specifically configured to: when determining that the operation unit on which the ready task depends is idle, determine that task-dependent operation unit to be the target operation unit.
- before executing the ready task through the target operation unit, the hardware device 103 is specifically configured to: determine, in the near-end memory of the target operation unit, a memory block for storing the input data corresponding to the ready task, and move the input data corresponding to the ready task into that memory block.
- in determining, in the near-end memory of the target operation unit, a memory block for storing the input data corresponding to the ready task, the hardware device 103 is specifically configured to: determine the memory block according to a fixed resource pool algorithm, wherein data stored in the near-end memory of the target operation unit is allowed to remain resident until the user releases it, or is swapped out to the far-end memory when the near-end memory is insufficient.
- in determining the memory block according to the fixed resource pool algorithm, the hardware device 103 is specifically configured to: determine the number of memory blocks according to the ratio of the total size of the data blocks corresponding to all parameters required by the ready task to the size of a single memory block in the near-end memory.
- the hardware device 103 is further configured to: before determining a ready task in the task queue, perform abstraction processing on the service containing the ready task to obtain abstraction information, where the abstraction information includes at least one of the following: task dependency information, data dependency information, and task execution order information.
- the task queue is a plurality of parallel task queues;
- the hardware device 103 is specifically configured to: create a cache according to the requirements of the service, and determine the data dependency information according to the cache ID of the created cache.
- the hardware device 103 is further configured to: when the ready task is executed by the target operation unit, save the output data of the ready task in the near-end memory.
- the disclosed systems, devices, and methods may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- the division into units is merely a logical functional division; in actual implementation there may be another division manner. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- if implemented in the form of a software functional unit and sold or used as a standalone product, the functions may be stored in a computer-readable storage medium.
- the technical solution of the present invention, or the part thereof that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
- the foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The present invention relates to a method and apparatus for processing a task in a multi-core digital signal processing system. In the course of processing a task, the waiting overhead of data loading is reduced, the degree of parallelism between tasks is improved, and the scheduling overhead of the system is reduced. The method comprises the steps of: determining a ready task in a task queue (S110); determining a target operation unit for executing the ready task (S120); and executing the ready task by means of the target operation unit while, at the same time, preparing, by means of the target operation unit, data for a task to be executed (S130).
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2015/093248 WO2017070900A1 (fr) | 2015-10-29 | 2015-10-29 | Procédé et appareil de traitement de tâche dans un système de traitement de signal numérique multicœur |
| CN201580083942.3A CN108351783A (zh) | 2015-10-29 | 2015-10-29 | 多核数字信号处理系统中处理任务的方法和装置 |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2015/093248 WO2017070900A1 (fr) | 2015-10-29 | 2015-10-29 | Procédé et appareil de traitement de tâche dans un système de traitement de signal numérique multicœur |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2017070900A1 true WO2017070900A1 (fr) | 2017-05-04 |
Family
ID=58629684
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2015/093248 Ceased WO2017070900A1 (fr) | 2015-10-29 | 2015-10-29 | Procédé et appareil de traitement de tâche dans un système de traitement de signal numérique multicœur |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN108351783A (fr) |
| WO (1) | WO2017070900A1 (fr) |
Cited By (18)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109697122A (zh) * | 2017-10-20 | 2019-04-30 | 华为技术有限公司 | 任务处理方法、设备及计算机存储介质 |
| CN109725994A (zh) * | 2018-06-15 | 2019-05-07 | 中国平安人寿保险股份有限公司 | 数据抽取任务执行方法、装置、终端及可读存储介质 |
| CN110968418A (zh) * | 2018-09-30 | 2020-04-07 | 北京忆恒创源科技有限公司 | 基于信号-槽的大规模有约束并发任务的调度方法与装置 |
| CN111104168A (zh) * | 2018-10-25 | 2020-05-05 | 杭州嘉楠耘智信息科技有限公司 | 一种计算结果提交方法及装置 |
| CN111104167A (zh) * | 2018-10-25 | 2020-05-05 | 杭州嘉楠耘智信息科技有限公司 | 一种计算结果提交方法及装置 |
| CN111309482A (zh) * | 2020-02-20 | 2020-06-19 | 浙江亿邦通信科技有限公司 | 矿机控制器任务分配系统、装置及其可存储介质 |
| CN112148454A (zh) * | 2020-09-29 | 2020-12-29 | 行星算力(深圳)科技有限公司 | 一种支持串行和并行的边缘计算方法及电子设备 |
| CN112365002A (zh) * | 2020-11-11 | 2021-02-12 | 深圳力维智联技术有限公司 | 基于spark的模型构建方法、装置、系统及存储介质 |
| CN112667386A (zh) * | 2021-01-18 | 2021-04-16 | 青岛海尔科技有限公司 | 任务管理方法和装置、存储介质及电子设备 |
| CN112823343A (zh) * | 2020-03-11 | 2021-05-18 | 深圳市大疆创新科技有限公司 | 直接内存存取单元、处理器、设备、处理方法及存储介质 |
| CN113138812A (zh) * | 2021-04-23 | 2021-07-20 | 中国人民解放军63920部队 | 航天器任务调度方法及装置 |
| CN114048039A (zh) * | 2021-11-25 | 2022-02-15 | 中科计算技术西部研究院 | 一种用于分布式训练的流式计算系统、方法及装置 |
| CN115658325A (zh) * | 2022-11-18 | 2023-01-31 | 北京市大数据中心 | 数据处理方法、装置、多核处理器、电子设备以及介质 |
| CN116107724A (zh) * | 2023-04-04 | 2023-05-12 | 山东浪潮科学研究院有限公司 | 一种ai加速核调度管理方法、装置、设备及存储介质 |
| CN116205299A (zh) * | 2023-01-18 | 2023-06-02 | 华东计算技术研究所(中国电子科技集团公司第三十二研究所) | 异步并行的量子芯片自动化标定软件设计方法及系统 |
| CN117633914A (zh) * | 2024-01-25 | 2024-03-01 | 深圳市纽创信安科技开发有限公司 | 基于芯片的密码资源调度方法、设备和存储介质 |
| CN119225932A (zh) * | 2024-09-23 | 2024-12-31 | 厦门熵基科技有限公司 | 多任务编排处理方法、装置、存储介质及计算机设备 |
| CN119294315A (zh) * | 2024-12-10 | 2025-01-10 | 奕行智能科技(广州)有限公司 | 一种并行队列调度电路的验证方法 |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110825461B (zh) * | 2018-08-10 | 2024-01-05 | 北京百度网讯科技有限公司 | 数据处理方法和装置 |
| CN111026539B (zh) * | 2018-10-10 | 2022-12-02 | 上海寒武纪信息科技有限公司 | 通信任务处理方法、任务缓存装置及存储介质 |
| CN111324427B (zh) * | 2018-12-14 | 2023-07-28 | 深圳云天励飞技术有限公司 | 一种基于dsp的任务调度方法及装置 |
| CN111767121B (zh) * | 2019-04-02 | 2022-11-01 | 上海寒武纪信息科技有限公司 | 运算方法、装置及相关产品 |
| CN114265808B (zh) * | 2021-12-22 | 2024-12-10 | 杭州和利时自动化有限公司 | 一种通信方法、装置、ProfibusDP主站及介质 |
| CN114900486B (zh) * | 2022-05-09 | 2023-08-08 | 江苏新质信息科技有限公司 | 基于fpga的多算法核调用方法及系统 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102955685A (zh) * | 2011-08-17 | 2013-03-06 | 上海贝尔股份有限公司 | 多核dsp及其系统和调度器 |
| CN103440173A (zh) * | 2013-08-23 | 2013-12-11 | 华为技术有限公司 | 一种多核处理器的调度方法和相关装置 |
| US20140068624A1 (en) * | 2012-09-04 | 2014-03-06 | Microsoft Corporation | Quota-based resource management |
| CN104598426A (zh) * | 2013-10-30 | 2015-05-06 | 联发科技股份有限公司 | 用于异构多核处理器系统的任务调度方法 |
| CN104714785A (zh) * | 2015-03-31 | 2015-06-17 | 中芯睿智(北京)微电子科技有限公司 | 任务调度装置、方法及并行处理数据的设备 |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2005284749A (ja) * | 2004-03-30 | 2005-10-13 | Kyushu Univ | 並列処理コンピュータ |
| CN1329825C (zh) * | 2004-10-08 | 2007-08-01 | 华为技术有限公司 | 基于数字信号处理器的多任务处理方法 |
| CN100361081C (zh) * | 2005-01-18 | 2008-01-09 | 华为技术有限公司 | 处理多线程/多任务/多处理器的方法 |
| WO2007104330A1 (fr) * | 2006-03-15 | 2007-09-20 | Freescale Semiconductor, Inc. | Procédé et appareil de programmation de tâches |
| CN101610399B (zh) * | 2009-07-22 | 2010-12-08 | 杭州华三通信技术有限公司 | 计划类业务调度系统和实现计划类业务调度的方法 |
| CN102542379B (zh) * | 2010-12-20 | 2015-03-11 | 中国移动通信集团公司 | 一种计划任务处理方法、系统及装置 |
| CN102096857B (zh) * | 2010-12-27 | 2013-05-29 | 大唐软件技术股份有限公司 | 一种数据处理过程的协同方法和装置 |
-
2015
- 2015-10-29 CN CN201580083942.3A patent/CN108351783A/zh active Pending
- 2015-10-29 WO PCT/CN2015/093248 patent/WO2017070900A1/fr not_active Ceased
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102955685A (zh) * | 2011-08-17 | 2013-03-06 | 上海贝尔股份有限公司 | 多核dsp及其系统和调度器 |
| US20140068624A1 (en) * | 2012-09-04 | 2014-03-06 | Microsoft Corporation | Quota-based resource management |
| CN103440173A (zh) * | 2013-08-23 | 2013-12-11 | 华为技术有限公司 | 一种多核处理器的调度方法和相关装置 |
| CN104598426A (zh) * | 2013-10-30 | 2015-05-06 | 联发科技股份有限公司 | 用于异构多核处理器系统的任务调度方法 |
| CN104714785A (zh) * | 2015-03-31 | 2015-06-17 | 中芯睿智(北京)微电子科技有限公司 | 任务调度装置、方法及并行处理数据的设备 |
Cited By (25)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109697122A (zh) * | 2017-10-20 | 2019-04-30 | 华为技术有限公司 | 任务处理方法、设备及计算机存储介质 |
| CN109697122B (zh) * | 2017-10-20 | 2024-03-15 | 华为技术有限公司 | 任务处理方法、设备及计算机存储介质 |
| CN109725994A (zh) * | 2018-06-15 | 2019-05-07 | 中国平安人寿保险股份有限公司 | 数据抽取任务执行方法、装置、终端及可读存储介质 |
| CN109725994B (zh) * | 2018-06-15 | 2024-02-06 | 中国平安人寿保险股份有限公司 | 数据抽取任务执行方法、装置、终端及可读存储介质 |
| CN110968418A (zh) * | 2018-09-30 | 2020-04-07 | 北京忆恒创源科技有限公司 | 基于信号-槽的大规模有约束并发任务的调度方法与装置 |
| CN111104167A (zh) * | 2018-10-25 | 2020-05-05 | 杭州嘉楠耘智信息科技有限公司 | 一种计算结果提交方法及装置 |
| CN111104168B (zh) * | 2018-10-25 | 2023-05-12 | 上海嘉楠捷思信息技术有限公司 | 一种计算结果提交方法及装置 |
| CN111104168A (zh) * | 2018-10-25 | 2020-05-05 | 杭州嘉楠耘智信息科技有限公司 | 一种计算结果提交方法及装置 |
| CN111104167B (zh) * | 2018-10-25 | 2023-07-21 | 上海嘉楠捷思信息技术有限公司 | 一种计算结果提交方法及装置 |
| CN111309482A (zh) * | 2020-02-20 | 2020-06-19 | 浙江亿邦通信科技有限公司 | 矿机控制器任务分配系统、装置及其可存储介质 |
| CN111309482B (zh) * | 2020-02-20 | 2023-08-15 | 浙江亿邦通信科技有限公司 | 基于哈希算法的区块链任务分配系统、装置及可存储介质 |
| CN112823343A (zh) * | 2020-03-11 | 2021-05-18 | 深圳市大疆创新科技有限公司 | 直接内存存取单元、处理器、设备、处理方法及存储介质 |
| CN112148454A (zh) * | 2020-09-29 | 2020-12-29 | 行星算力(深圳)科技有限公司 | 一种支持串行和并行的边缘计算方法及电子设备 |
| CN112365002A (zh) * | 2020-11-11 | 2021-02-12 | 深圳力维智联技术有限公司 | 基于spark的模型构建方法、装置、系统及存储介质 |
| CN112667386A (zh) * | 2021-01-18 | 2021-04-16 | 青岛海尔科技有限公司 | 任务管理方法和装置、存储介质及电子设备 |
| CN113138812A (zh) * | 2021-04-23 | 2021-07-20 | 中国人民解放军63920部队 | 航天器任务调度方法及装置 |
| CN114048039A (zh) * | 2021-11-25 | 2022-02-15 | 中科计算技术西部研究院 | 一种用于分布式训练的流式计算系统、方法及装置 |
| CN115658325A (zh) * | 2022-11-18 | 2023-01-31 | 北京市大数据中心 | 数据处理方法、装置、多核处理器、电子设备以及介质 |
| CN115658325B (zh) * | 2022-11-18 | 2024-01-23 | 北京市大数据中心 | 数据处理方法、装置、多核处理器、电子设备以及介质 |
| CN116205299A (zh) * | 2023-01-18 | 2023-06-02 | 华东计算技术研究所(中国电子科技集团公司第三十二研究所) | 异步并行的量子芯片自动化标定软件设计方法及系统 |
| CN116107724A (zh) * | 2023-04-04 | 2023-05-12 | 山东浪潮科学研究院有限公司 | 一种ai加速核调度管理方法、装置、设备及存储介质 |
| CN117633914A (zh) * | 2024-01-25 | 2024-03-01 | 深圳市纽创信安科技开发有限公司 | 基于芯片的密码资源调度方法、设备和存储介质 |
| CN117633914B (zh) * | 2024-01-25 | 2024-05-10 | 深圳市纽创信安科技开发有限公司 | 基于芯片的密码资源调度方法、设备和存储介质 |
| CN119225932A (zh) * | 2024-09-23 | 2024-12-31 | 厦门熵基科技有限公司 | 多任务编排处理方法、装置、存储介质及计算机设备 |
| CN119294315A (zh) * | 2024-12-10 | 2025-01-10 | 奕行智能科技(广州)有限公司 | 一种并行队列调度电路的验证方法 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN108351783A (zh) | 2018-07-31 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2017070900A1 (fr) | Procédé et appareil de traitement de tâche dans un système de traitement de signal numérique multicœur | |
| US10467725B2 (en) | Managing access to a resource pool of graphics processing units under fine grain control | |
| US10891158B2 (en) | Task scheduling method and apparatus | |
| CN110489213B (zh) | 一种任务处理方法及处理装置、计算机系统 | |
| CN105045658B (zh) | 一种利用多核嵌入式dsp实现动态任务调度分发的方法 | |
| US10275558B2 (en) | Technologies for providing FPGA infrastructure-as-a-service computing capabilities | |
| US9973512B2 (en) | Determining variable wait time in an asynchronous call-back system based on calculated average sub-queue wait time | |
| CN108064377B (zh) | 一种多系统共享内存的管理方法及装置 | |
| US10013264B2 (en) | Affinity of virtual processor dispatching | |
| JP2009265963A (ja) | 情報処理システム及びタスクの実行制御方法 | |
| US11347546B2 (en) | Task scheduling method and device, and computer storage medium | |
| US9471387B2 (en) | Scheduling in job execution | |
| CN114168271A (zh) | 一种任务调度方法、电子设备及存储介质 | |
| CN111240813A (zh) | 一种dma调度方法、装置和计算机可读存储介质 | |
| US11494228B2 (en) | Calculator and job scheduling between jobs within a job switching group | |
| CN110543351A (zh) | 数据处理方法以及计算机设备 | |
| CN104156663A (zh) | 一种硬件虚拟端口及处理器系统 | |
| Abeni et al. | EDF scheduling of real-time tasks on multiple cores: Adaptive partitioning vs. global scheduling | |
| CN113439260B (zh) | 针对低时延存储设备的i/o完成轮询 | |
| CN116795490A (zh) | 一种vCPU调度方法、装置、设备及存储介质 | |
| US10261817B2 (en) | System on a chip and method for a controller supported virtual machine monitor | |
| US20150363227A1 (en) | Data processing unit and method for operating a data processing unit | |
| US20140237149A1 (en) | Sending a next request to a resource before a completion interrupt for a previous request | |
| EP4542386A1 (fr) | Procédé et appareil de traitement de données | |
| KR20250103201A (ko) | 다중 pe pim 아키텍처에서의 자원 관리 장치 및 방법 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15906963 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 15906963 Country of ref document: EP Kind code of ref document: A1 |