CN104391821A - System-level model building method for multi-core shared SIMD coprocessors

Info

Publication number: CN104391821A
Application number: CN201410669796.XA
Authority: CN
Inventors: 郭炜; 崔鲁平; 魏继增
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2014-11-20
Filing date: 2014-11-20
Publication date: 2015-03-04

Abstract

A system-level model building method for multi-core shared SIMD coprocessors, comprising a system-on-chip provided with n cores and n vector coprocessors, where n is a positive even number. The n vector coprocessors are connected to the n cores through a crossbar switch. A scheduler, connected to the n cores, the n vector coprocessors and the crossbar switch, connects vector coprocessors to cores through the crossbar switch and schedules the vector coprocessors according to the current state of each one. Through this sharing mechanism, the invention significantly improves the resource utilization of the SIMD vector coprocessors and reduces system power consumption; for a fixed amount of resources, tasks are completed more efficiently.

Description

A system-level model building method for multi-core shared SIMD coprocessors

Technical field

The invention relates to a system-level model of a processor, and in particular to a system-level model building method for multi-core shared SIMD coprocessors.

Background art

SIMD (Single Instruction, Multiple Data) is a technique for data-level parallelism: the same operation is performed on multiple data elements. Its key feature is that a single instruction carries out several operations at once, which raises processor throughput and makes SIMD particularly well suited to data-intensive workloads such as multimedia applications. Mainstream processors all have SIMD instruction subsets, for example MMX and SSE on x86, NEON on ARM, and AltiVec on PowerPC. In modern multi-core processors, each core is usually paired with a dedicated SIMD coprocessor, also called a vector coprocessor (VP). Because the coprocessor is dedicated, it sits idle whenever its core runs a program that lacks data-level parallelism. Meanwhile, other cores may be running programs with data-level parallelism, yet each can use only its own SIMD coprocessor and cannot borrow the idle ones, which wastes resources and increases power consumption.

Figure 1 shows the conventional architecture for a system-on-chip with 4 cores and 4 VPs. In this structure each VP is dedicated to one core and cannot be shared by the other cores. When a core is not executing a data-intensive program, its VP is idle, which wastes resources and power.

Summary of the invention

The technical problem to be solved by the present invention is to provide a system-level model of a multi-core shared SIMD coprocessor that improves the resource utilization of the vector coprocessors and reduces system power consumption.

The technical solution adopted by the present invention is a system-level model building method for multi-core shared SIMD coprocessors, comprising a system-on-chip provided with n cores and n vector coprocessors, where n is a positive even number. The n vector coprocessors are connected to the n cores through a crossbar switch. A scheduler is also provided, connected to the n cores, the n vector coprocessors and the crossbar switch, which connects vector coprocessors to cores through the crossbar switch. The scheduler schedules the vector coprocessors according to the current state of each vector coprocessor.

Each vector coprocessor describes its current state through three status registers:

The first status register records which of the n cores is currently using the vector coprocessor, or that no core is using it. A vector coprocessor that is not being used by any core is defined to be in the idle state and can be scheduled by the scheduler.

The second status register records whether the vector coprocessor is currently in the shared state or in the exclusive state. A vector coprocessor in the shared state can be scheduled by the scheduler; one in the exclusive state cannot.

The third status register records the index of the vector coprocessor among all the vector coprocessors held by the core it currently belongs to.

When a core is currently using several vector coprocessors, exactly one of them is in the exclusive state and all the others are in the shared state.
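The three status registers and the one-exclusive-VP rule can be sketched as a small data structure. This is an illustrative sketch only: the names `VPStatus`, `owner`, `shared`, `index` and `check_one_exclusive` are assumptions made for this example, and the patent does not specify register widths or encodings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VPStatus:
    """Hypothetical model of the three status registers of one VP."""
    owner: Optional[int]  # register 1: id of the core using the VP, None if no core uses it
    shared: bool          # register 2: True = shared state, False = exclusive state
    index: int            # register 3: index among the VPs held by the same owner

    def is_idle(self) -> bool:
        # A VP that no core is using is in the idle state.
        return self.owner is None

    def schedulable(self) -> bool:
        # Idle and shared VPs may be scheduled; exclusive VPs may not.
        return self.is_idle() or self.shared


def check_one_exclusive(vps: List[VPStatus], core: int) -> bool:
    """True if a core holding several VPs has exactly one in the exclusive state."""
    held = [vp for vp in vps if vp.owner == core]
    if len(held) <= 1:
        return True  # the rule only constrains cores holding several VPs
    return sum(not vp.shared for vp in held) == 1
```

For example, four VPs all held by core 3 with only the last one exclusive satisfy `check_one_exclusive(vps, 3)`.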

In the initial state of the system-on-chip, every vector coprocessor is in the exclusive state: the first vector coprocessor is dedicated to the first core, the second to the second core, and so on, and the index of every vector coprocessor is 0. A vector coprocessor changes from the exclusive state to the shared state when the core using it voluntarily gives up its right to use that vector coprocessor; from then on the vector coprocessor is scheduled by the scheduler.
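A minimal sketch of the initial state and of the exclusive-to-shared transition follows. The function names `initial_state` and `release`, the dictionary keys, and the use of `None` for "managed by the scheduler" are assumptions for illustration. The order in which released VPs are numbered inside the scheduler's pool is also an assumption: the text does not define it, and the Figure 3 example shows VP0 with index 2 in the pool, so the real numbering may differ.

```python
def initial_state(n):
    """State at reset: VP i is exclusive to core i and every index is 0."""
    return [{"owner": i, "shared": False, "index": 0} for i in range(n)]


def release(vps, core):
    """The core voluntarily gives up its VPs; they become shared and scheduler-managed."""
    for vp in vps:
        if vp["owner"] == core:
            vp["owner"] = None   # no core is using this VP any more
            vp["shared"] = True  # exclusive -> shared, so the scheduler may assign it
    # Assumed numbering: index the scheduler's pool in VP order.
    pool = [vp for vp in vps if vp["owner"] is None]
    for i, vp in enumerate(pool):
        vp["index"] = i
    return vps
```

Calling `release(initial_state(4), 1)` leaves VP0, VP2 and VP3 exclusive to their cores and hands VP1 to the scheduler.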

When any one of the n cores needs more vector coprocessors because it is executing a program with data-level parallelism, that core applies to the scheduler for more vector coprocessors. The scheduler runs a load-balancing scheduling algorithm. If there are idle vector coprocessors, the scheduler assigns them to the requesting core. If there are no idle vector coprocessors but some are in the shared state, the scheduler reallocates vector coprocessor resources according to its load-balancing policy and the situation at that moment.
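The request path can be sketched as follows. The patent does not spell out the load-balancing policy, so this sketch assumes an even split of the VPs among the cores that hold or request them, which is consistent with the Figure 3 example described later. The function names, the dictionary layout, granting all idle VPs at once, and choosing the first granted VP as the exclusive one (the description picks one at random) are all assumptions.

```python
from collections import Counter


def reindex(vps):
    """Renumber register 3 so that each owner's VPs are indexed 0, 1, 2, ..."""
    seen = Counter()
    for vp in vps:
        vp["index"] = seen[vp["owner"]]
        seen[vp["owner"]] += 1


def request(vps, core):
    """Grant VPs to `core`; returns the positions of the VPs that were granted."""
    ids = range(len(vps))
    idle = [i for i in ids if vps[i]["owner"] is None]
    if idle:
        granted = idle  # idle VPs exist: hand them to the requester
    else:
        load = Counter(vp["owner"] for vp in vps)
        # Assumed policy: even share among holders plus the requester, at least 1.
        fair = max(1, len(vps) // len(set(load) | {core}))
        donors = [i for i in ids if vps[i]["shared"] and vps[i]["owner"] != core]
        donors.sort(key=lambda i: -load[vps[i]["owner"]])  # busiest holder first
        granted = donors[:max(0, fair - load[core])]
    for i in granted:
        vps[i]["owner"], vps[i]["shared"] = core, True
    mine = [vp for vp in vps if vp["owner"] == core]
    if mine and all(vp["shared"] for vp in mine):
        mine[0]["shared"] = False  # keep exactly one VP exclusive to the core
    reindex(vps)
    return granted
```

With all four VPs held by core 3 (three shared, one exclusive), `request(vps, 0)` moves two shared VPs to core 0 and marks one of them exclusive, leaving each core with one exclusive and one shared VP.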

Through its sharing mechanism, the system-level model building method for multi-core shared SIMD coprocessors of the present invention significantly improves the resource utilization of the SIMD vector coprocessors and reduces system power consumption; for a fixed amount of resources, tasks are completed more efficiently.

Brief description of the drawings

Figure 1 is the conventional system-on-chip structure;

Figure 2 is a system-level model, built with the method of the present invention, in which 4 cores share 4 VPs through a crossbar switch;

Figure 3 is a scheduling example.

In the figures:

1: core    2: vector coprocessor

3: crossbar switch    4: scheduler

Detailed description

The system-level model building method for multi-core shared SIMD coprocessors of the present invention is described in detail below with reference to the embodiments and the accompanying drawings.

The system-level model building method for multi-core shared SIMD coprocessors of the present invention comprises a system-on-chip provided with n cores and n vector coprocessors, where n is a positive even number. The n vector coprocessors are connected to the n cores through a crossbar switch. A scheduler is also provided, connected to the n cores, the n vector coprocessors and the crossbar switch, which connects vector coprocessors to cores through the crossbar switch. The scheduler schedules the vector coprocessors according to the current state of each vector coprocessor.

The first status register records which of the n cores is currently using the vector coprocessor (core 0, core 1, ..., core (n-1)), or that no core is using it. A vector coprocessor that is not being used by any core is defined to be in the idle state and can be scheduled by the scheduler.

The second status register records whether the vector coprocessor is currently in the shared state or in the exclusive state. When a core is currently using several vector coprocessors, exactly one of them is in the exclusive state and all the others are in the shared state. A vector coprocessor in the shared state can be scheduled by the scheduler; one in the exclusive state cannot.

In the method of the present invention, when the system-on-chip is in its initial state, every vector coprocessor is in the exclusive state: the first vector coprocessor is dedicated to the first core, the second to the second core, and so on, and the index of every vector coprocessor is 0. A vector coprocessor changes from the exclusive state to the shared state when the core using it voluntarily gives up its right to use that vector coprocessor; from then on the vector coprocessor is scheduled by the scheduler.

To make the objects, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

Figure 2 is a system-level model, designed according to the method of the present invention, in which 4 cores share 4 VPs through a crossbar switch. The scheduler 4 communicates with the 4 cores (Core) 1 and the 4 vector coprocessors (VP) 2; in response to requests from the cores 1 it allocates vector coprocessor 2 resources and configures the crossbar switch 3.
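One way to picture the crossbar configuration that the scheduler maintains is as a table mapping each VP to the core it is currently connected to. The class below is a hypothetical sketch: the name `Crossbar`, its methods, and the one-core-per-VP representation are assumptions for illustration, not details given in the patent.

```python
class Crossbar:
    """Hypothetical n-by-n crossbar: each VP is linked to at most one core at a time."""

    def __init__(self, n):
        self.n = n
        self.link = [None] * n  # link[vp] = id of the connected core, or None

    def connect(self, core, vp):
        """Route `vp` to `core`, replacing any earlier connection of that VP."""
        if not (0 <= core < self.n and 0 <= vp < self.n):
            raise ValueError("core or VP id out of range")
        self.link[vp] = core

    def disconnect(self, vp):
        """Detach `vp` from whichever core it was connected to."""
        self.link[vp] = None

    def vps_of(self, core):
        """List the VPs currently routed to `core`."""
        return [vp for vp, c in enumerate(self.link) if c == core]


# Initial configuration of Figure 2: VP i is connected to core i.
xbar = Crossbar(4)
for i in range(4):
    xbar.connect(i, i)
```

Reassigning a VP is then a single `connect` call made by the scheduler, for example `xbar.connect(3, 0)` to route VP0 to core 3.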

In the initial state of the system, the 4 vector coprocessors 2 are each dedicated to one of the 4 cores 1 and cannot be scheduled by the scheduler 4. When one of the cores 1 executes a program without data-level parallelism, it can apply to the scheduler 4 to give up its right to use its vector coprocessor 2. From then on that vector coprocessor 2 is managed by the scheduler 4, and its state changes from exclusive to shared. Because programs behave dynamically, the operating system may later schedule a program with data-level parallelism onto the core 1 that gave up its vector coprocessor 2. That core 1 then has to apply to the scheduler 4 for a certain number of vector coprocessors 2. If idle vector coprocessors 2 are available, the scheduler 4 assigns a certain number of them to the core 1 and changes the state of one of them from shared to exclusive; the other vector coprocessors 2 assigned to the core remain in the shared state. In every case, a core that has given up its vector coprocessor and applies again for vector coprocessor resources is granted at least one vector coprocessor, which guarantees that no core is ever "starved".
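The no-starvation guarantee can be supported by a counting argument, sketched here under one assumption taken from the Figure 3 example: every core that holds VPs keeps exactly one of them in the exclusive state. If the requesting core holds no VP, at most n-1 cores hold VPs, so at most n-1 of the n VPs are exclusive and at least one is idle or shared. The brute-force check below enumerates every possible ownership pattern; the function name is hypothetical.

```python
from itertools import product


def always_one_available(n):
    """Check that when core 0 holds no VP, some VP is always idle or shared."""
    owners_allowed = [None] + list(range(1, n))      # None = managed by the scheduler
    for owners in product(owners_allowed, repeat=n): # owner of each of the n VPs
        holders = {o for o in owners if o is not None}
        exclusive = len(holders)  # assumption: one exclusive VP per holding core
        if n - exclusive < 1:     # VPs that remain idle or shared
            return False
    return True
```

For the 4-core model of Figure 2, `always_one_available(4)` checks all 256 ownership patterns.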

Figure 3 shows a scheduling example that illustrates how the sharing model of the present invention works. The figure is divided into two columns: the time states on the left and the 4 vector coprocessors (VPs) on the right. In the initial state of the system, State 0, each of the 4 VPs belongs to one core. Each VP describes its current state through three parameters. Taking VP0 as an example, the three parameters are C0/0/0. C0 means that VP0 is currently being used by core 0. A second parameter of 0 means that the VP is in the exclusive state, that is, dedicated to core 0 and not schedulable by the scheduler; a value of 1 means that it is in the shared state and can be scheduled. A third parameter of 0 means that, among all the VPs belonging to core 0, this VP has index 0.

Once the system is running, core 0, core 1 and core 2 are not executing programs with data-level parallelism, so they apply to the scheduler to give up their VPs, and the system enters State 1. In State 1, taking VP0 as an example, the three parameters are s/1/2. A first parameter of s means that the VP is managed by the scheduler; a second parameter of 1 means that VP0 is in the shared state; a third parameter of 2 means that, among all the VPs managed by the scheduler, VP0 has index 2.
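The three-parameter notation used in Figure 3 (for example C0/0/0 and s/1/2) can be produced and parsed with a few lines of code. The helper names `encode` and `decode`, and the use of `None` for a scheduler-managed VP, are assumptions for this sketch.

```python
def encode(owner, shared, index):
    """Format a VP state in the Figure 3 notation, e.g. 'C0/0/0' or 's/1/2'."""
    who = "s" if owner is None else f"C{owner}"  # 's' = managed by the scheduler
    return f"{who}/{int(shared)}/{index}"


def decode(text):
    """Parse the Figure 3 notation back into (owner, shared, index)."""
    who, shared, index = text.split("/")
    owner = None if who == "s" else int(who[1:])  # strip the leading 'C'
    return owner, shared == "1", int(index)
```

Here `encode(0, False, 0)` yields the State 0 entry of VP0, and `decode("s/1/2")` recovers its State 1 entry.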

The system keeps running. At some point core 3 is executing a large number of data-intensive tasks, so it applies to the scheduler for VP resources. Because the scheduler is managing 3 VPs that are idle and in the shared state, it assigns all 3 of them to core 3, and the system enters State 2. In this state all 4 VPs are used by core 3: three VPs (VP0 to VP2) remain in the shared state and one VP (VP3) is in the exclusive state, with indexes 0 to 3 respectively.

The system continues to run. At some later point the operating system schedules a program with data-level parallelism onto core 0, so core 0 applies to the scheduler for VP resources while core 3 is still using all of them. Because the scheduler runs a load-balancing scheduling algorithm, it assigns two of the three shared VPs used by core 3 to core 0 and updates the status registers of those two VPs: one of them, chosen at random, is changed from the shared state to the exclusive state and the other stays shared. The system enters State 3. In State 3, core 0 and core 3 each use 2 VPs, and for each core one of its two VPs is exclusive and the other is shared. Depending on how the system runs, the VPs in the shared state can still be rescheduled by the scheduler, so the whole process is one of continuous dynamic adjustment.

Those skilled in the art will understand that the accompanying drawings are only schematic diagrams of a preferred embodiment, and that the numbering of the embodiments above is for description only and does not indicate their relative merits. The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A system-level model building method for multi-core shared SIMD coprocessors, comprising a system-on-chip provided with n cores and n vector coprocessors, where n is a positive even number, characterized in that the n vector coprocessors are connected to the n cores through a crossbar switch, and a scheduler is further provided which is connected to the n cores, the n vector coprocessors and the crossbar switch and which connects vector coprocessors to cores through the crossbar switch, wherein the scheduler schedules the vector coprocessors according to the current state of each vector coprocessor.

2. The system-level model building method for multi-core shared SIMD coprocessors according to claim 1, characterized in that each vector coprocessor describes its current state through three status registers, wherein:

the first status register records which of the n cores is currently using the vector coprocessor, or that no core is using it; a vector coprocessor that is not being used by any core is defined to be in the idle state and can be scheduled by the scheduler;

the second status register records whether the vector coprocessor is currently in the shared state or in the exclusive state; a vector coprocessor in the shared state can be scheduled by the scheduler, and one in the exclusive state cannot;

the third status register records the index of the vector coprocessor among all the vector coprocessors held by the core it currently belongs to.

3. The system-level model building method for multi-core shared SIMD coprocessors according to claim 2, characterized in that, when a core is currently using several vector coprocessors, exactly one of them is in the exclusive state and all the others are in the shared state.

4. The system-level model building method for multi-core shared SIMD coprocessors according to claim 2, characterized in that, when the system-on-chip is in its initial state, every vector coprocessor is in the exclusive state, the first vector coprocessor being dedicated to the first core, the second vector coprocessor to the second core, and so on, and the index of every vector coprocessor being 0; a vector coprocessor changes from the exclusive state to the shared state when the core using it voluntarily gives up its right to use that vector coprocessor, after which the vector coprocessor is scheduled by the scheduler.

5. The system-level model building method for multi-core shared SIMD coprocessors according to claim 4, characterized in that, when any one of the n cores needs more vector coprocessors because it is executing a program with data-level parallelism, that core applies to the scheduler for more vector coprocessors, and the scheduler runs a load-balancing scheduling algorithm: if there are idle vector coprocessors, the scheduler assigns them to the requesting core; if there are no idle vector coprocessors but some are in the shared state, the scheduler reallocates vector coprocessor resources according to its load-balancing policy and the situation at that moment.