
CN118796342A - Instruction processing method, device and virtual machine - Google Patents

Instruction processing method, device and virtual machine

Info

Publication number
CN118796342A
Authority
CN
China
Prior art keywords
virtual
task
isa
instruction
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410163635.7A
Other languages
Chinese (zh)
Inventor
解子岩
张昊
刘景磊
王升
赵奇慧
罗馨玥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Research Institute of China Mobile Communication Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Research Institute of China Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, Research Institute of China Mobile Communication Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202410163635.7A priority Critical patent/CN118796342A/en
Publication of CN118796342A publication Critical patent/CN118796342A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 - Arrangements for executing specific programs
    • G06F9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504 - Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • G06F9/45516 - Runtime code conversion or optimisation
    • G06F9/4552 - Involving translation to a different instruction set architecture, e.g. just-in-time translation in a JVM
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 - Arrangements for software engineering
    • G06F8/40 - Transformation of program code
    • G06F8/41 - Compilation
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 - Arrangements for executing specific programs
    • G06F9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 - Hypervisors; Virtual machine monitors
    • G06F9/45558 - Hypervisor-specific management and integration aspects
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention provides an instruction processing method, an instruction processing device and a virtual machine, and relates to the field of communication technology. The method comprises: acquiring a virtual instruction set architecture (ISA) carried in an application program; performing task orchestration management according to the virtual ISA to determine a virtual ISA sequence corresponding to a task to be executed; and converting the virtual ISA code in the virtual ISA sequence into the target ISA code corresponding to each of the different types of accelerators. The invention addresses the problems that a computing system made heterogeneous and parallel in the traditional way requires complicated scheduling whenever the hardware changes, imposes a high system load, and relies on a complex scheduling model.

Description

Instruction processing method, device and virtual machine

Technical Field

The present invention relates to the field of communication technology, and in particular to an instruction processing method, device and virtual machine.

Background Art

The rapid development of applications such as artificial intelligence and big data places higher demands on the computing capacity and timeliness required for diverse, massive data, calling for greater computing power. Limited by hardware architecture, communication buses and other technical constraints, the computing power of the traditional general-purpose central processing unit (CPU) is approaching saturation. Therefore, to improve computing performance and increase system flexibility and adaptability so as to meet the needs of different computing tasks, computing systems are developing towards heterogeneity and parallelization. Heterogeneity refers to using processors and accelerators whose architectures differ from the CPU to execute specific computing tasks more efficiently, for example graphics processing units (GPUs), digital signal processors (DSPs) and tensor processing units (TPUs); these accelerators offer higher performance and efficiency on specific tasks. By assigning tasks to the processors and accelerators best suited to execute them, a heterogeneous computing system can make better use of hardware resources and improve overall computing performance. Parallelization refers to decomposing a computing task into multiple subtasks and interconnecting multiple processor or accelerator nodes into a distributed parallel system on which the subtasks are executed. By distributing tasks across multiple computing nodes, many subtasks can be processed simultaneously, accelerating the overall computation. However, since each vendor's hardware can only recognize and run applications that conform to its own instruction set architecture (ISA), a heterogeneous, parallel computing system built in the traditional way must use different vendors' tools to separately generate the ISA code for each vendor's accelerator. This leads to complicated scheduling whenever the hardware changes, a high system load and a complex scheduling model.

Summary of the Invention

The purpose of the present invention is to provide an instruction processing method, device and virtual machine, so as to solve the problem that a heterogeneous, parallel computing system implemented in the traditional way requires complicated scheduling according to hardware changes, imposes a high system load and relies on a complex scheduling model.

To achieve the above object, an embodiment of the present invention provides an instruction processing method, comprising:

acquiring a virtual ISA carried in an application program;

performing task orchestration management according to the virtual ISA, and determining a virtual ISA sequence corresponding to a task to be executed; and

converting the virtual ISA code in the virtual ISA sequence into the target ISA code corresponding to each of the different types of accelerators.

Optionally, performing task orchestration management according to the virtual ISA and determining the virtual ISA sequence corresponding to the task to be executed includes:

performing task orchestration management according to the virtual ISA to determine at least one virtual task cluster, wherein each virtual task cluster includes at least one virtual thread; and

running the same program on the virtual threads in each virtual task cluster to obtain the virtual ISA sequence corresponding to the task to be executed.

Optionally, performing task orchestration management according to the virtual ISA to determine at least one virtual task cluster includes:

performing task identification according to the virtual ISA to determine the task to be executed; and

performing optimized orchestration on the virtual ISA according to the task to be executed to determine at least one virtual task cluster.

Optionally, the method further comprises:

generating virtual task cluster related information; and

registering the virtual task cluster related information;

wherein the virtual task cluster related information includes at least one of the following:

identification information of a virtual task cluster, used to indicate the read position of the input data used by the virtual task cluster and/or the write position of the output data it generates;

dimension information of a virtual task cluster, used to indicate the number of virtual threads included in the virtual task cluster; and

identification information of a virtual thread, used to indicate the read position of the input data used by the virtual thread and/or the write position of the output data it generates.

Optionally, performing task orchestration management according to the virtual ISA to determine at least one virtual task cluster includes:

performing task identification according to the virtual ISA to determine the task to be executed;

decomposing the virtual ISA into tasks according to the task to be executed to obtain at least one task partition, wherein different task partitions correspond to different parts of the task to be executed; and

determining at least one virtual task cluster according to each task partition, wherein different virtual task clusters within a task partition are independent of each other, and the virtual task clusters in each task partition run the same program.

Optionally, the method further comprises:

generating task partition related information; and

registering the task partition related information;

wherein the task partition related information includes at least one of the following:

identification information of a task partition, used to indicate the read position of the input data used by the task partition and/or the write position of the output data it generates; and

dimension information of a task partition, used to indicate the number of virtual task clusters contained in the task partition.

Optionally, after converting the virtual ISA code in the virtual ISA sequence into the target ISA code corresponding to each of the different types of accelerators, the method further includes:

caching the target ISA code into a data buffer corresponding to each accelerator.

An embodiment of the present application also provides an instruction processing device, comprising:

an acquisition module, configured to acquire a virtual ISA carried in an application program;

a determination module, configured to perform task orchestration management according to the virtual ISA and determine a virtual ISA sequence corresponding to a task to be executed; and

a conversion module, configured to convert the virtual ISA code in the virtual ISA sequence into the target ISA code corresponding to each of the different types of accelerators.

An embodiment of the present application also provides a virtual machine, comprising: a virtual instruction loader, a virtual accelerator execution engine and a virtual instruction model converter; wherein

the virtual instruction loader is configured to load the virtual ISA carried in an application program;

the virtual accelerator execution engine performs task orchestration management on the virtual ISA loaded by the virtual instruction loader and determines the virtual ISA sequence corresponding to the task to be executed; and

the virtual instruction model converter converts the virtual ISA code in the virtual ISA sequence obtained by the virtual accelerator execution engine through task orchestration management into the target ISA code corresponding to each of the different types of accelerators.

Optionally, the virtual machine further includes a data buffer, wherein

the virtual instruction model converter caches the target ISA code into the data buffer, and the virtual instruction model converter shares memory with the accelerators.

An embodiment of the present application also provides a virtual machine, comprising: a transceiver, a processor, a memory, and a program or instruction stored in the memory and executable on the processor; when the processor executes the program or instruction, the steps of the instruction processing method described above are implemented.

An embodiment of the present application also provides a readable storage medium on which a program or instruction is stored; when the program or instruction is executed by a processor, the steps of the instruction processing method described above are implemented.

An embodiment of the present application also provides a computer program product comprising computer instructions which, when executed by a processor, implement the steps of the instruction processing method described above.

The beneficial effects of the above technical solution of the present invention are as follows:

In the embodiments of the present application, a virtual ISA compatible with multiple heterogeneous instruction forms is defined, unified task orchestration management is performed on this virtual ISA to obtain the virtual ISA sequence corresponding to the task to be executed, and the virtual ISA sequence obtained through this unified task orchestration management is converted to obtain the target ISA code corresponding to each type of accelerator. In this way, for multiple heterogeneous accelerators serving the same function, the virtual ISA can be used to perform unified orchestration management and obtain the target ISA code corresponding to each type of accelerator, which greatly improves the collaborative efficiency of a hybrid heterogeneous system and solves the problem that a heterogeneous, parallel computing system implemented in the traditional way requires complicated scheduling according to hardware changes, imposes a high system load and relies on a complex scheduling model.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of a computing system that implements heterogeneity and parallelization in the traditional way;

FIG. 2 is a flow chart of an instruction processing method according to an embodiment of the present application;

FIG. 3 is a block diagram of an instruction processing device according to an embodiment of the present application;

FIG. 4 is a first block diagram of a virtual machine according to an embodiment of the present application;

FIG. 5 is a second block diagram of a virtual machine according to an embodiment of the present application;

FIG. 6 is a third block diagram of a virtual machine according to an embodiment of the present application.

Detailed Description

To make the technical problems to be solved, the technical solutions and the advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.

It should be understood that references to "one embodiment" or "an embodiment" throughout the specification mean that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Therefore, occurrences of "in one embodiment" or "in an embodiment" throughout the specification do not necessarily refer to the same embodiment. In addition, these particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.

In the various embodiments of the present invention, it should be understood that the serial numbers of the following processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.

In addition, the terms "system" and "network" are often used interchangeably herein.

In the embodiments provided in the present application, it should be understood that "B corresponding to A" means that B is associated with A and that B can be determined according to A. However, it should also be understood that determining B according to A does not mean determining B only according to A; B can also be determined according to A and/or other information.

The instruction architectures of high-performance heterogeneous accelerated computing systems are mainly divided into two instruction forms: single instruction multiple data (SIMD) and single instruction multiple threads (SIMT). Within the framework of these instruction forms, different manufacturers have derived their own hardware instruction sets.

As shown in FIG. 1, each manufacturer's hardware can only recognize and run applications that conform to its own instruction set architecture; applications cannot recognize each other and cannot run across instruction architectures. Moreover, generating an application requires the compiler and other tools of the corresponding manufacturer. Therefore, in current clusters that mix multiple types of heterogeneous computing systems, a developer who wants to build an application for a given function must use tools from different manufacturers to develop and package it separately, producing multiple binary files of the same application for different ISA versions. If a business application integrating these binary files needs to run in parallel in the cluster, it must be scheduled in a complicated way according to hardware changes; the system load is extremely high and the scheduling model is extremely complex.

As mentioned above, it is very difficult to develop programs that can run in parallel on a hybrid heterogeneous system under the technical solution shown in FIG. 1. The programmer is usually required to know the number of available processing units and their capabilities (instruction sets, number of data registers, interconnects, etc.) in order to produce code that the processing units can actually execute. Although machine-specific compilers can provide considerable help in this regard, it is still necessary to recompile the code every time it is ported to a different processor.

In addition, various aspects of parallel processing architectures are evolving rapidly. For example, new platform architectures, instruction sets and programming models are continually being developed. As aspects of a parallel architecture (for example, its programming model or instruction set) change from one generation to the next, applications, software libraries, compilers and other software and tools must change accordingly. This instability can add considerable overhead to the development and maintenance of parallel processing code.

Parallel programming becomes even more difficult when coordination between threads is required. The programmer needs to determine which mechanisms are available on a particular processor or computer system to support (or emulate) inter-thread communication, and needs to write code that uses those mechanisms. Because the available and/or optimal mechanisms generally differ between computer systems, this type of parallel code is generally not portable; it must be rewritten for each hardware platform on which it runs.

Furthermore, besides providing executable code for the processors, the programmer also needs to provide control code for the CPU that coordinates the operation of the various heterogeneous hardware, for example instructing each processing unit which program to execute and which input data to process. Such control code is usually specific to a particular host processor and inter-processor communication protocol, and usually needs to be rewritten if a different host processor is used.

The difficulty of compiling and recompiling parallel processing code may make users reluctant to upgrade their systems as computing technology develops, and it also makes system design extremely difficult: the system structure and scheduling model become extremely complex, and it is hard to realize the overall performance of a system cluster. It is therefore desirable to decouple compiled parallel processing code from any specific hardware platform and to provide a stable parallel processing structure and instruction set that can serve as the target of parallel applications and tools.

To overcome the deficiencies of the prior art, and in view of the fact that existing heterogeneous SIMD and SIMT instruction types have different parameter lengths and different single instructions, the purpose of the present invention is to provide an instruction processing method, device and virtual machine to solve the problem that a heterogeneous, parallel computing system implemented in the traditional way requires complicated scheduling according to hardware changes, imposes a high system load and relies on a complex scheduling model.

As shown in FIG. 2, an embodiment of the present application provides an instruction processing method comprising the following steps:

Step 21: acquire the virtual ISA carried in the application program.

Optionally, the virtual ISA is compatible with multiple heterogeneous instruction forms, such as SIMD and SIMT. Specifically, the virtual ISA may be a set of virtual instructions used to define the processing behavior of computing tasks, including but not limited to at least one of the following basic instructions: load/store instructions, arithmetic instructions, logical instructions, shift instructions, selection instructions, comparison instructions, sorting instructions and vector processing instructions.
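
As a rough illustration of how such a virtual instruction set might be represented, the sketch below encodes the instruction categories listed above as plain C++ types. All names (VOpcode, VInstruction, VIsaSequence) are hypothetical and are not defined by this application.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// One opcode per instruction category named above (illustrative subset).
enum class VOpcode : uint8_t {
    Load, Store,            // load/store (access) instructions
    Add, Sub, Mul, Div,     // arithmetic instructions
    And, Or, Xor, Not,      // logical instructions
    Shl, Shr,               // shift instructions
    Select,                 // selection instruction
    CmpLt, CmpEq,           // comparison instructions
    Sort,                   // sorting instruction
    VecAdd, VecMul          // vector processing instructions
};

using VRegister = uint16_t;

struct VInstruction {
    VOpcode opcode;
    VRegister dst;
    std::array<VRegister, 2> src;
    int64_t immediate;      // optional immediate operand
};

// A virtual ISA sequence is simply an ordered list of virtual instructions.
using VIsaSequence = std::vector<VInstruction>;

int main() {
    VIsaSequence seq = {VInstruction{VOpcode::Load, 0, {1, 2}, 0},
                        VInstruction{VOpcode::Add, 3, {0, 1}, 0}};
    return seq.size() == 2 ? 0 : 1;
}
```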

Step 22: perform task orchestration management according to the virtual ISA, and determine the virtual ISA sequence corresponding to the task to be executed.

In this embodiment, unified task orchestration management is performed on the virtual ISA, which is compatible with multiple heterogeneous instruction forms, to obtain the virtual ISA sequence corresponding to the task to be executed. Optionally, the virtual ISA sequence consists of virtual ISA code, that is, unified ISA code that is obtained by performing unified task orchestration management on the virtual ISA and that can be applied to different types of accelerators.

Step 23: convert the virtual ISA code in the virtual ISA sequence into the target ISA code corresponding to each of the different types of accelerators.

Optionally, since different types of accelerators use different compilation methods and the like, the virtual ISA sequence obtained through unified task orchestration management of the virtual ISA is converted to obtain the target ISA code corresponding to each type of accelerator, so that each type of accelerator can recognize its corresponding ISA code and perform the corresponding function.
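
The sketch below illustrates one way the per-accelerator conversion of step 23 could be organized: one translation back end per accelerator type, each producing that type's target ISA code from the same virtual ISA sequence. The AcceleratorKind, TargetIsaTranslator and ExampleGpuTranslator names are illustrative assumptions, not an interface defined by this application.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

using VIsaSequence = std::vector<uint32_t>;   // encoded virtual instructions
using TargetIsaCode = std::vector<uint8_t>;   // opaque per-vendor binary

enum class AcceleratorKind { GPU, NPU, DLA, DSP };

class TargetIsaTranslator {
public:
    virtual ~TargetIsaTranslator() = default;
    virtual TargetIsaCode translate(const VIsaSequence& seq) const = 0;
};

// Example vendor back end; the "translation" here is only a placeholder.
class ExampleGpuTranslator : public TargetIsaTranslator {
public:
    TargetIsaCode translate(const VIsaSequence& seq) const override {
        return TargetIsaCode(seq.begin(), seq.end());
    }
};

// One translation pass per accelerator type present in the system.
std::map<AcceleratorKind, TargetIsaCode> convertForAllAccelerators(
    const VIsaSequence& seq,
    const std::map<AcceleratorKind,
                   std::unique_ptr<TargetIsaTranslator>>& translators) {
    std::map<AcceleratorKind, TargetIsaCode> out;
    for (const auto& [kind, translator] : translators)
        out[kind] = translator->translate(seq);
    return out;
}

int main() {
    std::map<AcceleratorKind, std::unique_ptr<TargetIsaTranslator>> translators;
    translators[AcceleratorKind::GPU] = std::make_unique<ExampleGpuTranslator>();
    auto targetCode = convertForAllAccelerators({0x01, 0x02, 0x03}, translators);
    return targetCode.size() == 1 ? 0 : 1;
}
```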

In the above solution, a virtual ISA compatible with multiple heterogeneous instruction forms is defined, unified task orchestration management is performed on it to obtain the virtual ISA sequence corresponding to the task to be executed, and that virtual ISA sequence is converted to obtain the target ISA code corresponding to each type of accelerator. In this way, for multiple heterogeneous accelerators serving the same function, the virtual ISA can be used to perform unified orchestration management and obtain the target ISA code corresponding to each type of accelerator, which greatly improves the collaborative efficiency of a hybrid heterogeneous system and solves the problem that a heterogeneous, parallel computing system implemented in the traditional way requires complicated scheduling according to hardware changes, imposes a high system load and relies on a complex scheduling model.

Optionally, acquiring the virtual ISA carried in the application program includes:

loading the virtual instructions carried in the application program based on a virtual mapping model.

For example, the application program may define a data processing application that is processed using virtual task clusters (VTCs) and/or using task partitions. Optionally, the application program may define at least one of the following:

the behavior of the virtual task clusters or VTCs it uses;

the dimensions of a virtual task cluster or VTC (for example, the number of virtual threads it contains) and, if task partitions are to be used, the dimensions of a task partition (for example, the number of virtual task clusters or VTCs it contains);

the locations where the input data sets to be processed by the virtual task clusters or VTCs (or task partitions) and the output data sets they produce will be stored;

the overall processing behavior, for example when each virtual task cluster or VTC (or task partition) is launched. The application program may also contain additional code that dynamically determines the dimensions of the virtual task clusters or VTCs (or of the task partitions), whether new virtual task clusters or VTCs (or task partitions) should keep being launched, and so on.

For example, applications may be written in a high-level programming language such as C/C++ or FORTRAN. In one example, a C/C++ application directly specifies the behavior of one virtual thread. An application may also be written in a data-parallel language (for example FORTRAN, C# or OpenCL) and specify data-parallel operations on arrays and aggregate data structures; such an application can be compiled into virtual ISA program code that specifies the behavior of one VTC thread. To allow the behavior of virtual threads to be defined, some embodiments may provide language extensions or function libraries through which the programmer can specify the behavior of parallel virtual threads. For example, special symbols or variables may be defined to correspond to the thread ID, the VTC ID and the task partition ID, and functions may be provided through which the programmer can indicate when a virtual thread should synchronize with other virtual threads.
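
As a hedged illustration of the items listed above, the sketch below shows how an application might declare VTC dimensions, partition dimensions, input/output locations and a launch. The LaunchConfig structure and launchTaskPartition() function are hypothetical placeholders, not an API defined by this application.

```cpp
#include <cstddef>
#include <cstdint>

struct LaunchConfig {
    uint32_t threadsPerVtc;     // VTC dimension: virtual threads per cluster
    uint32_t vtcsPerPartition;  // task partition dimension: VTCs per partition
    const float* input;         // where the input data set is read from
    float* output;              // where the output data set is written to
    size_t elementCount;
};

// Stub: a real virtual machine / execution driver would orchestrate VTCs here.
void launchTaskPartition(const LaunchConfig& cfg) { (void)cfg; }

int main() {
    static float in[1024], out[1024];
    LaunchConfig cfg{/*threadsPerVtc=*/256, /*vtcsPerPartition=*/4,
                     in, out, /*elementCount=*/1024};
    launchTaskPartition(cfg);   // overall processing behavior: a single launch
    return 0;
}
```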

Optionally, performing task orchestration management according to the virtual ISA and determining the virtual ISA sequence corresponding to the task to be executed includes:

performing task orchestration management according to the virtual ISA to determine at least one virtual task cluster, wherein each virtual task cluster includes at least one virtual thread; and

running the same program on the virtual threads in each virtual task cluster to obtain the virtual ISA sequence corresponding to the task to be executed.

Optionally, a virtual task cluster is an array group of virtual threads that concurrently execute the same virtual instructions on an input data set to produce an output data set; in other words, it is a group of virtual threads that concurrently execute the same program on an input data set to produce an output data set. Since a virtual task cluster includes at least one virtual thread, it may also be called a virtual thread cluster (VTC), and the program it runs may be called a VTC program.

For example, within a virtual task cluster or VTC, data may be produced by one virtual thread and consumed by another. In some embodiments, synchronization instructions can be inserted into the VTC program code at the points where data is shared, to ensure that the producing virtual thread has actually produced the data before the consuming virtual thread attempts to access it. Optionally, if data sharing between the virtual threads of a virtual task cluster or VTC is supported, the degree of sharing may be determined by the VTC program. It should be noted that, in a particular application using virtual task clusters or VTCs, the virtual threads in a cluster may or may not actually share data with each other, depending on the VTC program; the embodiments of the present application impose no specific limitation on this.
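
The sketch below illustrates the producer/consumer pattern described above in plain C++: each thread writes into shared storage, all threads synchronize, and each thread then reads a value produced by a different thread. std::thread and std::barrier stand in for virtual threads and the inserted synchronization instruction; this is an illustration of the idea, not the execution model of this application.

```cpp
#include <barrier>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    constexpr int kThreadsPerVtc = 4;
    std::vector<int> shared(kThreadsPerVtc);
    std::barrier sync(kThreadsPerVtc);  // plays the role of the sync instruction

    auto vtcProgram = [&](int threadId) {
        shared[threadId] = threadId * threadId;   // produce
        sync.arrive_and_wait();                   // all producers have finished
        int neighbor = (threadId + 1) % kThreadsPerVtc;
        std::printf("thread %d consumed %d\n", threadId, shared[neighbor]);
    };

    std::vector<std::thread> threads;
    for (int id = 0; id < kThreadsPerVtc; ++id)
        threads.emplace_back(vtcProgram, id);
    for (auto& t : threads) t.join();
    return 0;
}
```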

Optionally, performing task orchestration management according to the virtual ISA to determine at least one virtual task cluster includes:

performing task identification according to the virtual ISA to determine the task to be executed; and

performing optimized orchestration on the virtual ISA according to the task to be executed to determine at least one virtual task cluster.

In this embodiment, determining at least one virtual task cluster through task orchestration management of the virtual ISA amounts to performing unified virtual computation on the computing task corresponding to the virtual ISA; this virtual computation includes, but is not limited to, task identification and optimized orchestration.

Optionally, the method further comprises:

generating virtual task cluster related information; and

registering the virtual task cluster related information;

wherein the virtual task cluster related information includes at least one of the following:

identification information of a virtual task cluster, used to indicate the read position of the input data used by the virtual task cluster and/or the write position of the output data it generates;

dimension information of a virtual task cluster, used to indicate the number of virtual threads included in the virtual task cluster; and

identification information of a virtual thread, used to indicate the read position of the input data used by the virtual thread and/or the write position of the output data it generates.
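
A minimal sketch of how the virtual task cluster related information listed above could be recorded and registered is given below. The record and function names (VirtualThreadInfo, VtcInfo, registerVtc) are hypothetical.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

struct VirtualThreadInfo {
    uint32_t threadId;      // locates this thread's input reads / output writes
};

struct VtcInfo {
    uint32_t vtcId;         // locates this cluster's input reads / output writes
    uint32_t numThreads;    // dimension: number of virtual threads in the cluster
    std::vector<VirtualThreadInfo> threads;
};

// "Registering" the information could be as simple as appending it to a table
// kept by the execution engine.
inline void registerVtc(std::vector<VtcInfo>& table, VtcInfo info) {
    table.push_back(std::move(info));
}
```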

Optionally, each virtual thread in a virtual task cluster is configured with identification information, for example a unique thread identifier, or thread ID, which the virtual thread can access during its execution. The thread ID, which may be defined as a one-dimensional or multi-dimensional value, controls various aspects of the thread's processing behavior. For example, the identification information of a virtual thread may be used to determine which part of the input data set the virtual thread will process and/or which part of the output data set it will produce or write. Within a virtual task cluster or VTC, virtual threads can cooperate by sharing data with one another in a manner that depends on this identification information (for example, the unique thread identifier or thread ID).

Optionally, each virtual task cluster determined when orchestrating a task to be executed is configured with identification information, for example a virtual task cluster identifier or VTC identifier (VTC ID). As with the identification information of a virtual thread, any unique identifier (including but not limited to a numerical identifier) may be used as the VTC ID. For each virtual thread in a virtual task cluster, the combination of the cluster's identification information (for example the VTC ID) and the thread's identification information (for example the thread ID) can be used to determine the positions from which input data is read and to which output data is written, so that each virtual task cluster or VTC operates on the correct portion of the input data set and writes its portion of the output data set to the correct location.
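
The sketch below shows one simple way the VTC ID and thread ID could be combined to locate the portion of the input and output data that a single virtual thread handles, as described above. The names (globalIndex, vtcProgram, threadsPerVtc) are illustrative assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Each virtual thread derives its element index from (vtcId, threadId).
inline size_t globalIndex(uint32_t vtcId, uint32_t threadId,
                          uint32_t threadsPerVtc) {
    return static_cast<size_t>(vtcId) * threadsPerVtc + threadId;
}

// Example VTC program body: each virtual thread scales exactly one element.
void vtcProgram(uint32_t vtcId, uint32_t threadId, uint32_t threadsPerVtc,
                const std::vector<float>& in, std::vector<float>& out) {
    size_t i = globalIndex(vtcId, threadId, threadsPerVtc);
    if (i < in.size()) out[i] = 2.0f * in[i];   // correct read and write slots
}

int main() {
    std::vector<float> in(1024, 1.0f), out(1024, 0.0f);
    for (uint32_t vtc = 0; vtc < 4; ++vtc)
        for (uint32_t t = 0; t < 256; ++t)
            vtcProgram(vtc, t, 256, in, out);   // sequential stand-in for a parallel run
    return out[1023] == 2.0f ? 0 : 1;
}
```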

Optionally, a virtual thread in a virtual task cluster or VTC may share input data or intermediate results with other virtual threads in the same virtual task cluster or VTC. For example, the VTC program may contain instructions that compute the address in shared memory to which particular data will be written, where that address may be a function of the thread ID. Each virtual thread evaluates the function with its own thread ID and writes to the corresponding location. The address function can be defined so that different threads write to different locations; as long as the function is deterministic, the location written to by any virtual thread is predictable. As another example, the VTC program may also contain instructions that compute the address in shared memory from which data will be read, where that address is again a function of the thread ID. By defining suitable functions and providing synchronization techniques, data can be written to a given location in shared memory by one virtual thread of a virtual task cluster or VTC and read from that location by a different virtual thread of the same cluster in a predictable manner. In this way, the virtual task clusters or VTCs in the embodiments of the present application can support any desired pattern of data sharing between virtual threads, and any virtual thread in a virtual task cluster or VTC can share data with any other virtual thread in the same cluster.
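
The sketch below gives a concrete example of deterministic, thread-ID-dependent shared-memory addressing of the kind described above: thread t writes to writeAddr(t) and later reads from readAddr(t), so the producer of every slot each thread reads is predictable. The function names and the rotation pattern are only an illustration.

```cpp
#include <cstdint>

constexpr uint32_t kVtcSize = 8;

// Each thread writes to its own slot in shared memory...
constexpr uint32_t writeAddr(uint32_t threadId) { return threadId; }

// ...and reads the slot written by its left neighbour (a rotation pattern).
constexpr uint32_t readAddr(uint32_t threadId) {
    return (threadId + kVtcSize - 1) % kVtcSize;
}

static_assert(writeAddr(3) == 3);
static_assert(readAddr(0) == kVtcSize - 1);  // thread 0 reads thread 7's data

int main() { return 0; }
```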

Optionally, performing task orchestration management according to the virtual ISA to determine at least one virtual task cluster includes:

performing task identification according to the virtual ISA to determine the task to be executed;

decomposing the virtual ISA into tasks according to the task to be executed to obtain at least one task partition, wherein different task partitions correspond to different parts of the task to be executed; and

determining at least one virtual task cluster according to each task partition, wherein different virtual task clusters within a task partition are independent of each other, and the virtual task clusters in each task partition run the same program.

In this embodiment, determining at least one virtual task cluster through task orchestration management of the virtual ISA amounts to performing unified virtual computation on the computing task corresponding to the virtual ISA; this virtual computation includes, but is not limited to, task identification, decomposition and optimized orchestration, for example performing optimized orchestration on each task partition to determine at least one virtual task cluster.

Optionally, task decomposition, also called data-parallel decomposition, covers any situation in which a computational problem is solved by executing the same algorithm multiple times in parallel on input data to produce output data. For example, data-parallel decomposition involves applying the same processing algorithm to different parts of an input data set in order to produce different parts of an output data set. Examples of problems suited to data-parallel decomposition include matrix algebra, linear and/or non-linear transforms in any number of dimensions (for example the fast Fourier transform), and various filtering algorithms such as convolution filtering in any number of dimensions and separable filtering in multiple dimensions. In some embodiments, the processing algorithm to be applied to each part of the input data set can be specified in the VTC program, and each virtual thread in the virtual task cluster or VTC executes the same VTC program on one part of the input data set. Optionally, the VTC program may use a wide range of mathematical and logical operations to implement the algorithm, and/or it may contain conditional or branching execution paths as well as direct and/or indirect memory accesses.

Based on task decomposition, or data-parallel decomposition, at least one task partition can be obtained. Each task partition may be formed from a number of virtual task clusters or VTCs, and all the virtual task clusters or VTCs in a task partition execute the same program. Each virtual task cluster or VTC may in turn be built from a number of virtual threads, and the virtual thread blocks in each virtual task cluster or VTC may all be of the same size and execute the same VTC program. For example, for a large image input, such a large processing task can be managed by dividing the image among multiple VTCs so that each VTC produces a different portion of the output pixels (for example, a 32x32 tile).
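
The sketch below works through the 32x32-tile image decomposition mentioned above: computing how many VTCs a given image needs and which tile one VTC covers, based on its identifier. The names and the row-major tile numbering are illustrative assumptions.

```cpp
#include <cstdint>

constexpr uint32_t kTile = 32;   // each VTC produces one 32x32 tile of output

struct TileGrid {
    uint32_t tilesX, tilesY;     // task-partition dimensions, in VTC units
};

struct TileOrigin {
    uint32_t x, y;               // top-left output pixel covered by one VTC
};

constexpr TileGrid gridFor(uint32_t width, uint32_t height) {
    return {(width + kTile - 1) / kTile, (height + kTile - 1) / kTile};
}

constexpr TileOrigin tileOrigin(uint32_t vtcId, TileGrid g) {
    return {(vtcId % g.tilesX) * kTile, (vtcId / g.tilesX) * kTile};
}

static_assert(gridFor(1920, 1080).tilesX == 60);
static_assert(gridFor(1920, 1080).tilesY == 34);             // 1080/32 rounded up
static_assert(tileOrigin(61, gridFor(1920, 1080)).y == 32);  // second tile row

int main() { return 0; }
```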

Optionally, the virtual task clusters or VTCs within a task partition may be independent of each other, meaning that the execution of any virtual task cluster or VTC in a task partition is not affected by the execution of the other virtual task clusters or VTCs; this significantly increases flexibility when distributing tasks among the available processing cores.

Optionally, the method further comprises:

generating task partition related information; and

registering the task partition related information;

wherein the task partition related information includes at least one of the following:

identification information of a task partition, used to indicate the read position of the input data used by the task partition and/or the write position of the output data it generates; and

dimension information of a task partition, used to indicate the number of virtual task clusters contained in the task partition.

Optionally, each task partition determined by decomposing a task to be executed is configured with identification information, for example a task partition identifier. In some embodiments, in order to distinguish the different virtual task clusters or VTCs within a task partition, the task partition related information may further include the identification information of each virtual task cluster or VTC in the partition.
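
A minimal sketch of the task partition related information described above is given below, mirroring the cluster-level record sketched earlier. All field and type names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

struct VtcInfo {
    uint32_t vtcId;             // locates this cluster's reads / writes
    uint32_t numThreads;        // virtual threads in this cluster
};

struct TaskPartitionInfo {
    uint32_t partitionId;       // locates this partition's reads / writes
    uint32_t numVtcs;           // dimension: virtual task clusters contained
    std::vector<VtcInfo> vtcs;  // optional per-VTC identification information
};
```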

Optionally, running the same program on the virtual threads in each virtual task cluster to obtain the virtual ISA sequence corresponding to the task to be executed includes:

running the same program in parallel on multiple virtual threads in each virtual task cluster or task partition to obtain the virtual ISA sequence corresponding to the task to be executed. For example, within each virtual task cluster or task partition, multiple virtual threads are able to cooperate with one another when required (for example, through a representation of a parallel processor, and its associated memory spaces, for a large number of concurrent virtual threads that share data and synchronize). The multiple virtual threads running the same program in parallel can be mapped onto a variety of real processors and/or processing systems, such as different types of GPUs, neural network processing units (NPUs) or deep learning accelerators (DLAs).

Optionally, when the virtual threads in each virtual task cluster run the same program, several virtual memory spaces with different levels of data sharing and access types may be supported, together with a virtual ISA that identifies all of the functions whose execution is supported.

Specifically, converting the virtual ISA code in the virtual ISA sequence into the target ISA code corresponding to each of the different types of accelerators means that a compiler generates virtual ISA code for the part of the application program that defines virtual thread behavior, and that this virtual ISA code is translated into the target ISA code to be executed by the accelerators.

In some embodiments, the virtual ISA code is program code, but not necessarily in a form executable on a particular target platform; for example, the virtual ISA code can be stored and/or distributed like any other program code. In other embodiments, the application program may be specified wholly or partly as virtual ISA code, and the use of a compiler may be wholly or partly avoided. In some embodiments, the target ISA code is code that can be executed directly by the target platform; whether it can be received and correctly decoded depends on the particularities of the target platform. The virtual ISA code may be translated into program code to be executed by n0 threads on the target platform, or into program code to be executed by fewer than n0 threads, where each thread carries the processing tasks of at least one virtual thread of the virtual task cluster or VTC.
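
The sketch below illustrates the last point: mapping n0 virtual threads onto fewer hardware threads by having each hardware thread loop over the virtual threads assigned to it. vtcProgramForThread() is a hypothetical stand-in for the translated per-virtual-thread code.

```cpp
#include <cstdint>
#include <thread>
#include <vector>

void vtcProgramForThread(uint32_t virtualThreadId) {
    // ... translated body of one virtual thread would go here ...
    (void)virtualThreadId;
}

void runVtcOnFewerThreads(uint32_t n0 /* virtual threads */,
                          uint32_t hardwareThreads) {
    std::vector<std::thread> pool;
    for (uint32_t hw = 0; hw < hardwareThreads; ++hw) {
        pool.emplace_back([=] {
            // Hardware thread hw emulates virtual threads hw, hw+H, hw+2H, ...
            for (uint32_t vt = hw; vt < n0; vt += hardwareThreads)
                vtcProgramForThread(vt);
        });
    }
    for (auto& t : pool) t.join();
}

int main() {
    runVtcOnFewerThreads(/*n0=*/1000, /*hardwareThreads=*/4);
    return 0;
}
```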

Optionally, after converting the virtual ISA code in the virtual ISA sequence into the target ISA code corresponding to each of the different types of accelerators, the method further includes:

caching the target ISA code into a data buffer corresponding to each accelerator.

In this embodiment, the target ISA code is cached into the data buffer corresponding to each accelerator, so that each of the different types of accelerators can obtain its corresponding target ISA code; in other words, by sharing memory with the different types of accelerators, each type of accelerator can obtain its corresponding target ISA code.
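
A minimal sketch of caching per-accelerator target ISA code into a memory region assumed to be shared with the accelerator is shown below. AcceleratorBuffer and cacheTargetIsa() are hypothetical names, and the shared-memory handling is deliberately simplified.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

using TargetIsaCode = std::vector<uint8_t>;

struct AcceleratorBuffer {
    uint8_t* base;      // start of the region shared with this accelerator
    size_t capacity;
    size_t used = 0;
};

// Copy each accelerator's translated code into its own data buffer.
bool cacheTargetIsa(const std::map<int, TargetIsaCode>& codePerAccelerator,
                    std::map<int, AcceleratorBuffer>& buffers) {
    for (const auto& [acceleratorId, code] : codePerAccelerator) {
        auto it = buffers.find(acceleratorId);
        if (it == buffers.end() || code.size() > it->second.capacity)
            return false;                       // missing or undersized buffer
        std::memcpy(it->second.base, code.data(), code.size());
        it->second.used = code.size();
    }
    return true;
}
```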

As shown in FIG. 3, an embodiment of the present application provides an instruction processing device 300, comprising:

an acquisition module 310, configured to acquire a virtual instruction set architecture (ISA) carried in an application program;

a determination module 320, configured to perform task orchestration management according to the virtual ISA and determine a virtual ISA sequence corresponding to a task to be executed; and

a conversion module 330, configured to convert the virtual ISA code in the virtual ISA sequence into the target ISA code corresponding to each of the different types of accelerators.

Optionally, the determination module 320 includes:

a determination unit, configured to perform task orchestration management according to the virtual ISA and determine at least one virtual task cluster, wherein each virtual task cluster includes at least one virtual thread; and

a processing unit, configured to run the same program on the virtual threads in each virtual task cluster to obtain the virtual ISA sequence corresponding to the task to be executed.

Optionally, the determination unit is further configured to:

perform task identification according to the virtual ISA to determine the task to be executed; and

perform optimized orchestration on the virtual ISA according to the task to be executed to determine at least one virtual task cluster.

Optionally, the determination module 320 further includes:

a first generation unit, configured to generate virtual task cluster related information; and

a first registration unit, configured to register the virtual task cluster related information;

wherein the virtual task cluster related information includes at least one of the following:

identification information of a virtual task cluster, used to indicate the read position of the input data used by the virtual task cluster and/or the write position of the output data it generates;

dimension information of a virtual task cluster, used to indicate the number of virtual threads included in the virtual task cluster; and

identification information of a virtual thread, used to indicate the read position of the input data used by the virtual thread and/or the write position of the output data it generates.

Optionally, the determination unit is further configured to:

perform task identification according to the virtual ISA to determine the task to be executed;

decompose the virtual ISA into tasks according to the task to be executed to obtain at least one task partition, wherein different task partitions correspond to different parts of the task to be executed; and

determine at least one virtual task cluster according to each task partition, wherein different virtual task clusters within a task partition are independent of each other, and the virtual task clusters in each task partition run the same program.

Optionally, the determination module 320 further includes:

a second generation unit, configured to generate task partition related information; and

a second registration unit, configured to register the task partition related information;

wherein the task partition related information includes at least one of the following:

identification information of a task partition, used to indicate the read position of the input data used by the task partition and/or the write position of the output data it generates; and

dimension information of a task partition, used to indicate the number of virtual task clusters contained in the task partition.

Optionally, the instruction processing device 300 further includes:

a caching module, configured to cache the target ISA code into a data buffer corresponding to each accelerator.

It should be noted that the above device of the embodiments of the present application and the above instruction processing method are based on the same inventive concept, and their embodiments may refer to each other; that is, the above device can implement each embodiment of the above instruction processing method and achieve the same technical effects, which are not repeated here to avoid redundancy.

As shown in Figure 4, an embodiment of the present invention provides a virtual machine that supports a virtual accelerator model in which multiple threads execute concurrently, with multi-layer data sharing and compute-task scheduling between threads, together with a virtual execution driver that controls the virtual accelerator model. For example, multiple compilers serving the same function are configured with one unified virtual instruction set (vISA); the virtual machine performs task orchestration management on that vISA, issues it to the virtual accelerator model for execution, and translates it into the actual ISA code usable by accelerators from different vendors. Here, the virtual instruction set (vISA) defines the virtual instructions that describe compute-task processing behavior in the virtual accelerator model, including but not limited to basic instructions such as load/store (access) instructions, arithmetic instructions, logical instructions, shift instructions, select instructions, compare instructions, data-rearrangement instructions and vector-processing instructions. A virtual machine according to this embodiment therefore gives programmers a platform on which to develop applications that execute concurrent, cooperative threads to process data. A hardware-specific virtual instruction translator and execution driver adapt the application code to the particular hardware on which it runs, so applications become more portable and easier to develop, because development is independent of any specific processing hardware.
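
As a rough illustration of the instruction categories listed above, the sketch below models a generic virtual instruction; the enum members, field names and layout are assumptions made for this sketch and are not defined by the embodiment itself.

#include <cstdint>
#include <vector>

// Illustrative only: one possible in-memory model of a vISA instruction stream.
enum class VInstrClass : uint8_t {
    LoadStore,   // access (load/store) instructions
    Arithmetic,  // arithmetic instructions
    Logical,     // logical instructions
    Shift,       // shift instructions
    Select,      // select instructions
    Compare,     // compare instructions
    Rearrange,   // data-rearrangement instructions
    Vector       // vector-processing instructions
};

struct VInstr {
    VInstrClass cls;                 // which basic category the instruction belongs to
    uint16_t    opcode;              // the concrete operation within that category
    uint8_t     typeWidthBits;       // operand width, e.g. 8/16/32/64 (cf. Table 2)
    std::vector<uint32_t> operands;  // destination register first, then source registers
};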

具体的,如图5所示,所述虚拟机包括:虚拟指令加载器、虚拟加速器执行引擎、虚拟指令模型转换器;其中,Specifically, as shown in FIG5 , the virtual machine includes: a virtual instruction loader, a virtual accelerator execution engine, and a virtual instruction model converter; wherein,

所述虚拟指令加载器用于对应用程序内承载的虚拟指令集ISA进行加载;The virtual instruction loader is used to load the virtual instruction set ISA carried in the application program;

所述虚拟加速器执行引擎对所述虚拟指令加载器加载的虚拟ISA执行任务编排管理,确定待执行任务对应的虚拟ISA序列;The virtual accelerator execution engine performs task arrangement management on the virtual ISA loaded by the virtual instruction loader, and determines the virtual ISA sequence corresponding to the task to be executed;

所述虚拟指令模型转换器将所述虚拟加速器执行引擎执行任务编排管理得到的虚拟ISA序列中的虚拟ISA代码,转换为不同类型的加速器各自对应的目标ISA代码。The virtual instruction model converter converts the virtual ISA code in the virtual ISA sequence obtained by the virtual accelerator execution engine performing task arrangement management into target ISA codes corresponding to different types of accelerators.
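
A minimal sketch of how the three components could hand work to one another, assuming hypothetical class and method names (the embodiment does not define an API):

#include <cstdint>
#include <string>
#include <vector>

struct VISAModule   { std::vector<uint8_t> code; };  // virtual ISA carried in the application
struct VISASequence { std::vector<uint8_t> code; };  // orchestrated virtual ISA sequence
struct TargetISACode { std::string acceleratorType; std::vector<uint8_t> code; };

class VirtualInstructionLoader {
public:
    // Loads the virtual instruction set ISA carried in the application image.
    VISAModule load(const std::string& applicationImage) { (void)applicationImage; return {}; }
};

class VirtualAcceleratorExecutionEngine {
public:
    // Performs task orchestration management on the loaded virtual ISA and
    // determines the virtual ISA sequence for the task to be executed.
    VISASequence orchestrate(const VISAModule& module) { return {module.code}; }
};

class VirtualInstructionModelConverter {
public:
    // Converts the virtual ISA code into the target ISA of each accelerator type.
    std::vector<TargetISACode> convert(const VISASequence& seq,
                                       const std::vector<std::string>& acceleratorTypes) {
        std::vector<TargetISACode> out;
        for (const auto& t : acceleratorTypes) out.push_back({t, seq.code});  // stub translation
        return out;
    }
};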

可选地,所述虚拟机能够兼容包含SIMD、SIMT的虚拟ISA,并且所述虚拟加速器执行引擎和虚拟指令模型转换器能够实现对虚拟ISA的解析并完成向SIMD、SIMT类型的厂商级加速器可使用的目标ISA代码的转换。Optionally, the virtual machine is compatible with virtual ISAs including SIMD and SIMT, and the virtual accelerator execution engine and virtual instruction model converter can parse the virtual ISA and complete conversion to target ISA code that can be used by SIMD and SIMT type vendor-level accelerators.

举例说明虚拟机的处理流程:The following example illustrates the processing flow of a virtual machine:

Step 1: The virtual machine loads the virtual instructions carried in the application through its internal virtual instruction loader. This process defines its methods on the basis of the virtual mapping model and operates on a conceptual model of the target processor or platform through the virtual device mapping model. The application can define a data-processing job that is handled in units of virtual task clusters (VTCs), and/or in units of task partitions. Optionally, the application can define at least one of the following:

应用程序定义采用虚拟任务簇或VTC的行为;Applications define behaviors using virtual task clusters or VTCs;

应用程序定义虚拟任务簇或VTC的维度(比如:包含的虚拟线程的数目),且如果将使用任务分区,那么定义任务分区的维度(比如:包含的虚拟任务簇或VTC的数目);The application defines the dimensions of the virtual task cluster or VTC (e.g., the number of virtual threads contained), and if task partitioning is to be used, the dimensions of the task partition (e.g., the number of virtual task clusters or VTCs contained);

应用程序定义待由虚拟任务簇或VTC(或任务分区)处理的输入数据集和输出数据集将被存储的位置;The application defines the input data sets to be processed by the virtual task cluster or VTC (or task partition) and the locations where the output data sets will be stored;

The application defines the overall processing behavior, for example when to launch each virtual task cluster or VTC (or task partition). The application may also contain additional code that dynamically determines the dimensions of the virtual task clusters or VTCs (or of the task partitions), decides whether to keep launching new virtual task clusters or VTCs (or task partitions), and so on.

For example, the application may be written in a high-level programming language such as C/C++ or FORTRAN. In one case, a C/C++ application directly specifies the behavior of one virtual thread. In another, the application is written in a data-parallel language (for example FORTRAN, C# or OpenCL) and specifies data-parallel operations on arrays and aggregate data structures; such an application can be compiled into virtual ISA program code that specifies the behavior of one VTC thread. To allow the behavior of virtual threads to be defined, some embodiments provide language extensions or function libraries through which the programmer can specify the behavior of parallel virtual threads. For example, special symbols or variables can be defined that correspond to the thread ID, the VTC ID and the task-partition ID, and functions can be provided through which the programmer indicates when a virtual thread should synchronize with other virtual threads.
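
For illustration only, the snippet below sketches what such a language extension might look like on the virtual-thread side; the builtins vthread_id(), vtc_id(), vtask_partition_id() and vtc_sync() are invented names standing in for the special symbols and synchronization functions described above.

#include <cstddef>

// Hypothetical builtins that a vISA-aware toolchain might expose (assumed names).
extern "C" unsigned vthread_id();          // position of this virtual thread inside its VTC
extern "C" unsigned vtc_id();              // position of this VTC inside its task partition
extern "C" unsigned vtask_partition_id();  // which task partition is being processed
extern "C" void     vtc_sync();            // synchronize the virtual threads of one VTC

// Behavior of ONE virtual thread; every virtual thread of the VTC runs this same program.
void scale_kernel(const float* in, float* out, float factor, std::size_t n) {
    const std::size_t threadsPerVTC = 256;                 // VTC dimension chosen by the application
    const std::size_t i = vtc_id() * threadsPerVTC + vthread_id();
    if (i < n) out[i] = in[i] * factor;                    // one data element per virtual thread
    vtc_sync();                                            // cooperate with the other VTC threads
}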

当编译应用程序时,编译器针对应用程序定义虚拟线程行为的部分产生虚拟ISA代码,将虚拟ISA代码翻译为待由加速器执行的目标ISA代码。When compiling an application, the compiler generates virtual ISA code for the portion of the application that defines the behavior of virtual threads, and translates the virtual ISA code into target ISA code to be executed by the accelerator.

In some embodiments, the virtual ISA code is program code, though not necessarily in a form that can be executed on any particular target platform; for instance, the virtual ISA code can be stored and/or distributed like any other program code. In other embodiments, the application may be specified wholly or partly as virtual ISA code, so that use of a compiler can be wholly or partly avoided. In some embodiments, the target ISA code is code that the target platform can execute directly, that is, code that can be received and correctly decoded, depending on the particulars of the target platform. The virtual ISA code may be translated into program code to be executed by n0 threads on the target platform, or into program code to be executed by fewer than n0 threads, where each such thread then carries the processing work of at least one virtual thread of the virtual task cluster or VTC.
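
The remapping mentioned at the end of the paragraph can be pictured as follows; the helper below is a sketch under the assumption that one translated thread serially covers several virtual threads.

#include <cstddef>
#include <functional>

// Sketch: fold the n0 virtual threads of a VTC onto fewer physical threads.
// In a real translation the outer loop bodies would run in parallel on the target.
void run_vtc_on_fewer_threads(std::size_t n0,              // virtual threads in the VTC
                              std::size_t physicalThreads, // threads the target platform provides
                              const std::function<void(std::size_t)>& virtualThreadBody) {
    for (std::size_t p = 0; p < physicalThreads; ++p) {
        for (std::size_t vt = p; vt < n0; vt += physicalThreads) {
            virtualThreadBody(vt);  // physical thread p does the work of virtual threads p, p+P, ...
        }
    }
}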

步骤2:虚拟加速器执行引擎依托虚拟执行模型与虚拟数据模型,完成与虚拟指令加载器的交互,完成取指及数据缓冲等工作;Step 2: The virtual accelerator execution engine relies on the virtual execution model and virtual data model to complete the interaction with the virtual instruction loader and complete tasks such as instruction fetching and data buffering;

Step 3: The virtual accelerator execution engine identifies, orchestrates and optimizes the compute tasks; this process is independent of any particular vendor's hardware characteristics or instruction set. Concretely, through the virtual execution model and on the basis of the virtual data model, the virtual accelerator execution engine performs generic virtual computation over the compute-task threads corresponding to the vISA, completing the identification, decomposition and optimized orchestration of a series of compute tasks, and produces the resulting virtual ISA code.

Here, the virtual data model may perform task orchestration management using virtual task clusters or VTCs, or using task partitions; see the foregoing embodiments for details, which are not repeated here. The virtual parallel execution model is used to execute the virtual threads of a virtual task cluster or VTC (or task partition). It represents a parallel processor, together with its associated memory spaces, that supports executing a large number of concurrent virtual threads which can cooperate with one another (for example by sharing data and synchronizing) when required. This virtual parallel execution model can be mapped onto a variety of actual processors and/or processing systems, such as GPGPUs, NPUs or DLAs from different vendors. The virtual execution model can define several virtual memory spaces supporting different levels of data sharing and access types, and a virtual ISA identifying all of the functions the virtual accelerator execution engine can execute. In this way the virtual machine, by defining and launching virtual task clusters or VTCs (or task partitions), defines a virtual accelerator execution engine that can be used to control the execution of virtual threads.
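
The "related information" that the virtual data model registers for orchestration (described in the apparatus embodiment above) might be laid out roughly as follows; the field names and types are assumptions of this sketch.

#include <cstdint>

struct VirtualThreadInfo {
    uint32_t threadId;     // identifies where the thread reads its input / writes its output
};

struct VirtualTaskClusterInfo {
    uint32_t vtcId;        // identifies where the VTC reads its input / writes its output
    uint32_t numThreads;   // dimension: number of virtual threads contained in the VTC
};

struct TaskPartitionInfo {
    uint32_t partitionId;  // identifies where the partition reads its input / writes its output
    uint32_t numVTCs;      // dimension: number of virtual task clusters in the partition
};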

The virtual accelerator execution engine contains logic that receives and interprets virtual instructions or virtual instruction sequences from the virtual instruction loader, and execution cores capable of concurrently executing all of the threads of a single VTC. The virtual accelerator runs as many virtual processing engines concurrently as there are execution cores, each virtual processing engine executing one virtual thread, and the virtual processing engines execute their respective virtual threads concurrently. The virtual accelerator execution engine maintains an instruction pointer for each of these virtual threads, where the instructions referred to are those defined by the virtual ISA that forms part of the virtual machine.

For the virtual ISA, Table 1 gives an example of the special variables defined by one virtual ISA (for instance, the prefix "%" marks a special variable). The special variables are tied to the programming model, in which each virtual thread is identified by its position within a VTC, and the VTC in turn lies within a specific one of some number of task partitions. In some embodiments, the special variables of Table 1 are mapped onto virtual registers in the virtual machine device.

表1Table 1

在表1中,VTC和任务分区各自用三维空间来定义,且不同的任务分区在一维空间中循序编号。虚拟ISA特殊变量将在启动VTC时被初始化,且虚拟ISA代码可简单地使用这些变量而无需初始化。In Table 1, VTC and task partitions are each defined using a three-dimensional space, and different task partitions are numbered sequentially in the one-dimensional space. Virtual ISA special variables will be initialized when starting the VTC, and the virtual ISA code can simply use these variables without initialization.

As shown in Table 1, the first three-component vector of special variables, %vtid = (%vtid.x, %vtid.y, %vtid.z), defines the dimensions of the VTC (in numbers of virtual threads). All virtual threads in a VTC share the same %vtid vector. In the virtual machine, the value of the %vtid vector is expected to be supplied to the virtual accelerator execution engine via a virtual API function call that establishes the dimensions of the VTC.

As shown in Table 1, the second three-component vector of special variables, %vtcid = (%vtcid.x, %vtcid.y, %vtcid.z), is the task ID of a given compute task within the VTC. In the virtual machine, the virtual accelerator execution engine is expected, when launching each virtual thread of the VTC, to dispatch a unique %vtcid vector satisfying the constraints 0 ≤ %vtcid.x < %vtid.x, 0 ≤ %vtcid.y < %vtid.y and 0 ≤ %vtcid.z < %vtid.z.

As shown in Table 1, the third three-component vector of special variables, %nvTCid = (%nvTCid.x, %nvTCid.y, %nvTCid.z), defines the dimensions of the task partition (in numbers of VTCs). In the virtual machine shown in Figure 5, the value of the %nvTCid vector is expected to be supplied to the virtual accelerator execution engine via a virtual API function call that establishes the dimensions of the task partition of VTCs.
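
Taken together, the three special vectors could be held in a small per-thread register file that the execution engine fills in when a VTC is launched; the struct and initializer below are illustrative assumptions.

#include <cstdint>

struct Dim3 { uint32_t x, y, z; };

// Special variables of Table 1 as seen by one virtual thread (sketch).
struct SpecialRegisters {
    Dim3 vtid;    // %vtid  : dimensions of the VTC, shared by all of its virtual threads
    Dim3 vtcid;   // %vtcid : per-task identifier inside the VTC, assigned at launch
    Dim3 nvTCid;  // %nvTCid: dimensions of the task partition, in numbers of VTCs
};

// The dimension vectors arrive via virtual API calls; the per-task identifier is
// dispatched by the execution engine so that each launched thread sees a unique value.
SpecialRegisters init_on_launch(Dim3 vtcDims, Dim3 partitionDims, Dim3 taskIdInVTC) {
    return SpecialRegisters{vtcDims, taskIdInVTC, partitionDims};
}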

虚拟ISA使得编程人员(或编译器)可定义任意数目的变量以代表正被处理的数据项目,还可以通过类型和指示如何使用变量以及变量在何种程度上共享的虚拟状态空间来定义变量。使用目标平台中可用的寄存器或其它存储器结构来实现变量;在许多目标平台中,状态空间可能会影响对用来实现特定变量的存储器结构的选择。The virtual ISA allows the programmer (or compiler) to define any number of variables to represent the data items being processed, and to define the variables by type and a virtual state space that indicates how the variables are used and to what extent they are shared. Variables are implemented using registers or other memory structures available in the target platform; in many target platforms, the state space may affect the choice of memory structure used to implement a particular variable.

Table 2 gives an example of the variable types supported in one virtual ISA embodiment. The supported variable types include untyped (void) bit types, signed types, unsigned types and precision types.

表2Table 2

Name      Description        Enumerated values of n
.vb<n>    Void (bit) type    1, 8, 16, 32, 64
.vs<n>    Signed type        1, 8, 16, 32, 64
.vu<n>    Unsigned type      1, 8, 16, 32, 64
.vx<n>    Precision type     1, 8, 16, 32, 64

void型的变量是单个位或具有指定长度的位的群组;可根据常规格式来定义带符号整数和无符号整数格式以及浮点格式。可选地,针对每个类型支持多个宽度,其中使用参数<n>来指定宽度。举例来说,.vs16指示16个位的带符号数,.vx32指示32位的精度数等。A variable of type void is a single bit or a group of bits of a specified length; signed integer and unsigned integer formats and floating point formats can be defined according to the conventional format. Optionally, multiple widths are supported for each type, where the width is specified using the parameter <n>. For example, .vs16 indicates a 16-bit signed number, .vx32 indicates a 32-bit precision number, etc.

如表2所示,有些变量类型限于特定的宽度,例如:精度变量必须至少为16个位;且整数类型必须至少为8个位。设计中,虚拟ISA的实现支持所有指定宽度,如果处理器的数据路径和/或寄存器比最宽宽度窄,那么可以使用多个寄存器和处理器循环来处置较宽类型。As shown in Table 2, some variable types are restricted to specific widths, for example: precision variables must be at least 16 bits; and integer types must be at least 8 bits. In the design, the implementation of the virtual ISA supports all specified widths, and if the processor's data path and/or registers are narrower than the widest width, multiple registers and processor cycles can be used to handle wider types.
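
The width rule in the paragraph above can be made concrete with a small helper; the names are assumptions of this sketch.

#include <cstdint>

enum class VType { VB, VS, VU, VX };   // .vb / .vs / .vu / .vx, cf. Table 2

struct VVariable {
    VType    type;
    uint32_t bits;   // declared width, e.g. .vs16 -> 16, .vx32 -> 32
};

// How many native registers a vISA variable needs on a target whose registers are
// targetRegisterBits wide; e.g. a .vu64 value on a 32-bit datapath needs two registers.
uint32_t registers_needed(const VVariable& v, uint32_t targetRegisterBits) {
    return (v.bits + targetRegisterBits - 1) / targetRegisterBits;
}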

表3所示,给出了虚拟ISA中支持的虚拟状态空间的示例,比如定义以下九个状态空间,其对应于不同的共享级别和虚拟机装置中的可能的存储位置。Table 3 shows an example of virtual state spaces supported in a virtual ISA. For example, the following nine state spaces are defined, which correspond to different sharing levels and possible storage locations in a virtual machine device.

表3Table 3

Name        Sharing level     Mapped location
.vreg       Thread            Register
.vsreg      Thread            Register
.vlocal     Thread            Memory
.vshare     Task cluster      Memory
.vparam     Task cluster      Memory
.vconst     Task partition    Memory
.vglobal    Context           Memory
.vtex       Context           Memory
.vsurf      Context           Memory

The first three state spaces are shared at the thread level, meaning that each virtual thread has its own instance of a variable and no virtual thread can access another virtual thread's instance. For example, the virtual register (.vreg) state space is used to define the operands, temporary values and/or results of the computations each virtual thread performs. A program may define any number of virtual registers. Virtual registers are addressed by static compile-time names rather than by computed addresses. This state space corresponds to the virtual registers in the virtual machine device.

The special-register (.vsreg) state space corresponds to the predefined special variables of Table 1, which are stored in special registers of the virtual machine. In some embodiments, the virtual ISA code may not define any other variables in the .vsreg space, but may use the special variables as inputs to computations. All virtual threads can read any variable in the .vsreg state space. For the thread-identifier variable (%vtcid, or its components), each virtual thread reads its own unique identifier; for the other variables in the .vsreg state space, all virtual threads in the same VTC read the same value.

The per-virtual-thread local memory (.vlocal) variables correspond to regions of global memory that are allocated and addressed on a per-virtual-thread basis. In other words, when a virtual thread accesses a .vlocal variable it accesses its own instance of that variable, and changes made to a .vlocal variable by one virtual thread do not affect other virtual threads. Unlike the .vreg and .vsreg state spaces, each virtual thread's local memory can be addressed using computed addresses.
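
Table 3 can be captured as data so that the translator can decide, per state space, whether a variable lives in a register or in memory and with whom it is shared; the encoding below is an assumption of this sketch.

#include <array>

enum class Sharing { PerThread, PerTaskCluster, PerTaskPartition, PerContext };
enum class Backing { Register, Memory };

struct StateSpace { const char* name; Sharing sharing; Backing backing; };

// The nine virtual state spaces of Table 3.
constexpr std::array<StateSpace, 9> kStateSpaces{{
    {".vreg",    Sharing::PerThread,        Backing::Register},
    {".vsreg",   Sharing::PerThread,        Backing::Register},
    {".vlocal",  Sharing::PerThread,        Backing::Memory},
    {".vshare",  Sharing::PerTaskCluster,   Backing::Memory},
    {".vparam",  Sharing::PerTaskCluster,   Backing::Memory},
    {".vconst",  Sharing::PerTaskPartition, Backing::Memory},
    {".vglobal", Sharing::PerContext,       Backing::Memory},
    {".vtex",    Sharing::PerContext,       Backing::Memory},
    {".vsurf",   Sharing::PerContext,       Backing::Memory},
}};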

The virtual machine supports register-to-register arithmetic, and every arithmetic operation manipulates one or more virtual-register operands (denoted a, b, c in Table 4) to produce a result written to a virtual register (denoted d in Table 4). The operands and destination of an arithmetic operation therefore always lie in the virtual-register state space .vreg, except that the special registers of Table 1 (in the special-register state space .vsreg) can also be used as operands.

表4Table 4

Name                       Description
vadd.<TYPE> d, a, b        d = a + b
vsub.<TYPE> d, a, b        d = a - b
vmul.<TYPE> d, a, b        d = a * b
vdiv.<TYPE> d, a, b        d = a / b
vmad.<TYPE> d, a, b, c     d = a * b + c
vfma.<TYPE> d, a, b, c     d = a * b + c (fused multiply-add)
vsad.<TYPE> d, a, b, c     d = |a - b| + c (sum of absolute differences)
vrem.<TYPE> d, a, b        d = a % b (remainder)
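
A minimal interpreter for a few of the Table 4 operations over a .vs32-style register file could look like this; the opcode names are assumed, and the multiply-add semantics follow common practice where the table is ambiguous.

#include <cstdint>
#include <vector>

enum class VOp { Add, Sub, Mul, Div, Mad };

struct VRegisterFile { std::vector<int32_t> r; };   // .vreg state space, 32-bit signed view

// Executes d = op(a, b[, c]) register-to-register, as in Table 4.
void execute(VRegisterFile& rf, VOp op, int d, int a, int b, int c = 0) {
    switch (op) {
        case VOp::Add: rf.r[d] = rf.r[a] + rf.r[b]; break;            // vadd
        case VOp::Sub: rf.r[d] = rf.r[a] - rf.r[b]; break;            // vsub
        case VOp::Mul: rf.r[d] = rf.r[a] * rf.r[b]; break;            // vmul
        case VOp::Div: rf.r[d] = rf.r[a] / rf.r[b]; break;            // vdiv
        case VOp::Mad: rf.r[d] = rf.r[a] * rf.r[b] + rf.r[c]; break;  // vmad (assumed a*b+c)
    }
}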

虚拟寄存器可向每一处理核心提供寄存器堆,提供一组全局可读取的寄存器来存储VTC ID、任务分区ID和VTC及任务分区维度。Virtual registers may provide a register file for each processing core, providing a set of globally readable registers to store VTC IDs, task partition IDs, and VTC and task partition dimensions.

The instruction buffer can logically represent a physical accelerator-local register file with P lanes, each lane having some number of entries. Each lane is assigned to one of the P execution engines, and the corresponding entries in different lanes can be populated with the data of different threads executing the same program.

Step 4: After orchestration and optimization, the task-level virtual instruction sequence is fed into the virtual instruction model converter, which converts it into the actual ISA of each vendor's hardware;

步骤5:转换后的ISA(SIMD或SIMT类型)送入与加速器共享内存对齐的数据缓冲区,实现硬件内的实际计算执行。Step 5: The converted ISA (SIMD or SIMT type) is fed into a data buffer aligned with the accelerator’s shared memory to implement the actual computation execution within the hardware.

The data buffer can serve as a shared register file or a virtual cache memory that allows any execution engine to read or write any location in the shared memory. Specifically, the virtual instruction model converter caches the target ISA code into this data buffer, and the virtual instruction model converter shares memory with the accelerator.
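
A sketch of that caching step, under the assumption of a simple per-accelerator buffer map (a real implementation would go through the vendor runtime's shared-memory mechanism):

#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct DataBuffer { std::vector<uint8_t> bytes; };   // region shared with one accelerator

class TargetCodeCache {
public:
    // Caches the converted target ISA code into the data buffer of one accelerator.
    void cache(const std::string& acceleratorType, std::vector<uint8_t> targetISA) {
        buffers_[acceleratorType].bytes = std::move(targetISA);
    }
    const DataBuffer& buffer(const std::string& acceleratorType) const {
        return buffers_.at(acceleratorType);
    }
private:
    std::map<std::string, DataBuffer> buffers_;
};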

In the embodiments of the present application, given that current heterogeneous SIMD and SIMT instruction types have different parameter lengths and different single instructions, a single virtual instruction set compatible with both SIMD and SIMT is proposed, together with a virtual machine that can parse this virtual instruction set and complete its conversion to SIMD- and SIMT-type vendor-level ISAs. With these embodiments, a developer can use any compiler that supports the virtual instruction set to generate an application executable in the virtual machine device; the virtual machine device converts the virtual instruction set into the SIMD- or SIMT-type ISA of each vendor and carries out the actual computation, so that an application with a given function can be compiled once and run anywhere on different heterogeneous hardware. This greatly improves the cooperative efficiency of mixed heterogeneous systems.

本发明另一实施例的虚拟机,如图6所示,包括收发器610、处理器600、存储器620及存储在所述存储器620上并可在所述处理器600上运行的程序或指令;所述处理器600执行所述程序或指令时实现上述应用于指令处理方法的步骤,且能达到相同的技术效果,为避免重复这里不再赘述。A virtual machine according to another embodiment of the present invention, as shown in FIG6 , includes a transceiver 610, a processor 600, a memory 620, and a program or instruction stored in the memory 620 and executable on the processor 600; when the processor 600 executes the program or instruction, the steps applied to the instruction processing method are implemented, and the same technical effect can be achieved, which will not be described again here to avoid repetition.

所述收发器610,用于在处理器600的控制下接收和发送数据。The transceiver 610 is used to receive and send data under the control of the processor 600 .

其中,在图6中,总线架构可以包括任意数量的互联的总线和桥,具体由处理器600代表的一个或多个处理器和存储器620代表的存储器的各种电路链接在一起。总线架构还可以将诸如外围设备、稳压器和功率管理电路等之类的各种其他电路链接在一起,这些都是本领域所公知的,因此,本文不再对其进行进一步描述。总线接口提供接口。收发器610可以是多个元件,即包括发送机和接收机,提供用于在传输介质上与各种其他装置通信的单元。处理器600负责管理总线架构和通常的处理,存储器620可以存储处理器600在执行操作时所使用的数据。Wherein, in FIG6, the bus architecture may include any number of interconnected buses and bridges, specifically one or more processors represented by processor 600 and various circuits of memory represented by memory 620 are linked together. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not further described herein. The bus interface provides an interface. The transceiver 610 may be a plurality of components, i.e., including a transmitter and a receiver, providing a unit for communicating with various other devices on a transmission medium. The processor 600 is responsible for managing the bus architecture and general processing, and the memory 620 may store data used by the processor 600 when performing operations.

本发明实施例的一种可读存储介质,其上存储有程序或指令,所述程序或指令被处理器执行时实现如上所述的指令处理方法中的步骤,且能达到相同的技术效果,为避免重复,这里不再赘述。A readable storage medium according to an embodiment of the present invention stores a program or instruction thereon. When the program or instruction is executed by a processor, the steps in the instruction processing method as described above are implemented and the same technical effect can be achieved. To avoid repetition, it will not be described here.

其中,所述处理器为上述实施例中所述的虚拟机中的处理器。所述可读存储介质,包括计算机可读存储介质,如计算机只读存储器(Read-Only Memory,简称ROM)、随机存取存储器(Random Access Memory,简称RAM)、磁碟或者光盘等。The processor is a processor in the virtual machine described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

本申请实施例还提供一种计算机程序产品,包括计算机指令,该计算机指令被处理器执行时实现上述指令处理方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。The embodiment of the present application also provides a computer program product, including computer instructions. When the computer instructions are executed by a processor, the various processes of the above-mentioned instruction processing method embodiment are implemented, and the same technical effect can be achieved. To avoid repetition, they are not repeated here.

进一步需要说明的是,此说明书中所描述的终端包括但不限于智能手机、平板电脑等,且所描述的许多功能部件都被称为模块,以便更加特别地强调其实现方式的独立性。It should be further explained that the terminals described in this specification include but are not limited to smart phones, tablet computers, etc., and many of the functional components described are called modules in order to more particularly emphasize the independence of their implementation methods.

In the embodiments of the present invention, a module may be implemented in software so as to be executed by various types of processors. For example, an identified module of executable code may comprise one or more physical or logical blocks of computer instructions and may, for instance, be organized as an object, procedure or function. Nevertheless, the executable code of an identified module need not be physically located together; it may comprise disparate instructions stored in different locations which, when joined logically together, constitute the module and achieve the stated purpose of the module.

Indeed, a module of executable code may be a single instruction or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Similarly, operational data may be identified within modules, may be embodied in any suitable form and may be organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations (including over different storage devices), and may exist, at least partially, merely as electronic signals on a system or network.

Where a module can be implemented in software, then, considering the level of existing hardware technology, a person skilled in the art could, cost aside, also build a corresponding hardware circuit to achieve the corresponding function; such hardware circuits include conventional very-large-scale integration (VLSI) circuits or gate arrays, as well as existing semiconductors such as logic chips and transistors, or other discrete components. A module may also be implemented in programmable hardware devices such as field-programmable gate arrays, programmable array logic, programmable logic devices and the like.

The above exemplary embodiments are described with reference to the drawings; many different forms and embodiments are possible without departing from the spirit and teaching of the present invention, and the present invention should therefore not be construed as limited to the exemplary embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete and will convey the scope of the invention to those skilled in the art. In the drawings, component sizes and relative sizes may be exaggerated for clarity. The terminology used herein is for the purpose of describing particular exemplary embodiments only and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. Unless otherwise indicated, a stated range of values includes the upper and lower limits of that range and any sub-ranges therebetween.

The foregoing describes preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also fall within the scope of protection of the present invention.

Claims (13)

1. An instruction processing method, comprising:
obtaining a virtual instruction set ISA carried in an application program;
performing task orchestration management according to the virtual ISA to determine a virtual ISA sequence corresponding to a task to be executed; and
converting virtual ISA code in the virtual ISA sequence into target ISA codes respectively corresponding to different types of accelerators.

2. The method according to claim 1, wherein performing task orchestration management according to the virtual ISA to determine the virtual ISA sequence corresponding to the task to be executed comprises:
performing task orchestration management according to the virtual ISA to determine at least one virtual task cluster, wherein each virtual task cluster comprises at least one virtual thread; and
running the same program on the virtual threads in each virtual task cluster to obtain the virtual ISA sequence corresponding to the task to be executed.

3. The method according to claim 2, wherein performing task orchestration management according to the virtual ISA to determine at least one virtual task cluster comprises:
performing task identification according to the virtual ISA to determine the task to be executed; and
performing optimized orchestration on the virtual ISA according to the task to be executed to determine at least one virtual task cluster.

4. The method according to claim 2, further comprising:
generating virtual task cluster related information; and
registering the virtual task cluster related information;
wherein the virtual task cluster related information comprises at least one of the following:
identification information of a virtual task cluster, used to indicate a read position of input data used by the virtual task cluster and/or a write position of output data generated by the virtual task cluster;
dimension information of a virtual task cluster, used to indicate the number of virtual threads contained in the virtual task cluster; and
identification information of a virtual thread, used to indicate a read position of input data used by the virtual thread and/or a write position of output data generated by the virtual thread.

5. The method according to claim 2, wherein performing task orchestration management according to the virtual ISA to determine at least one virtual task cluster comprises:
performing task identification according to the virtual ISA to determine the task to be executed;
performing task decomposition on the virtual ISA according to the task to be executed to obtain at least one task partition, wherein different task partitions correspond to different parts of the task to be executed; and
determining at least one virtual task cluster according to each task partition, wherein different virtual task clusters within a task partition are independent of one another, and the virtual task clusters in each task partition run the same program.

6. The method according to claim 4, further comprising:
generating task partition related information; and
registering the task partition related information;
wherein the task partition related information comprises at least one of the following:
identification information of a task partition, used to indicate a read position of input data used by the task partition and/or a write position of output data generated by the task partition; and
dimension information of a task partition, used to indicate the number of virtual task clusters contained in the task partition.

7. The method according to any one of claims 1 to 5, wherein after converting the virtual ISA code in the virtual ISA sequence into the target ISA codes respectively corresponding to different types of accelerators, the method further comprises:
caching the target ISA code into a data buffer corresponding to each accelerator.

8. An instruction processing device, comprising:
an obtaining module, configured to obtain a virtual instruction set ISA carried in an application program;
a determining module, configured to perform task orchestration management according to the virtual ISA to determine a virtual ISA sequence corresponding to a task to be executed; and
a conversion module, configured to convert virtual ISA code in the virtual ISA sequence into target ISA codes respectively corresponding to different types of accelerators.

9. A virtual machine, comprising a virtual instruction loader, a virtual accelerator execution engine and a virtual instruction model converter, wherein:
the virtual instruction loader is configured to load a virtual instruction set ISA carried in an application program;
the virtual accelerator execution engine performs task orchestration management on the virtual ISA loaded by the virtual instruction loader to determine a virtual ISA sequence corresponding to a task to be executed; and
the virtual instruction model converter converts virtual ISA code in the virtual ISA sequence obtained through the task orchestration management of the virtual accelerator execution engine into target ISA codes respectively corresponding to different types of accelerators.

10. The virtual machine according to claim 9, further comprising a data buffer, wherein the virtual instruction model converter caches the target ISA code into the data buffer, and the virtual instruction model converter shares memory with the accelerator.

11. A virtual machine, comprising a transceiver, a processor, a memory, and a program or instruction stored in the memory and executable on the processor, wherein the processor, when executing the program or instruction, implements the steps of the instruction processing method according to any one of claims 1 to 7.

12. A readable storage medium having a program or instruction stored thereon, wherein the program or instruction, when executed by a processor, implements the steps of the instruction processing method according to any one of claims 1 to 7.

13. A computer program product, comprising computer instructions, wherein the computer instructions, when executed by a processor, implement the steps of the instruction processing method according to any one of claims 1 to 7.
CN202410163635.7A 2024-02-05 2024-02-05 Instruction processing method, device and virtual machine Pending CN118796342A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410163635.7A CN118796342A (en) 2024-02-05 2024-02-05 Instruction processing method, device and virtual machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410163635.7A CN118796342A (en) 2024-02-05 2024-02-05 Instruction processing method, device and virtual machine

Publications (1)

Publication Number Publication Date
CN118796342A true CN118796342A (en) 2024-10-18

Family

ID=93024463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410163635.7A Pending CN118796342A (en) 2024-02-05 2024-02-05 Instruction processing method, device and virtual machine

Country Status (1)

Country Link
CN (1) CN118796342A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119201361A (en) * 2024-11-26 2024-12-27 之江实验室 A method for implementing a virtual GPU based on a vectorized instruction set


Similar Documents

Publication Publication Date Title
US8321849B2 (en) Virtual architecture and instruction set for parallel thread computing
JP6525286B2 (en) Processor core and processor system
Ho et al. Exploiting half precision arithmetic in Nvidia GPUs
US20080109795A1 (en) C/c++ language extensions for general-purpose graphics processing unit
US7941791B2 (en) Programming environment for heterogeneous processor resource integration
Linderman et al. Merge: a programming model for heterogeneous multi-core systems
US7937567B1 (en) Methods for scalably exploiting parallelism in a parallel processing system
CN112292667B (en) Method and apparatus for selecting a processor
JP2008276740A5 (en)
WO2020219335A1 (en) Computation graph mapping in heterogeneous computer system
WO2022253075A1 (en) Compilation method and related apparatus
WO2012019111A2 (en) A method and apparatus for a compiler and related components for stream-based computations for a general-purpose, multiple-core system
CN110865814A (en) Compiler implementation method and system supporting heterogeneous computing core architecture
CN118796342A (en) Instruction processing method, device and virtual machine
US20230367604A1 (en) Method of interleaved processing on a general-purpose computing core
Maitre Understanding nvidia gpgpu hardware
JP2023527227A (en) Processors, processing methods and related devices
US11762641B2 (en) Allocating variables to computer memory
CN119201361B (en) A method for implementing a virtual GPU based on a vectorized instruction set
CN120122957B (en) Code compiling method, electronic device and storage medium
Li et al. swTVM: Towards Optimized Tensor Code Generation for Deep Learning on Sunway Many-Core Processor
CN119201057A (en) Compilation configuration method, system, compiler and many-core processor
Wolfe How we should program GPGPUs
Dinavahi et al. Many-Core Processors
CN116194892A (en) Shared data structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination