CN118747085A

CN118747085A - Register allocation method, processor, chip and electronic device

Info

Publication number: CN118747085A
Application number: CN202410963712.7A
Authority: CN
Inventors: 张司放
Original assignee: Haiguang Yunxin Integrated Circuit Design Shanghai Co ltd
Current assignee: Haiguang Yunxin Integrated Circuit Design Shanghai Co ltd
Priority date: 2024-07-17
Filing date: 2024-07-17
Publication date: 2024-10-08
Anticipated expiration: 2044-07-17
Also published as: CN118747085B

Abstract

The embodiments of the present application provide a register allocation method, a processor, a chip and an electronic device. The register allocation method comprises: obtaining an instruction related to a mask operation; decoding the instruction to obtain a micro-operation; and allocating, according to the operand bit width of the instruction, a mask physical register for the micro-operation from a mask physical register set so as to meet the mask requirement of the instruction. The mask physical register set comprises a plurality of mask physical registers, and the register bit width of each mask physical register is fixed at a first bit width. The first bit width is smaller than a second bit width; the value of the second bit width corresponds to the number of elements of a SIMD instruction whose operand bit width is a third bit width under the minimum data type; and the third bit width is the maximum operand bit width of SIMD instructions. The embodiments of the present application can reduce idle resources in the mask physical registers and reduce wasted power.

Description

Register allocation method, processor, chip and electronic device

Technical Field

The embodiments of the present application relate to the field of computer technology, and specifically to a register allocation method, a processor, a chip, and an electronic device.

Background Art

In the processing of SIMD (single instruction, multiple data) instructions, there is an optional source operand called a mask, which is used to adjust the calculation result of the SIMD instruction. When the mask occupies only a small part of the mask register, a large amount of unused data is read every time the instruction is processed, wasting power. How to reduce this waste of power has therefore become a problem that those skilled in the art urgently need to solve.

Summary of the Invention

In view of this, embodiments of the present application provide a register allocation method, a processor, a chip, and an electronic device to reduce wasted power during the processing of SIMD instructions.

To achieve the above objectives, the embodiments of the present invention provide the following technical solutions:

An embodiment of the present application provides a register allocation method, including:

obtaining an instruction, the instruction being related to a mask operation;

decoding the instruction to obtain a micro-operation;

allocating, according to the operand bit width of the instruction, a mask physical register for the micro-operation from a mask physical register set, so as to meet the mask requirement of the instruction; wherein the mask physical register set includes a plurality of mask physical registers, and the register bit width of each mask physical register is fixed at a first bit width;

wherein the first bit width is smaller than a second bit width, the value of the second bit width corresponds to the number of elements of a SIMD instruction whose operand bit width is a third bit width under the minimum data type, and the third bit width is the maximum operand bit width of SIMD instructions.

Optionally, the instruction is a mask calculation instruction, and the mask physical register allocated to the micro-operation includes:

one mask physical register of the first bit width plus an indicator bit, where the indicator bit indicates whether the high bits of the data are valid, the high bits being those between the first bit width and the second bit width;

or, two mask physical registers of the first bit width.

Optionally, allocating, according to the operand bit width of the instruction, a mask physical register for the micro-operation from the mask physical register set to meet the mask requirement of the instruction includes:

if the operand bit width of the mask calculation instruction is less than or equal to the first bit width, allocating to each operand of the mask calculation instruction one mask physical register of the first bit width and an indicator bit, with the indicator bit set to indicate that the high bits of the data are invalid;

if the operand bit width of the mask calculation instruction is equal to the second bit width, allocating to each operand of the mask calculation instruction either two mask physical registers of the first bit width, or one mask physical register of the first bit width and an indicator bit set to indicate that the high bits of the data are invalid; wherein the second bit width is twice the first bit width.
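The per-operand choice for mask calculation instructions can be sketched in Python as follows, assuming the example values of 32 bits (first bit width) and 64 bits (second bit width) given in this summary. The class and function names and the `prefer_two_registers` switch are hypothetical: the text lists two options for a 64-bit operand without saying how one is chosen.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_BIT_WIDTH = 32   # width of every mask physical register
SECOND_BIT_WIDTH = 64  # widest mask; twice the first bit width


@dataclass
class MaskOperandAllocation:
    num_registers: int               # 32-bit mask physical registers for this operand
    high_bits_valid: Optional[bool]  # indicator bit value, or None if no indicator bit


def allocate_for_mask_calc(operand_bits: int,
                           prefer_two_registers: bool = True) -> MaskOperandAllocation:
    """Per-operand allocation for a mask calculation instruction (sketch)."""
    if operand_bits <= FIRST_BIT_WIDTH:
        # One register plus an indicator bit marking bits 32..63 as invalid.
        return MaskOperandAllocation(num_registers=1, high_bits_valid=False)
    if operand_bits == SECOND_BIT_WIDTH:
        if prefer_two_registers:
            # Two registers hold the low and high 32-bit halves; no indicator bit.
            return MaskOperandAllocation(num_registers=2, high_bits_valid=None)
        # Alternative option from the text: one register plus an indicator bit
        # marking the high half invalid (only correct if bits 32..63 are all 0).
        return MaskOperandAllocation(num_registers=1, high_bits_valid=False)
    raise ValueError(f"unsupported mask operand width: {operand_bits}")
```

Returning a small record rather than a bare count keeps the indicator bit visible, since the one-register form is only meaningful together with it.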

Optionally, the method further includes: allocating two sets of mask arithmetic logic units of the first bit width to the mask calculation instruction, so that a mask calculation instruction of the second bit width completes within the same cycle.
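Splitting a 64-bit mask operation across two 32-bit mask ALUs works directly for bitwise operations, because no bit of the result depends on bits in the other half. The minimal Python sketch below assumes only such bitwise operations; the function names and the operation set are hypothetical, and operations that move bits across the 32-bit boundary (for example shifts) would need extra handling that the text does not describe.

```python
MASK32 = (1 << 32) - 1


def mask_alu_32(op: str, a: int, b: int) -> int:
    """One hypothetical 32-bit mask ALU supporting bitwise operations."""
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "xor":
        return a ^ b
    if op == "andn":  # (NOT a) AND b, kept within 32 bits
        return (~a & MASK32) & b
    raise ValueError(f"unsupported op: {op}")


def mask_op_64(op: str, a: int, b: int) -> int:
    """A 64-bit mask operation issued to two 32-bit ALUs in the same cycle."""
    low = mask_alu_32(op, a & MASK32, b & MASK32)  # ALU 0: bits 0..31
    high = mask_alu_32(op, a >> 32, b >> 32)       # ALU 1: bits 32..63
    return (high << 32) | low
```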

Optionally, the instruction is a SIMD instruction whose calculation result is adjusted by the mask; the second bit width is twice the first bit width; and the mask physical register allocated to the micro-operation includes:

one mask physical register of the first bit width;

if the operand bit width of the SIMD instruction is a fourth bit width or a fifth bit width, allocating one mask physical register of the first bit width to the micro-operation, which meets the SIMD instruction's mask requirement of no more than the first bit width;

wherein the fourth bit width and the fifth bit width lie between the second bit width and the third bit width, and the fourth bit width is smaller than the fifth bit width; a SIMD instruction of the fourth or fifth bit width uses a mask of the first bit width to adjust its calculation result.

Optionally, decoding the instruction to obtain a micro-operation includes:

if the operand bit width of the SIMD instruction is the third bit width and the data type is not the minimum data type, decoding the SIMD instruction into two micro-operations, each with an operand bit width of the fifth bit width, the fifth bit width being the second-largest operand bit width of SIMD instructions;

and allocating, according to the operand bit width of the instruction, a mask physical register for the micro-operation from the mask physical register set to meet the mask requirement of the instruction includes:

allocating a single mask physical register of the first bit width, shared by the two micro-operations, which meets the SIMD instruction's mask requirement of no more than the first bit width;

if the destination operand of the SIMD instruction is a mask, the method further comprises:

fusing the operation results of the two micro-operations and writing the fused result into the allocated mask physical register.
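For a 512-bit SIMD instruction whose destination is a mask and whose data type is not the minimum (for example 16-bit elements), each 256-bit micro-operation produces at most 16 mask bits, so the fused result fits one 32-bit register. The Python sketch below assumes that "fusion" means placing the second micro-operation's mask bits directly above the first's; the text does not spell out the fusion operation, and the function name is hypothetical.

```python
def fuse_mask_results(low_uop_bits: int, high_uop_bits: int, elems_per_uop: int) -> int:
    """Concatenate the mask bits produced by two half-width micro-operations.

    low_uop_bits:  mask bits from the micro-op covering operand bits 0..255
    high_uop_bits: mask bits from the micro-op covering operand bits 256..511
    elems_per_uop: elements handled by each 256-bit micro-op (256 // element size)
    """
    field = (1 << elems_per_uop) - 1
    return (low_uop_bits & field) | ((high_uop_bits & field) << elems_per_uop)


# 512-bit compare on 16-bit elements: each 256-bit micro-op yields 16 mask bits.
print(hex(fuse_mask_results(0x00FF, 0xF0F0, 16)))  # 0xf0f000ff
```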

If the operand bit width of the SIMD instruction is the third bit width and the data type is the minimum data type, the SIMD instruction is decoded into two micro-operations, each with an operand bit width of the fifth bit width, the fifth bit width being the second-largest operand bit width of SIMD instructions;

one mask physical register of the first bit width is allocated to each micro-operation, which together meet the SIMD instruction's mask requirement of the second bit width.

Optionally, the two micro-operations include a first micro-operation and a second micro-operation, and allocating one mask physical register of the first bit width to each micro-operation includes:

if the high bits of the mask of the SIMD instruction are invalid, allocating to the first micro-operation one mask physical register of the first bit width and an indicator bit, with the indicator bit set to indicate that the high bits of the data are invalid;

and allocating to the second micro-operation one mask physical register of the first bit width whose register encoding is marked invalid.
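The decode and allocation rules for ordinary SIMD instructions can be summarized in a Python sketch, assuming the example widths given in this summary (128/256/512-bit operands, 32-bit registers, 8-bit minimum data type). `MicroOp`, `decode_and_allocate` and the `mask_high_invalid` input are hypothetical names; register numbers 0 and 1 are placeholders for whatever idle registers the allocator returns, and the text does not describe how the high-bits-invalid condition is detected.

```python
from dataclasses import dataclass
from typing import List, Optional

THIRD_BIT_WIDTH = 512   # maximum SIMD operand width
FIFTH_BIT_WIDTH = 256   # second-largest operand width
FOURTH_BIT_WIDTH = 128
MIN_DATA_TYPE = 8


@dataclass
class MicroOp:
    operand_bits: int
    mask_reg: Optional[int]          # placeholder 32-bit register number; None = invalid encoding
    high_invalid_flag: bool = False  # indicator bit: high half of the mask is invalid


def decode_and_allocate(operand_bits: int, elem_bits: int,
                        mask_high_invalid: bool = False) -> List[MicroOp]:
    """Sketch of the SIMD decode/allocation rules described in the text."""
    if operand_bits in (FOURTH_BIT_WIDTH, FIFTH_BIT_WIDTH):
        # Mask needs at most 32 bits: one register suffices.
        return [MicroOp(operand_bits, mask_reg=0)]
    if operand_bits != THIRD_BIT_WIDTH:
        raise ValueError(f"unsupported operand width: {operand_bits}")
    if elem_bits != MIN_DATA_TYPE:
        # Mask needs at most 32 bits: two 256-bit micro-ops share one register.
        return [MicroOp(FIFTH_BIT_WIDTH, 0), MicroOp(FIFTH_BIT_WIDTH, 0)]
    if mask_high_invalid:
        # Upper 32 mask bits are all 0: first micro-op gets a register plus the
        # indicator bit; the second micro-op's register encoding is invalid.
        return [MicroOp(FIFTH_BIT_WIDTH, 0, high_invalid_flag=True),
                MicroOp(FIFTH_BIT_WIDTH, None)]
    # Full 64-bit mask: each micro-op gets its own 32-bit register.
    return [MicroOp(FIFTH_BIT_WIDTH, 0), MicroOp(FIFTH_BIT_WIDTH, 1)]
```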

Optionally, the first bit width is 32 bits, the second bit width is 64 bits, the third bit width is 512 bits, and the minimum data type is 8 bits.

Optionally, the fifth bit width is 256 bits, and the fourth bit width, which lies between the second bit width and the fifth bit width, is 128 bits.

An embodiment of the present application further provides a processor, including:

an instruction fetch module, configured to obtain instructions related to mask operations;

an instruction decoding module, configured to decode the instruction to obtain a micro-operation;

an allocation unit, configured to allocate, according to the operand bit width of the instruction, a mask physical register for the micro-operation from a mask physical register set; wherein the mask physical register set includes a plurality of mask physical registers, and the register bit width of each mask physical register is fixed at the first bit width.

An embodiment of the present application further provides a storage medium storing a design program for a chip; when executed, the design program implements the method described above.

An embodiment of the present application further provides a computer device, comprising the chip described above.

In the register allocation method provided by the embodiments of the present application, the register bit width of every mask physical register in the mask physical register set is fixed at a first bit width that is smaller than a second bit width. The second bit width corresponds to the number of elements of a SIMD instruction whose operand bit width is the third bit width (the maximum operand bit width of SIMD instructions) under the minimum data type; in other words, the second bit width is the mask width required by a SIMD instruction of maximum operand bit width operating on the minimum data type. Fixing every mask physical register at the smaller first bit width brings the register width closer to actual demand and reduces the unused bits that a register sized to the larger second bit width would carry.
Furthermore, after an instruction related to a mask operation is decoded, the embodiments of the present application allocate, according to the operand bit width of the instruction, mask physical registers that meet the mask requirement of the micro-operations obtained by decoding, using the first bit width as the unit of allocation. That is, with the smaller fixed first bit width as the basis of allocation, mask physical registers are allocated dynamically according to each instruction's mask requirement, which optimizes the utilization and allocation of mask physical register resources.

Therefore, by fixing the register bit width of the mask physical registers at the smaller first bit width and dynamically allocating mask physical registers that meet each instruction's mask requirement, the embodiments of the present application effectively reduce unused bits and idle resources in the mask physical registers and optimize their utilization and allocation, thereby reducing wasted power.

BRIEF DESCRIPTION OF THE DRAWINGS

To more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.

FIG. 1 is a first flow chart of the register allocation method provided in an embodiment of the present application;

FIG. 2 is a second flow chart of the register allocation method provided in an embodiment of the present application;

FIG. 3 is a third flow chart of the register allocation method provided in an embodiment of the present application;

FIG. 4 is a fourth flow chart of the register allocation method provided in an embodiment of the present application;

FIG. 5 is a fifth flow chart of the register allocation method provided in an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a processor provided in an embodiment of the present application;

FIG. 7 is another schematic structural diagram of a processor provided in an embodiment of the present application;

FIG. 8 is yet another schematic structural diagram of the processor provided in an embodiment of the present application.

DETAILED DESCRIPTION

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.

As technology advances and demands on computing power grow, computing devices increasingly rely on the computing capability of SIMD processors, which must process more vectors in parallel while precisely controlling the change of every data element within a vector.

SIMD instructions have an optional source operand called a mask. When a mask is present, the calculation result of the SIMD instruction is processed according to the value of each bit of the mask. Specifically, each mask bit is 1 or 0. If a mask bit is 1, the element of the calculation result corresponding to that bit is output normally; if a mask bit is 0, the corresponding element of the calculation result is replaced by the data already held in that element of the destination operand, which is also referred to as retaining the original data.

For example, consider a SIMD instruction with mask 0010_1110_1000_0011, source operand vec0 = 0x0123_4567_89AB_CDEF, source operand vec1 = 0x1358_1050_3011_0210, and destination operand vec2 with original value 0xAAAA_AAAA_AAAA_AAAA. In this illustration each element is 4 bits wide, so each 64-bit operand holds 16 elements, one per mask bit, and the rightmost mask bit controls the least significant element.

If the SIMD instruction is an addition, vec0 and vec1 are added and the sum is written to vec2 after masking. The intermediate sum of vec0 and vec1 is 0x147B_55B7_B9BC_CFFF. After masking, the final result stored in vec2 is 0xAA7A_55BA_BAAA_AAFF: elements whose mask bit is 1 take the sum, and the remaining elements keep their original value 0xA.
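The worked example can be checked with a minimal Python model of merge masking. It assumes 4-bit elements (16 elements per 64-bit operand, one per mask bit) and that mask bit i governs element i counted from the least significant end; `merge_masked_add` is a hypothetical helper, not part of the application.

```python
def merge_masked_add(vec0: int, vec1: int, dest: int, mask: int,
                     elem_bits: int = 4, num_elems: int = 16) -> int:
    """Element-wise add of vec0 and vec1 with merge masking.

    Elements whose mask bit is 1 take the sum; elements whose mask bit is 0
    keep the original contents of dest. Mask bit i controls element i,
    where element 0 is the least significant.
    """
    elem_mask = (1 << elem_bits) - 1
    result = 0
    for i in range(num_elems):
        shift = i * elem_bits
        a = (vec0 >> shift) & elem_mask
        b = (vec1 >> shift) & elem_mask
        old = (dest >> shift) & elem_mask
        # Wrap-around addition inside the element (no carry into the next one).
        elem = (a + b) & elem_mask if (mask >> i) & 1 else old
        result |= elem << shift
    return result


mask = 0b0010_1110_1000_0011
vec0 = 0x0123_4567_89AB_CDEF
vec1 = 0x1358_1050_3011_0210
vec2 = 0xAAAA_AAAA_AAAA_AAAA

print(hex(merge_masked_add(vec0, vec1, vec2, mask)))  # 0xaa7a55babaaaaaff
```

Passing an all-ones mask (0xFFFF) returns the unmasked intermediate sum 0x147B_55B7_B9BC_CFFF, and an all-zero mask returns vec2 unchanged.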

In addition, a mask can also be the calculation result of an instruction, i.e., its destination operand; for example, a mask can be computed by a mask instruction. A SIMD processor may include a mask arithmetic logic unit that performs mask calculations, so that operations can be carried out between multiple masks. When no mask is present, the instruction executes normally.

Mask-capable SIMD instructions usually have a vector bit width of 128, 256 or 512 bits, where the vector bit width is the total number of bits of data that a single instruction processes at once. Take a SIMD instruction with a 512-bit vector width as an example: the maximum operand bit width it can process is 512 bits, and its operands can have one of four data types: 8, 16, 32 or 64 bits. When an operand has the maximum width of 512 bits and the minimum data type of 8 bits, it contains 64 elements; since one mask bit controls one element, the mask for this operand is 64 bits, and the mask physical register that stores it must therefore be 64 bits wide. Hence a SIMD instruction with a 512-bit vector width requires a mask physical register of up to 64 bits.
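The rule "mask width = operand width / element width" can be tabulated for every supported combination. The short Python sketch below (names hypothetical) shows that only the 512-bit, 8-bit-element case needs more than 32 mask bits:

```python
VECTOR_WIDTHS = (128, 256, 512)   # operand (vector) widths in bits
ELEMENT_WIDTHS = (8, 16, 32, 64)  # supported data types in bits


def mask_bits(vector_bits: int, elem_bits: int) -> int:
    """One mask bit per element, so the mask width equals the element count."""
    return vector_bits // elem_bits


for v in VECTOR_WIDTHS:
    print(v, {e: mask_bits(v, e) for e in ELEMENT_WIDTHS})
# 128 {8: 16, 16: 8, 32: 4, 64: 2}
# 256 {8: 32, 16: 16, 32: 8, 64: 4}
# 512 {8: 64, 16: 32, 32: 16, 64: 8}
```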

Furthermore, because the bit width of the mask physical register is determined by the processor architecture, it is usually fixed. Consequently, for any SIMD instruction, the mask physical register allocated to it is as wide as the largest mask ever required, namely 64 bits. This ensures that the mask physical register can accommodate all mask-capable SIMD instructions.

A mask can also be the calculation result of a SIMD instruction. In that case, an idle mask physical register is additionally allocated to the SIMD instruction as the destination register that stores the mask. The bit width of the mask arithmetic logic unit matches that of the mask register, i.e., 64 bits.

However, the full 64 bits of the mask physical register are used only when the SIMD instruction has a 512-bit vector width and 8-bit operand elements. In every other case the mask is at most 32 bits, so only the lower 32 bits of the mask physical register are used. Yet every read of the mask physical register reads all 64 bits, including the idle ones, and every write refreshes all 64 bits, even when the upper 32 bits are idle. Since a 64-bit mask arises only in this one case, all other cases over-read and over-write the mask physical register, wasting power.
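One rough way to see the overhead is to count the register bits touched per access. The sketch below treats bits touched as a crude proxy for read/write energy and assumes that, with 32-bit registers, only the registers actually holding the mask are accessed; both the proxy and the function names are illustrative assumptions, not figures from the application.

```python
import math

REGISTER_BITS_BASELINE = 64  # conventional mask physical register
FIRST_BIT_WIDTH = 32         # reduced register width used in this application


def bits_touched_baseline(mask_bits: int) -> int:
    """Every read or write of a 64-bit register touches all 64 bits."""
    return REGISTER_BITS_BASELINE


def bits_touched_split(mask_bits: int) -> int:
    """With 32-bit registers, only the registers holding the mask are touched."""
    return FIRST_BIT_WIDTH * max(1, math.ceil(mask_bits / FIRST_BIT_WIDTH))


for need in (2, 16, 32, 64):
    print(need, bits_touched_baseline(need), bits_touched_split(need))
# 2 64 32
# 16 64 32
# 32 64 32
# 64 64 64
```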

Based on this, the embodiments of the present application reduce the bit width of each mask physical register allocated to a SIMD instruction, so as to reduce idle resources in the mask physical registers, avoid excessive reads and writes of those registers, and reduce wasted power.

Based on the above idea, and to solve the above problem, an embodiment of the present application provides a register allocation method that reduces wasted power. As an optional implementation, FIG. 1 shows a flow chart of the register allocation method provided by an embodiment of the present application. As shown in FIG. 1, the method includes the following steps.

Step S10: obtain an instruction related to a mask operation.

Specifically, the instruction may be a mask calculation instruction that uses a mask as a source or destination operand, or a SIMD instruction whose calculation result must be processed with a mask.

Step S20: decode the instruction to obtain at least one micro-operation.

Step S30: allocate, according to the operand bit width of the instruction, a mask physical register for the micro-operation from a mask physical register set, so as to meet the mask requirement of the instruction.

The mask physical register set includes a plurality of mask physical registers, and the register bit width of each mask physical register is fixed at the first bit width. The first bit width is smaller than the second bit width; the value of the second bit width corresponds to the number of elements of a SIMD instruction whose operand bit width is the third bit width under the minimum data type; and the third bit width is the maximum operand bit width of SIMD instructions.

It should be noted that, as explained above, the maximum operand bit width equals the maximum vector bit width of the instruction, namely 512 bits. Therefore, in an optional implementation, the third bit width is 512 bits.

Furthermore, one mask bit processes one element of the instruction's calculation result. The maximum mask width of 64 bits therefore equals the number of elements in an operand whose bit width is the maximum operand bit width and whose data type is the minimum data type; this is the second bit width. As noted above, in an optional implementation the minimum data type of SIMD instructions is 8 bits, so an instruction of the third bit width contains 512 / 8 = 64 elements under the minimum data type, and the second bit width is 64 bits.

Furthermore, because the first bit width is smaller than the second bit width, it is less than 64 bits. As noted above, SIMD instructions usually have vector widths, and hence operand widths, of 128, 256 or 512 bits, and their operand data types are 8, 16, 32 or 64 bits. The candidate values of the first bit width are therefore 8, 16 and 32 bits.
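The derivation of the second bit width and of the candidate first bit widths can be written out as a few lines of Python, using the example values from the text (512-bit maximum operand, data types of 8 to 64 bits). The constant names are hypothetical; the candidate rule simply follows the reasoning in the paragraph above.

```python
MAX_OPERAND_BITS = 512             # third bit width
DATA_TYPE_BITS = (8, 16, 32, 64)   # supported element sizes
MIN_DATA_TYPE = min(DATA_TYPE_BITS)

# Second bit width: one mask bit per element of the widest operand
# holding the smallest data type.
SECOND_BIT_WIDTH = MAX_OPERAND_BITS // MIN_DATA_TYPE

# Candidate first bit widths: data-type sizes strictly below the second width.
FIRST_WIDTH_CANDIDATES = [w for w in DATA_TYPE_BITS if w < SECOND_BIT_WIDTH]

print(SECOND_BIT_WIDTH, FIRST_WIDTH_CANDIDATES)  # 64 [8, 16, 32]
```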

Furthermore, as explained above, a 64-bit mask physical register tends to waste resources. Conversely, if the mask physical register is too small, a single instruction needs too many mask physical registers, and the larger number of registers per instruction significantly widens the entries of the instruction scheduling queue, causing hardware resource usage to rise sharply. Therefore, optionally, the first bit width is 32 bits.
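The trade-off can be made concrete by counting, for each candidate register width, how many registers a full 64-bit mask needs (each presumably a register tag that the scheduling queue entry must carry) and how many bits go unused when a 32-bit mask is stored. This is an illustrative count, not a hardware cost model from the application, and the function names are hypothetical.

```python
import math

SECOND_BIT_WIDTH = 64


def registers_for_widest_mask(first_bit_width: int) -> int:
    """Registers one operand needs to hold a full 64-bit mask."""
    return math.ceil(SECOND_BIT_WIDTH / first_bit_width)


def unused_bits(first_bit_width: int, mask_bits: int) -> int:
    """Allocated-but-unused bits when storing a mask of mask_bits."""
    regs = max(1, math.ceil(mask_bits / first_bit_width))
    return regs * first_bit_width - mask_bits


for width in (8, 16, 32, 64):
    print(width, registers_for_widest_mask(width), unused_bits(width, 32))
# 8 8 0
# 16 4 0
# 32 2 0
# 64 1 32
```

Under this count, 32 bits is the smallest width that eliminates the unused half for 32-bit masks while keeping the worst case at two registers per operand.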

It should be noted that because the bit width of the mask physical register is determined by the processor architecture, all mask physical registers in a given processor have the same bit width, namely the first bit width of 32 bits.

In particular, when mask physical registers are allocated to the micro-operation from the mask physical register set, only idle mask physical registers are allocated, which reduces data conflicts and delays on the mask physical registers.
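Allocating only idle registers is commonly done with a free list. The Python sketch below assumes such a free list; the text does not say how idle registers are tracked, and `MaskRegisterFile` with its methods is hypothetical.

```python
from collections import deque
from typing import List


class MaskRegisterFile:
    """Hypothetical pool of 32-bit mask physical registers tracked by a free list."""

    FIRST_BIT_WIDTH = 32

    def __init__(self, num_regs: int) -> None:
        self.free = deque(range(num_regs))  # numbers of idle registers

    def allocate(self, count: int) -> List[int]:
        """Hand out `count` idle registers, or fail if not enough are idle."""
        if count > len(self.free):
            raise RuntimeError("not enough idle mask registers")
        return [self.free.popleft() for _ in range(count)]

    def release(self, regs: List[int]) -> None:
        """Return registers to the idle pool once no micro-op needs them."""
        self.free.extend(regs)


rf = MaskRegisterFile(8)
two = rf.allocate(2)   # e.g. a 64-bit mask operand
one = rf.allocate(1)   # e.g. a mask of at most 32 bits
rf.release(two)
print(len(rf.free))    # 7
```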

It can be seen that by fixing every mask physical register at a first bit width smaller than the mask width required by a SIMD instruction of maximum operand bit width on the minimum data type, and by allocating registers in units of the first bit width according to each instruction's operand bit width, the method brings the register width closer to actual demand, reduces unused register bits, and dynamically allocates just enough mask physical registers to meet the mask requirement of the micro-operations of each instruction.

A mask is a special operand: besides processing the calculation result of a SIMD instruction, it can also be the source or destination operand of an instruction. For example, the mask required by a SIMD instruction may not be pre-stored data but a value that must be computed from specific data. Instructions that take a mask as a source or destination operand are called mask calculation instructions.

For SIMD instructions other than mask calculation instructions, the operand bit width can be 128, 256, or 512 bits, whereas a mask is at most 64 bits. In addition, mask calculation instructions and ordinary SIMD instructions are processed by different modules in the processor, so the two types of instructions are handled differently.

For mask calculation instructions, in an optional implementation, the mask physical registers allocated to the micro-operation in step S30 include either of the following:

one mask physical register of the first bit width plus an indicator bit, where the indicator bit indicates whether the high-order bits of the data are valid, the high-order bits being those between the first bit width and the second bit width; or

two mask physical registers of the first bit width.

Here, the "high-order bits" of the data are the most significant bits of a binary number, located at its left end. Specifically, for a 64-bit mask, the high-order bits are the upper 32 bits. Indicating whether the high-order bits are "valid" means indicating whether they are all 0.
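A minimal sketch of the indicator-bit convention, assuming the 32/64-bit widths of this embodiment and the encoding described later in the text (indicator 1 means the upper 32 bits are all 0). The function names are hypothetical.

```python
LOW32 = 0xFFFF_FFFF


def split_mask(mask64: int) -> tuple:
    """Return (low 32 bits, high 32 bits) of a 64-bit mask."""
    return mask64 & LOW32, (mask64 >> 32) & LOW32


def indicator_bit(mask64: int) -> int:
    """1 when the high-order 32 bits are all 0 (high part 'invalid'), else 0."""
    _, high = split_mask(mask64)
    return 1 if high == 0 else 0


def rebuild(low: int, indicator: int, high: int = 0) -> int:
    """Reconstruct the 64-bit mask; the high part is implicitly 0 when indicator == 1."""
    return low if indicator == 1 else (high << 32) | low
```

With this convention, a 64-bit mask whose upper half is zero can be kept as one 32-bit value plus a single flag bit, and `rebuild` recovers the original value in both cases.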

The mask bit width can be up to 64 bits, whereas in the embodiments of the present application the first bit width, i.e., the size of one mask physical register, is only 32 bits. A mask wider than 32 bits therefore cannot be stored in one such register, so two storage schemes are needed: one for masks of at most 32 bits and one for masks wider than 32 bits.

Specifically, as shown in FIG. 2, in an optional implementation, if the operand bit width of the mask calculation instruction is less than or equal to the first bit width, step S30 includes step S31: allocating, for each operand of the mask calculation instruction, one mask physical register of the first bit width and an indicator bit, with the indicator bit set to mark the high-order bits of the data as invalid.

Because the mask is at most 64 bits and the first bit width is 32 bits, a mask no wider than 32 bits can be stored completely in one mask physical register of the first bit width together with an indicator bit.

In particular, in an optional implementation, when the mask is at most 32 bits, one mask physical register of the first bit width holds the entire mask and there are no high-order bits to track, so the indicator bit can be omitted and only one mask physical register of the first bit width is allocated.

Further, as shown in FIG. 2, in an optional implementation, if the operand bit width of the mask calculation instruction equals the second bit width, step S30 includes step S32: allocating, for each operand of the mask calculation instruction, either two mask physical registers of the first bit width, or one mask physical register of the first bit width and an indicator bit with the indicator bit set to mark the high-order bits of the data as invalid.

Here, the second bit width is twice the first bit width.

The second bit width is the maximum possible mask size, i.e., 64 bits. A mask wider than 32 bits but no wider than 64 bits cannot be stored in one mask physical register of the first bit width, so two such registers are required to hold the complete mask.
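Steps S31 and S32 can be summarized as a per-operand allocation rule. The sketch below is a minimal model assuming 32-bit registers; the `high_known_zero` flag (standing for the case where the upper half is known to be 0) and all names are hypothetical, and widths strictly between 32 and 64 bits are treated like the 64-bit case, as the paragraph above suggests.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_WIDTH = 32                 # assumed mask physical register width
SECOND_WIDTH = 2 * FIRST_WIDTH   # maximum mask width (64 bits)


@dataclass(frozen=True)
class MaskAlloc:
    registers: int              # number of FIRST_WIDTH-bit mask physical registers
    indicator: Optional[int]    # indicator bit value, or None if no indicator is allocated


def allocate_mask_calc_operand(operand_bits: int, high_known_zero: bool = False) -> MaskAlloc:
    """Per-operand allocation for a mask calculation instruction (steps S31/S32)."""
    if operand_bits <= FIRST_WIDTH:
        # S31: one register plus an indicator bit marking the high part invalid.
        return MaskAlloc(registers=1, indicator=1)
    if operand_bits <= SECOND_WIDTH:
        if high_known_zero:
            # S32, second option: the high 32 bits are known to be 0.
            return MaskAlloc(registers=1, indicator=1)
        # S32, first option: two registers hold the low and high halves.
        return MaskAlloc(registers=2, indicator=None)
    raise ValueError(f"mask width {operand_bits} exceeds {SECOND_WIDTH} bits")
```

The optional simplification in the text, dropping the indicator bit entirely for masks of at most 32 bits, would correspond to returning `indicator=None` in the first branch.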

In addition, when a mask whose value is 0 is applied to an instruction's computation result, that result is not changed. To save physical register resources in this case, the mask can be stored using one mask physical register of the first bit width combined with an indicator bit. The indicator bit takes the value 1 or 0: 1 indicates that the high-order bits of the data are all 0, and 0 indicates that they are not all 0. The indicator bit thus records whether the high-order part of the mask is zero, so a mask wider than 32 bits can still be stored in a single mask physical register of the first bit width, and reading it requires fetching less data from the mask physical registers, which reduces resource consumption.

In particular, in some special cases the mask value may be all 0. For example, when a mask calculation instruction XORs two source operands that hold the same value, the XOR rule guarantees, before the instruction completes, that every bit of the result is 0. In this case, instead of allocating two mask physical registers of the first bit width for the all-zero result, one mask physical register of the first bit width and an indicator bit can be allocated to avoid wasting resources.
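One way to recognize this case before execution is a zero-idiom check at allocation time: an XOR whose two sources name the same register always yields 0. The sketch below assumes this check is done on architectural register numbers; the `MaskUop` record and the `"kxor"` mnemonic are hypothetical labels, not a real decoder interface.

```python
from typing import NamedTuple

FIRST_WIDTH = 32


class MaskUop(NamedTuple):
    opcode: str  # hypothetical mnemonic, e.g. "kxor"
    src1: int    # architectural mask register numbers
    src2: int
    dst: int
    width: int   # operand bit width


def result_known_zero(uop: MaskUop) -> bool:
    """x XOR x == 0 for every x, so this result is known before execution."""
    return uop.opcode == "kxor" and uop.src1 == uop.src2


def dest_allocation(uop: MaskUop) -> tuple:
    """Return (registers, indicator) for the destination mask."""
    if uop.width <= FIRST_WIDTH or result_known_zero(uop):
        return 1, 1   # one register + indicator bit: high part is all 0
    return 2, None    # two registers for a full 64-bit result
```

Only the XOR case from the text is modeled; other operations with statically known results would need their own rules.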

Furthermore, the bit width of the mask arithmetic logic unit equals that of the mask physical register, i.e., only the first bit width, for example 32 bits. When a mask calculation instruction has source operands of the second bit width, for example 64 bits, a single mask arithmetic logic unit of the first bit width needs 2 cycles to complete it.

To improve instruction throughput, in an optional implementation, when a mask calculation instruction has source operands of the second bit width, two mask arithmetic logic units of the first bit width can be assigned to it so that it completes within a single cycle. Each mask arithmetic logic unit processes 32 bits of the source operands, so together the two process all 64 bits, completing the second-bit-width mask calculation instruction in one cycle.
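For bitwise mask operations the split is exact: each output bit depends only on the same bit position of the inputs, so two 32-bit units working on the halves give the same result as one 64-bit unit. The sketch below checks this for AND, OR, and XOR; it assumes the mask ALU handles bitwise operations only, since a carry-propagating operation such as a mask addition would need extra handling that the text does not describe.

```python
import operator
import random

LOW32 = 0xFFFF_FFFF


def alu32(op, a: int, b: int) -> int:
    """One 32-bit mask ALU (assumed to support bitwise operations only)."""
    return op(a & LOW32, b & LOW32) & LOW32


def mask_op64_two_alus(op, a: int, b: int) -> int:
    """Model two 32-bit mask ALUs working on the two halves in the same cycle."""
    low = alu32(op, a, b)               # ALU 0: bits 31..0
    high = alu32(op, a >> 32, b >> 32)  # ALU 1: bits 63..32
    return (high << 32) | low


# Self-check against a full 64-bit operation on random inputs.
rng = random.Random(0)
for _ in range(1000):
    a, b = rng.getrandbits(64), rng.getrandbits(64)
    for op in (operator.and_, operator.or_, operator.xor):
        assert mask_op64_two_alus(op, a, b) == op(a, b)
```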

Further, for SIMD instructions other than the mask calculation instructions described above, the mask is used to adjust the computation result of the SIMD instruction. In this case, the first bit width may be half of the second bit width, and the mask physical registers allocated to a micro-operation of the SIMD instruction may be either one mask physical register of the first bit width or two mask physical registers of the first bit width.

Since a mask can be up to 64 bits wide and the first bit width is only 32 bits, a SIMD instruction mask wider than 32 bits cannot be stored in one mask physical register of the first bit width. Two schemes are therefore needed: one for masks of at most 32 bits and one for masks wider than 32 bits.

Further, as shown in FIG. 3, in an optional implementation, if the operand bit width of the SIMD instruction is the fourth bit width or the fifth bit width, step S30 includes step S33: allocating one mask physical register of the first bit width to the micro-operation, which meets the SIMD instruction's mask requirement of no more than the first bit width. The fourth and fifth bit widths lie between the second and third bit widths, with the fourth smaller than the fifth; SIMD instructions of the fourth or fifth bit width adjust their computation results with a mask no wider than the first bit width.

Here, the fourth bit width is the 128-bit SIMD operand width mentioned above. The minimum data type for SIMD instructions is 8 bits, so a 128-bit operand contains at most 128 / 8 = 16 elements, and the mask of a 128-bit SIMD instruction is therefore at most 16 bits wide.

The fifth bit width is the 256-bit SIMD operand width mentioned above. With the minimum 8-bit data type, a 256-bit operand contains at most 256 / 8 = 32 elements, so the mask of a 256-bit SIMD instruction is at most 32 bits wide.

Thus, when the operand bit width of a SIMD instruction is 128 bits (the fourth bit width) or 256 bits (the fifth bit width), its mask is no wider than the first bit width of 32 bits. In either case, allocating a single mask physical register of the first bit width to the micro-operation is sufficient to store the complete mask.
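The arithmetic of the last three paragraphs (one mask bit per element, mask width = operand width / element width) can be tabulated directly. This sketch assumes 32-bit mask registers and element widths of 8, 16, 32, and 64 bits; the names are illustrative.

```python
FIRST_WIDTH = 32                  # mask physical register width (assumed 32 bits)
OPERAND_WIDTHS = (128, 256, 512)  # fourth, fifth and third bit widths
ELEMENT_WIDTHS = (8, 16, 32, 64)  # 8 bits is the minimum data type


def mask_width(operand_bits: int, element_bits: int) -> int:
    """One mask bit per element."""
    return operand_bits // element_bits


def registers_needed(operand_bits: int, element_bits: int) -> int:
    """FIRST_WIDTH-bit mask registers needed for the whole mask (ceiling division)."""
    return -(-mask_width(operand_bits, element_bits) // FIRST_WIDTH)


for operand in OPERAND_WIDTHS:
    row = [f"{mask_width(operand, e):2d}b/{registers_needed(operand, e)}r" for e in ELEMENT_WIDTHS]
    print(f"{operand:3d}-bit:", "  ".join(row))
```

Every 128-bit and 256-bit case fits in one register; only the 512-bit operation on 8-bit elements needs a 64-bit mask, which is the case handled by two registers later in the text.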

Further, as shown in FIG. 4, in an optional implementation, if the operand bit width of the SIMD instruction is the third bit width and the data type is not the minimum data type, step S20 includes step S21: decoding the SIMD instruction into two micro-operations, each with an operand bit width equal to the fifth bit width, the second-largest operand bit width of SIMD instructions.

Here, the third bit width is 512 bits and the fifth bit width is 256 bits. Because one mask physical register is 32 bits wide (the first bit width) in the embodiments of the present application, the widest SIMD operand whose mask a single register can cover is one with 32 elements of the minimum 8-bit data type, i.e., 32 × 8 = 256 bits, the fifth bit width. A vector arithmetic logic unit of the fifth bit width, 256 bits, is therefore sufficient to process most operations between operands and masks.

However, a small number of SIMD instructions have 512-bit operands, which a single 256-bit vector arithmetic logic unit cannot process. Implementing an additional 512-bit vector arithmetic logic unit would make the system architecture overly complex and waste considerable resources. Instead, a 512-bit instruction is split during decoding into two 256-bit micro-operations, so that two vector arithmetic logic units of the fifth bit width together process one 512-bit operand. This supports 512-bit operands without changing the system architecture.
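A minimal sketch of the decode split in step S21, assuming a 256-bit vector datapath. The `MicroOp` record, its `part` labels, and the `"vpadd"` mnemonic used in the tests are hypothetical; the point is only that each half carries a 256-bit slice and needs at most a 32-bit mask.

```python
from dataclasses import dataclass
from typing import List

VECTOR_ALU_WIDTH = 256  # fifth bit width: assumed vector datapath width


@dataclass(frozen=True)
class MicroOp:
    opcode: str        # hypothetical mnemonic
    part: str          # "full", "lo" (bits 255..0) or "hi" (bits 511..256)
    width: int         # operand bits handled by this micro-op
    element_bits: int

    @property
    def mask_bits(self) -> int:
        """Mask bits this micro-op consumes or produces (one per element)."""
        return self.width // self.element_bits


def decode(opcode: str, operand_bits: int, element_bits: int) -> List[MicroOp]:
    if operand_bits <= VECTOR_ALU_WIDTH:
        return [MicroOp(opcode, "full", operand_bits, element_bits)]
    if operand_bits == 2 * VECTOR_ALU_WIDTH:
        # Step S21: a 512-bit instruction becomes two 256-bit micro-ops.
        return [MicroOp(opcode, part, VECTOR_ALU_WIDTH, element_bits) for part in ("lo", "hi")]
    raise ValueError(f"unsupported operand width {operand_bits}")
```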

Furthermore, when the operand bit width of the SIMD instruction is the third bit width and the data type is not the minimum data type, after the two micro-operations of the fifth bit width are obtained, in an optional implementation step S30 includes step S34: allocating a single mask physical register of the first bit width to the two micro-operations together, which meets the SIMD instruction's mask requirement of no more than the first bit width.

It should be noted that when a SIMD instruction has 512-bit operands whose data type is not 8 bits, the smallest possible data type is 16 bits, so the operand holds at most 512 / 16 = 32 elements and the mask is at most 32 bits. Therefore, when the SIMD instruction's vector bit width is the third bit width and the data type is not the minimum 8-bit type, a single mask physical register of the first bit width is enough to store the complete mask.

In an optional implementation, after step S34, the method further includes step S341: if the destination operand of the SIMD instruction is a mask, fusing the results of the two micro-operations and writing the fused result into the allocated mask physical register.

Note that when the data type is not the minimum 8-bit type, each 256-bit micro-operation (fifth bit width) produces a mask of up to 16 bits. If the two partial masks were written into the single allocated mask physical register separately, one could overwrite the other. To avoid this storage error, the two 16-bit masks are first fused into one complete 32-bit mask, which is then written.
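The fusion in step S341 amounts to placing the high micro-operation's partial mask above the low one's. The sketch below assumes a 512-bit equality compare on 16-bit elements as the mask-producing operation; `compare_mask`, `fuse`, and the sample data are illustrative, not the processor's actual datapath.

```python
from typing import Sequence


def compare_mask(a: Sequence[int], b: Sequence[int]) -> int:
    """Bit i is set when a[i] == b[i]: one mask bit per element."""
    mask = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            mask |= 1 << i
    return mask


def fuse(lo: int, hi: int, half_bits: int) -> int:
    """Place the high micro-op's partial mask above the low micro-op's mask."""
    return (hi << half_bits) | lo


# 512-bit operands of 16-bit elements: 32 elements, 16 per 256-bit micro-op.
a = list(range(32))
b = [x if x % 3 == 0 else 0xFFFF for x in a]
lo_mask = compare_mask(a[:16], b[:16])  # micro-op 0
hi_mask = compare_mask(a[16:], b[16:])  # micro-op 1
fused = fuse(lo_mask, hi_mask, 16)      # written once into one 32-bit register
```

Writing the fused value once avoids the overwrite problem: the single register receives all 32 bits in one write instead of two 16-bit writes to the same location.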

Further, as shown in FIG. 4, if the operand bit width of the SIMD instruction is the third bit width and the data type is the minimum data type, then, as above, in an optional implementation step S20 includes step S21: decoding the SIMD instruction into two micro-operations, each with an operand bit width equal to the fifth bit width, the second-largest operand bit width of SIMD instructions.

As shown in FIG. 4, when the operand bit width of the SIMD instruction is the third bit width and the data type is the minimum data type, after the two micro-operations of the fifth bit width are obtained, in an optional implementation step S30 includes step S35: allocating one mask physical register of the first bit width to each micro-operation, which together meet the SIMD instruction's mask requirement of the second bit width, the second bit width being twice the first bit width.
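Steps S33, S34/S341, and S35 can be collected into one decision function over the operand width and element width. This is a minimal sketch assuming 32-bit mask registers and an 8-bit minimum data type; the `SimdMaskPlan` record and the `dst_is_mask` flag are hypothetical names.

```python
from typing import NamedTuple

FIRST_WIDTH = 32
MIN_ELEMENT_BITS = 8


class SimdMaskPlan(NamedTuple):
    micro_ops: int       # micro-ops produced by decoding (S20/S21)
    mask_registers: int  # FIRST_WIDTH-bit mask physical registers (S30)
    fuse_results: bool   # whether step S341 fuses the two partial masks


def plan_simd_mask(operand_bits: int, element_bits: int, dst_is_mask: bool = False) -> SimdMaskPlan:
    if operand_bits in (128, 256):
        return SimdMaskPlan(1, 1, False)        # S33: one register suffices
    if operand_bits == 512 and element_bits > MIN_ELEMENT_BITS:
        return SimdMaskPlan(2, 1, dst_is_mask)  # S21 + S34 (+ S341 if the result is a mask)
    if operand_bits == 512 and element_bits == MIN_ELEMENT_BITS:
        return SimdMaskPlan(2, 2, False)        # S21 + S35: one register per micro-op
    raise ValueError(f"unsupported operand/element widths: {operand_bits}/{element_bits}")
```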

Note that when a SIMD instruction has 512-bit operands of the minimum 8-bit data type, its mask is 64 bits, which a single 32-bit mask physical register cannot hold. Each micro-operation, however, is 256 bits wide and needs a 32-bit mask. One mask physical register can therefore be allocated to each micro-operation, so that the two 32-bit registers each hold half of the 64-bit mask and together store it completely. Using two mask physical registers instead of one also avoids the latency of first fusing the two 32-bit masks into a 64-bit value and then writing it to a register.
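In contrast to the fused case, here each micro-operation writes its own 32-bit register and no merge step is needed. The sketch assumes an equality compare as the mask-producing operation; the register names `P0`/`P1` and the sample data are hypothetical.

```python
from typing import Dict, Sequence


def compare_mask(a: Sequence[int], b: Sequence[int]) -> int:
    """Bit i is set when a[i] == b[i]: one mask bit per element."""
    mask = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            mask |= 1 << i
    return mask


# 512-bit operands of 8-bit elements: 64 elements, 32 per 256-bit micro-op.
a = [i % 7 for i in range(64)]
b = [i % 5 for i in range(64)]
mask_regs: Dict[str, int] = {}
mask_regs["P0"] = compare_mask(a[:32], b[:32])  # micro-op 0 writes its own register
mask_regs["P1"] = compare_mask(a[32:], b[32:])  # micro-op 1 writes its own register
```

Logically the 64-bit mask is `(P1 << 32) | P0`, but that concatenation never has to be materialized in a register; consumers can read the half they need.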

In particular, as shown in FIG. 5, when the operand bit width of the SIMD instruction is the third bit width, the data type is the minimum data type, and the high-order bits of the instruction's mask are invalid (i.e., the upper 32 bits of the 64-bit mask are all 0), in an optional implementation step S35 includes step S351: allocating to the first micro-operation one mask physical register of the first bit width and an indicator bit, with the indicator bit set to mark the high-order bits of the data as invalid; and allocating to the second micro-operation a mask physical register of the first bit width whose encoding is invalid.

In this case, the first micro-operation, which corresponds to the lower 32 bits of the mask, is allocated a 32-bit mask physical register and an indicator bit as usual, with the indicator bit set to mark the high-order bits as invalid, and the lower 32 bits of the mask are stored in that register. The second micro-operation, which corresponds to the upper 32 bits, is allocated a mask physical register with an invalid encoding, so the upper 32 bits of the mask are not stored in the mask physical registers at all; reading them directly returns 32 bits of zeros. Thus, for a 64-bit mask whose upper 32 bits are all 0, fewer mask physical registers are used, resource consumption is reduced, and the overhead of reading the mask is lowered.
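A minimal model of step S351, assuming, as the text's precondition does, that it is known at allocation time whether the upper half of the mask is zero. The register-file class, the use of `None` as the "invalid encoding" tag, and the function names are all hypothetical.

```python
from typing import List, Optional, Tuple

LOW32 = 0xFFFF_FFFF
INVALID: Optional[int] = None  # hypothetical "invalid encoding" tag


class MaskRegisterFile:
    def __init__(self, size: int) -> None:
        self.regs: List[int] = [0] * size
        self.free: List[int] = list(range(size))

    def alloc(self) -> int:
        return self.free.pop(0)

    def write(self, tag: int, value: int) -> None:
        self.regs[tag] = value & LOW32

    def read(self, tag: Optional[int]) -> int:
        # An invalid tag reads as 32 zero bits without touching the array.
        return 0 if tag is INVALID else self.regs[tag]


def store_64bit_mask(prf: MaskRegisterFile, mask64: int) -> Tuple[int, Optional[int], int]:
    """Return (low tag, high tag, indicator bit) for a 64-bit mask."""
    low, high = mask64 & LOW32, (mask64 >> 32) & LOW32
    low_tag = prf.alloc()
    prf.write(low_tag, low)
    if high == 0:
        return low_tag, INVALID, 1   # S351: second micro-op gets an invalid encoding
    high_tag = prf.alloc()
    prf.write(high_tag, high)
    return low_tag, high_tag, 0      # S35: one register per micro-op


def load_64bit_mask(prf: MaskRegisterFile, low_tag: int, high_tag: Optional[int]) -> int:
    return (prf.read(high_tag) << 32) | prf.read(low_tag)
```

A mask with a zero upper half consumes one register instead of two, and reading it back still yields the full 64-bit value.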

Furthermore, the register allocation method provided by the embodiments of the present application can also reduce program waiting time and improve execution efficiency. Consider, for example, an instruction A1 whose operand bit width is the third bit width (512 bits). It first uses the 512-bit source operands vec1 and vec2, together with mask mask1, to compute and store mask mask2. It then adds source operands vec2 and vec3, applies mask mask2 to the sum, and stores the result into vec1.

If a 256-bit data path and 64-bit mask physical registers are used, the instruction has to be split into the following four micro-operations:

Micro-operation U1: computes the high part of mask mask2 from the high halves of source operands vec1 and vec2 together with mask mask1, and stores it in a buffer.

Micro-operation U2: computes the low part of mask mask2 from the low halves of source operands vec1 and vec2 together with mask mask1, combines it with the high part of mask2 held in the buffer into the full mask mask2, and stores mask2 into a mask physical register.

Micro-operation U3: adds the high halves of source operands vec2 and vec3, applies mask mask2 to the result, and stores it as the high half of vec1.

Micro-operation U4: adds the low halves of source operands vec2 and vec3, applies mask mask2 to the result, and stores it as the low half of vec1.

In this case, micro-operation U2 can be scheduled only after micro-operation U1 has started executing, and micro-operations U3 and U4 can be scheduled only after U2 has started executing. Assuming all other operands are already available, completing U1 through U4 takes 7 cycles in total.

With the register allocation method provided by the embodiments of the present application, the instruction is instead split into the following four micro-operations:

Micro-operation U5: computes the high part of mask mask2 from the high halves of source operands vec1 and vec2 together with mask mask1, and stores it in a mask physical register.

Micro-operation U6: computes the low part of mask mask2 from the low halves of source operands vec1 and vec2 together with mask mask1, and stores it in a mask physical register.

Micro-operation U7: adds the high halves of source operands vec2 and vec3, applies mask mask2 to the result, and stores it as the high half of vec1.

Micro-operation U8: adds the low halves of source operands vec2 and vec3, applies mask mask2 to the result, and stores it as the low half of vec1.

Under the register allocation method of the embodiments of the present application, the high and low parts of the mask associated with this instruction are stored in two separate mask physical registers. Micro-operations U5 and U6 can therefore start being scheduled at the same time, and micro-operations U7 and U8 can be scheduled as soon as U5 and U6, respectively, have started executing. Assuming all other operands are already available, completing U5 through U8 takes only 6 cycles, one cycle fewer than the 7 cycles above, which improves the execution efficiency of the instruction.
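The exact cycle counts depend on scheduling latencies that the example does not spell out, but the structural reason for the gain can be checked independently of them: the length of the longest chain of dependent micro-operations. The sketch below encodes the producer lists from the two decompositions above; the function name is hypothetical, and it measures depth in micro-operations, not cycles.

```python
from typing import Dict, List


def dependency_depth(deps: Dict[str, List[str]]) -> int:
    """Length of the longest chain of dependent micro-ops (in micro-ops, not cycles)."""
    memo: Dict[str, int] = {}

    def depth(u: str) -> int:
        if u not in memo:
            memo[u] = 1 + max((depth(p) for p in deps[u]), default=0)
        return memo[u]

    return max(depth(u) for u in deps)


# Producer lists taken from the two decompositions in the text.
baseline = {"U1": [], "U2": ["U1"], "U3": ["U2"], "U4": ["U2"]}
proposed = {"U5": [], "U6": [], "U7": ["U5"], "U8": ["U6"]}
```

The baseline chain U1, U2, U3 has three levels, while the split-register version U5, U7 (or U6, U8) has two, so one serialization step disappears from the critical path.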

The register allocation method provided by the embodiments of the present application reduces the register bit width of each mask physical register in the mask physical register set from 64 bits to 32 bits, bringing the register width closer to actual demand and reducing the number of bits left unused when a larger register width is used. Furthermore, after decoding an instruction related to a mask operation, the embodiments of the present application allocate, according to the operand bit width of the instruction, mask physical registers that meet the mask requirement of the decoded micro-operations; that is, with the smaller register width as the allocation unit, mask physical registers are allocated dynamically according to each instruction's mask requirement, optimizing the utilization and allocation of mask physical register resources.

Based on the above method, the embodiments of the present application reduce the bit width of each individual mask physical register allocated to a SIMD instruction, thereby reducing idle resources in the mask physical registers, avoiding excessive reads and writes of those registers, and reducing wasted power. The embodiments of the present application also provide a processor that reduces wasted power. As an optional implementation, FIG. 6 shows a schematic structural diagram of the processor provided by the embodiments of the present application. As shown in FIG. 6, the processor includes:

Instruction acquisition module 100, configured to acquire an instruction, the instruction being related to a mask operation.

Instruction decoding module 200, configured to decode the instruction to obtain micro-operations.

Allocation unit 300, configured to allocate mask physical registers 400 to the micro-operations from the mask physical register set according to the operand bit width of the instruction, the mask physical register set including a plurality of mask physical registers 400.

Mask physical registers 400, configured to store masks, each with a bit width equal to the first bit width.

The first bit width is smaller than the second bit width; the second bit width corresponds to the number of elements, under the minimum data type, of an instruction whose operand bit width is the third bit width; and the third bit width is the largest of the supported operand bit widths.
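To make the role of allocation unit 300 concrete, the sketch below models the pool of 32-bit mask physical registers 400 as a free list from which each micro-operation draws one or two registers. The free-list structure, the stall-on-shortage behavior, and the `release` method are assumptions; the text does not describe how registers are reclaimed.

```python
from collections import deque
from typing import Deque, List, Optional


class MaskAllocationUnit:
    """Hypothetical model of allocation unit 300 over a pool of 32-bit mask registers 400."""

    def __init__(self, num_registers: int) -> None:
        self.free: Deque[int] = deque(range(num_registers))

    def allocate(self, count: int) -> Optional[List[int]]:
        """Take `count` registers, or return None (stall) if too few are free."""
        if count > len(self.free):
            return None
        return [self.free.popleft() for _ in range(count)]

    def release(self, regs: List[int]) -> None:
        """Return registers to the pool once they are no longer needed."""
        self.free.extend(regs)
```

Because allocation is in 32-bit units, a micro-operation with a narrow mask takes one entry and only the 64-bit cases take two, which is what lets a pool of fixed-width registers track actual demand.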

Further, as shown in FIG. 6, in an optional implementation, the instruction is a mask calculation instruction, and the mask physical registers 400 allocated to the micro-operation include:

one mask physical register of the first bit width plus an indicator bit, where the indicator bit indicates whether the high-order bits of the data, i.e., the bits between the first bit width and the second bit width, are valid; or two mask physical registers of the first bit width.

Further, in an optional implementation, the allocation unit 300 is configured to, if the operand bit width of the mask calculation instruction is less than or equal to the first bit width, allocate to each operand of the mask calculation instruction one mask physical register of the first bit width and an indicator bit, with the indicator bit set to mark the high-order bits of the data as invalid.

Further, in an optional implementation, the allocation unit 300 is configured to, if the operand bit width of the mask calculation instruction equals the second bit width, allocate to each operand of the mask calculation instruction either two mask physical registers of the first bit width, or one mask physical register of the first bit width and an indicator bit with the indicator bit set to mark the high-order bits of the data as invalid. The second bit width is twice the first bit width, and the mask calculation instruction uses two mask arithmetic logic units of the first bit width so that a mask calculation instruction of the second bit width completes within a single cycle.

Further, in an optional implementation, the instruction is a SIMD instruction whose mask is used to adjust its computation result; the second bit width is twice the first bit width; and the mask physical registers 400 allocated to the micro-operation include either one mask physical register of the first bit width or two mask physical registers of the first bit width.

Further, in an optional implementation, the allocation unit 300 is configured to allocate one mask physical register of the first bit width to the micro-operation if the operand bit width of the SIMD instruction is the fourth or fifth bit width, which satisfies the SIMD instruction's mask requirement of at most the first bit width. The fourth and fifth bit widths lie between the second and third bit widths, with the fourth smaller than the fifth, and a SIMD instruction of the fourth or fifth bit width adjusts its calculation result with a mask of the first bit width.
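The reason one 32-bit register suffices at the fourth and fifth bit widths is plain arithmetic: with one mask bit per element, a 256-bit operand of 8-bit elements needs exactly 32 mask bits, and every wider element type or narrower operand needs fewer. The sketch below tabulates this using the example widths (128, 256 and 512 bits); the element sizes 8/16/32/64 bits are assumed typical SIMD data types, and the function names are hypothetical.

```python
FIRST_BITS = 32   # register width of each mask physical register


def mask_bits_needed(operand_bits: int, element_bits: int) -> int:
    """One mask bit per SIMD element."""
    return operand_bits // element_bits


def mask_registers_for(operand_bits: int, element_bits: int) -> int:
    """32-bit mask registers needed if a single micro-op carried the whole operand."""
    needed = mask_bits_needed(operand_bits, element_bits)
    return -(-needed // FIRST_BITS)   # ceiling division


for width in (128, 256, 512):
    row = {e: mask_bits_needed(width, e) for e in (8, 16, 32, 64)}
    print(width, row, "registers at 8-bit elements:", mask_registers_for(width, 8))
```

Only the 512-bit case with 8-bit elements exceeds one register, which is the case the decoder splits into two micro-operations.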

Further, in an optional implementation, the instruction decoding module 200 is configured to decode the SIMD instruction into two micro-operations if its operand bit width is the third bit width and its data type is not the minimum data type. Each decoded micro-operation has an operand bit width equal to the fifth bit width, which is the second-largest operand bit width of the SIMD instruction.

Further, in an optional implementation, the allocation unit 300 is configured to allocate a single mask physical register of the first bit width to the two micro-operations, which satisfies the SIMD instruction's mask requirement of at most the first bit width. The operation results of the two micro-operations are fused, and the fused result is written into the allocated mask physical register.

Further, as shown in FIG. 7, in an optional implementation the processor also includes a result fusion module 301, which fuses the operation results of the two micro-operations and writes the fused result into the allocated mask physical register 400.
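A minimal sketch of the split-and-fuse path for a 512-bit SIMD operation on a non-minimum data type, assuming the operation is an element-wise "greater than" compare that writes one mask bit per element. The compare, the function names, and the bit ordering (second micro-operation's bits placed above the first's) are illustrative assumptions; the text only states that the two results are fused into one register of the first bit width.

```python
FIRST_BITS = 32    # width of the single mask register shared by both micro-ops
UOP_BITS = 256     # fifth bit width: operand width of each decoded micro-op


def compare_gt_uop(a, b, start: int, end: int) -> int:
    """Hypothetical micro-op: element-wise a > b over elements [start, end), one bit each."""
    bits = 0
    for i in range(start, end):
        if a[i] > b[i]:
            bits |= 1 << (i - start)
    return bits


def fuse(low_part: int, high_part: int, per_uop: int) -> int:
    """Place the second micro-op's bits above the first micro-op's bits."""
    if 2 * per_uop > FIRST_BITS:
        raise ValueError("result would not fit in one mask register")
    field = (1 << per_uop) - 1
    return ((high_part & field) << per_uop) | (low_part & field)


def compare_gt_512(a, b, element_bits: int) -> int:
    """512-bit compare decoded into two 256-bit micro-ops, results fused into one mask."""
    per_uop = UOP_BITS // element_bits          # elements handled by each micro-op
    low = compare_gt_uop(a, b, 0, per_uop)
    high = compare_gt_uop(a, b, per_uop, 2 * per_uop)
    return fuse(low, high, per_uop)


a = list(range(16))        # sixteen 32-bit elements fill a 512-bit operand
b = [7] * 16
print(hex(compare_gt_512(a, b, 32)))   # 0xff00
```

With 8-bit elements each micro-operation would produce 32 bits, so `fuse` refuses; that case is handled by giving each micro-operation its own register instead.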

Further, in an optional implementation, the instruction decoding module 200 decodes the SIMD instruction into two micro-operations if its operand bit width is the third bit width and its data type is the minimum data type. Each decoded micro-operation has an operand bit width equal to the fifth bit width, the second-largest operand bit width of the SIMD instruction.

The allocation unit 300 then allocates one mask physical register of the first bit width to each micro-operation; together they satisfy the SIMD instruction's mask requirement of the second bit width.

Further, in an optional implementation, if the high-order bits of the SIMD instruction's mask are invalid, the allocation unit 300 allocates to the first micro-operation one mask physical register of the first bit width plus an indicator bit set to mark the high-order bits of the data as invalid, and allocates to the second micro-operation a mask physical register of the first bit width whose encoding is invalid.
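The minimum-data-type case can be sketched with a toy free list. This assumes that an "invalid encoding" on the second micro-operation means it names no real physical register, so no second free-list entry is consumed; that reading, the `INVALID` marker, and the `MaskFreeList` class are hypothetical and are not stated in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

INVALID = None   # hypothetical stand-in for an "invalid" register encoding


@dataclass
class UopMask:
    preg: Optional[int]          # mask physical register number, or INVALID
    high_invalid: bool = False   # indicator bit carried by the first micro-op


class MaskFreeList:
    """Toy free list of 32-bit mask physical registers (hypothetical)."""

    def __init__(self, count: int):
        self.free = list(range(count))

    def pop(self) -> int:
        return self.free.pop(0)


def allocate_min_type_512(free_list: MaskFreeList, high_bits_valid: bool) -> List[UopMask]:
    """Mask allocation for a 512-bit op on 8-bit elements split into two micro-ops."""
    if high_bits_valid:
        # 64 mask bits: each 256-bit micro-op gets its own 32-bit register.
        return [UopMask(free_list.pop()), UopMask(free_list.pop())]
    # High half of the mask is invalid: the first micro-op carries the indicator
    # bit, and the second micro-op's register field holds an invalid encoding,
    # so (by assumption) no second physical register is consumed.
    return [UopMask(free_list.pop(), high_invalid=True), UopMask(INVALID)]


fl = MaskFreeList(8)
print(allocate_min_type_512(fl, True), allocate_min_type_512(fl, False), len(fl.free))
```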

Further, in an optional implementation consistent with the foregoing, the first bit width is 32 bits, the second bit width is 64 bits, the third bit width is 512 bits, the minimum data type is 8 bits, the fourth bit width is 128 bits, and the fifth bit width is 256 bits.
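Taking the third bit width as 512 bits, as claim 11 states, the example values fit the definitions used throughout: the second bit width is the element count of the widest SIMD operation at the minimum data type, it is twice the first bit width, and the fourth and fifth bit widths fall strictly between the second and third. The short check below encodes those relations; the constant names are illustrative only.

```python
FIRST_BITS = 32      # width of each mask physical register
THIRD_BITS = 512     # maximum SIMD operand width
MIN_TYPE_BITS = 8    # minimum data type
FOURTH_BITS = 128
FIFTH_BITS = 256

# Second bit width: elements in the widest SIMD op at the smallest data type.
SECOND_BITS = THIRD_BITS // MIN_TYPE_BITS

assert SECOND_BITS == 64 == 2 * FIRST_BITS
assert SECOND_BITS < FOURTH_BITS < FIFTH_BITS < THIRD_BITS
# A fifth-bit-width op on 8-bit elements needs exactly one 32-bit mask register.
assert FIFTH_BITS // MIN_TYPE_BITS == FIRST_BITS
print("second bit width:", SECOND_BITS)
```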

Further, in an optional implementation, the processor includes several peripheral modules in addition to the modules involved in register allocation. Specifically, as shown in FIG. 8, the processor further includes an instruction issuing module 600, which sends the instruction and the allocation requirements for the mask physical register 400 to subsequent modules.

A floating-point operation unit 700 schedules the operands and masks and performs numerical calculations on them.

The floating-point operation unit 700 further includes: a scheduling unit 710, which schedules signal transmission among the other units inside the floating-point operation unit 700; mask arithmetic logic units 501, which perform numerical calculations on masks (several are provided to improve mask calculation efficiency); and a vector arithmetic logic unit 720, which performs numerical calculations on operands.

The processor further includes a vector physical register 810 for storing the operands; a fixed-point module 910, which performs fixed-point arithmetic, that is, operations on numbers with no fractional part or with a fixed radix-point position; and a load module 920, which handles data transfer between the floating-point operation unit 700 and fixed-point module 910 on one side and memory or registers on the other.

An embodiment of the present application further provides a chip that includes the processor described above.

An embodiment of the present application further provides an electronic device that includes the processor or the chip described above.

Although the embodiments of the present application are disclosed above, the present application is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present application; the scope of protection shall therefore be defined by the claims.

Claims

1. A register allocation method, comprising:

Obtaining an instruction, the instruction being associated with a mask operation;

Decoding the instruction to obtain a micro-operation;

Allocating, according to the operand bit width of the instruction, a mask physical register for the micro-operation from a mask physical register set so as to meet the mask requirement of the instruction, wherein the mask physical register set includes a plurality of mask physical registers, and the register bit width of each mask physical register is fixed at a first bit width;

Wherein the first bit width is smaller than a second bit width, the value of the second bit width corresponds to the number of elements of a SIMD instruction whose operand bit width is a third bit width under the minimum data type, and the third bit width is the maximum operand bit width of the SIMD instruction.

2. The register allocation method according to claim 1, wherein the instruction is a mask calculation instruction, and the mask physical registers allocated for the micro-operation include:

A mask physical register of the first bit width and an indicator bit, wherein the indicator bit indicates whether the high-order bits of the data are valid, the high-order bits of the data lying between the first bit width and the second bit width;

Or two mask physical registers of the first bit width.

3. The register allocation method according to claim 2, wherein said allocating a mask physical register from a mask physical register set for said micro-operation according to an operand bit width of said instruction to satisfy a mask requirement of said instruction comprises:

If the operand bit width of the mask calculation instruction is less than or equal to the first bit width, each operand of the mask calculation instruction is allocated one mask physical register of the first bit width and an indicator bit, the indicator bit being set to indicate that the high-order bits of the data are invalid.

4. The register allocation method according to claim 2, wherein said allocating a mask physical register from a mask physical register set for said micro-operation according to an operand bit width of said instruction to satisfy a mask requirement of said instruction comprises:

If the operand bit width of the mask calculation instruction is equal to the second bit width, each operand of the mask calculation instruction is allocated two mask physical registers of the first bit width, or one mask physical register of the first bit width and an indicator bit set to indicate that the high-order bits of the data are invalid; wherein the second bit width is twice the first bit width.

5. The register allocation method according to claim 4, further comprising: allocating two mask arithmetic logic units of the first bit width to the mask calculation instruction, so as to complete the mask calculation instruction of the second bit width in the same cycle.

6. The register allocation method according to claim 1, wherein the instruction is a SIMD instruction and a mask is used to adjust a calculation result of the SIMD instruction; the second bit width is twice the first bit width; and the mask physical registers allocated for the micro-operation include:

A mask physical register of the first bit width;

Or two mask physical registers of the first bit width.

7. The register allocation method according to claim 6, wherein said allocating a mask physical register from a mask physical register set for said micro-operation according to an operand bit width of said instruction to satisfy a mask requirement of said instruction comprises:

If the operand bit width of the SIMD instruction is the fourth bit width or the fifth bit width, one mask physical register of the first bit width is allocated for the micro-operation, so as to meet the mask requirement of the SIMD instruction, which does not exceed the first bit width;

Wherein the fourth bit width and the fifth bit width lie between the second bit width and the third bit width, and the fourth bit width is smaller than the fifth bit width; a SIMD instruction of the fourth or fifth bit width adjusts its calculation result using a mask of the first bit width.

8. The register allocation method according to claim 6, wherein decoding the instruction to obtain the micro-operation comprises:

If the operand bit width of the SIMD instruction is the third bit width and the data type is not the minimum data type, decoding the SIMD instruction to obtain two micro-operations, wherein the operand bit width of each decoded micro-operation is a fifth bit width, the fifth bit width being the second-largest operand bit width of the SIMD instruction;

Allocating, according to the operand bit width of the instruction, a mask physical register for the micro-operation from the mask physical register set so as to meet the mask requirement of the instruction comprises:

Allocating one mask physical register of the first bit width for the two micro-operations, so as to meet the mask requirement of the SIMD instruction, which does not exceed the first bit width;

If the target operand of the SIMD instruction is a mask, the method further includes:

Fusing the operation results of the two micro-operations and writing the fused result into the allocated mask physical register.

9. The register allocation method according to claim 6, wherein decoding the instruction to obtain the micro-operation comprises:

If the operand bit width of the SIMD instruction is the third bit width and the data type is the minimum data type, decoding the SIMD instruction to obtain two micro-operations, wherein the operand bit width of each decoded micro-operation is a fifth bit width, the fifth bit width being the second-largest operand bit width of the SIMD instruction;

And allocating one mask physical register of the first bit width to each micro-operation, so as to meet the second-bit-width mask requirement of the SIMD instruction.

10. The register allocation method according to claim 9, wherein the two micro-operations comprise a first micro-operation and a second micro-operation, and allocating one mask physical register of the first bit width to each micro-operation comprises:

If the high-order bits of the mask of the SIMD instruction are invalid, allocating to the first micro-operation one mask physical register of the first bit width and an indicator bit, the indicator bit being set to indicate that the high-order bits of the data are invalid;

And allocating to the second micro-operation a mask physical register of the first bit width, the encoding of the allocated mask physical register being invalid.

11. The register allocation method according to any one of claims 1 to 10, wherein the first bit width is 32 bits, the second bit width is 64 bits, the third bit width is 512 bits, and the minimum data type is 8 bits.

12. The register allocation method according to any one of claims 7-10, wherein the fifth bit width is 256 bits, and the fourth bit width between the second bit width and the fifth bit width is 128 bits.

13. A processor, comprising:

An instruction acquisition module, configured to obtain an instruction, the instruction being associated with a mask operation;

An instruction decoding module, configured to decode the instruction to obtain a micro-operation;

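An allocation unit, configured to allocate, according to the operand bit width of the instruction, a mask physical register for the micro-operation from a mask physical register set, wherein the mask physical register set includes a plurality of mask physical registers, and the register bit width of each mask physical register is fixed at a first bit width;

Wherein the first bit width is smaller than a second bit width, the value of the second bit width corresponds to the number of elements of a SIMD instruction whose operand bit width is a third bit width under the minimum data type, and the third bit width is the maximum operand bit width of the SIMD instruction.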

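14. The processor according to claim 13, wherein the instruction is a mask calculation instruction, and the mask physical registers allocated for the micro-operation include:

A mask physical register of the first bit width and an indicator bit, wherein the indicator bit indicates whether the high-order bits of the data, lying between the first bit width and the second bit width, are valid;

Or two mask physical registers of the first bit width.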

15. The processor according to claim 14, wherein allocating, by the allocation unit, a mask physical register for the micro-operation from the mask physical register set according to the operand bit width of the instruction comprises:

If the operand bit width of the mask calculation instruction is less than or equal to the first bit width, allocating to each operand of the mask calculation instruction one mask physical register of the first bit width and an indicator bit, the indicator bit being set to indicate that the high-order bits of the data are invalid.

16. The processor according to claim 14, wherein allocating, by the allocation unit, a mask physical register for the micro-operation from the mask physical register set according to the operand bit width of the instruction comprises:

If the operand bit width of the mask calculation instruction is equal to the second bit width, allocating to each operand of the mask calculation instruction two mask physical registers of the first bit width, or one mask physical register of the first bit width and an indicator bit set to indicate that the high-order bits of the data are invalid; wherein the second bit width is twice the first bit width;

Wherein the mask calculation instruction uses two mask arithmetic logic units of the first bit width to complete the mask calculation instruction of the second bit width in the same cycle.

17. The processor according to claim 13, wherein the instruction is a SIMD instruction and the mask is used to adjust the calculation result of the SIMD instruction; the second bit width is twice the first bit width; and the mask physical registers allocated for the micro-operation include:

A mask physical register of the first bit width;

Or two mask physical registers of the first bit width.

18. The processor according to claim 17, wherein allocating, by the allocation unit, a mask physical register for the micro-operation from the mask physical register set according to the operand bit width of the instruction comprises:

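If the operand bit width of the SIMD instruction is the fourth bit width or the fifth bit width, allocating one mask physical register of the first bit width for the micro-operation, so as to meet the mask requirement of the SIMD instruction, which does not exceed the first bit width;

Wherein the fourth bit width and the fifth bit width lie between the second bit width and the third bit width, and the fourth bit width is smaller than the fifth bit width; a SIMD instruction of the fourth or fifth bit width adjusts its calculation result using a mask of the first bit width.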

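19. The processor according to claim 17, wherein decoding, by the instruction decoding module, the instruction to obtain the micro-operation comprises:

If the operand bit width of the SIMD instruction is the third bit width and the data type is not the minimum data type, decoding the SIMD instruction to obtain two micro-operations, wherein the operand bit width of each decoded micro-operation is the fifth bit width, the second-largest operand bit width of the SIMD instruction;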

And allocating, by the allocation unit, a mask physical register for the micro-operation from the mask physical register set according to the operand bit width of the instruction comprises:

Allocating one mask physical register of the first bit width for the two micro-operations, so as to meet the mask requirement of the SIMD instruction, which does not exceed the first bit width;

And, if the destination operand of the SIMD instruction is a mask, writing the result of fusing the operation results of the two micro-operations into the allocated mask physical register.

20. The processor according to claim 17, wherein decoding, by the instruction decoding module, the instruction to obtain the micro-operation comprises:

If the operand bit width of the SIMD instruction is the third bit width and the data type is the minimum data type, decoding the SIMD instruction to obtain two micro-operations, wherein the operand bit width of each decoded micro-operation is the fifth bit width, the second-largest operand bit width of the SIMD instruction;

And allocating, by the allocation unit, a mask physical register for the micro-operation from the mask physical register set according to the operand bit width of the instruction comprises:

Allocating one mask physical register of the first bit width to each micro-operation, so as to meet the second-bit-width mask requirement of the SIMD instruction.

21. The processor according to claim 20, wherein the two micro-operations comprise a first micro-operation and a second micro-operation, and allocating, by the allocation unit, one mask physical register of the first bit width to each micro-operation comprises:

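If the high-order bits of the mask of the SIMD instruction are invalid, allocating to the first micro-operation one mask physical register of the first bit width and an indicator bit, the indicator bit being set to indicate that the high-order bits of the data are invalid;

And allocating to the second micro-operation a mask physical register of the first bit width, the encoding of the allocated mask physical register being invalid.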

22. The processor according to any one of claims 13-21, wherein the first bit width is 32 bits, the second bit width is 64 bits, the third bit width is 512 bits, and the minimum data type is 8 bits.

23. A processor according to any of claims 18-21, wherein the fifth bit width is 256 bits and the fourth bit width between the second bit width and the fifth bit width is 128 bits.

24. The processor according to any one of claims 13-23, further comprising:

An instruction issuing module, configured to issue the micro-operation decoded by the instruction decoding module to an execution unit;

And the execution unit, configured to execute the micro-operation.

25. The processor according to claim 24, wherein the execution unit comprises a floating-point operation unit, and the floating-point operation unit includes:

A scheduling unit, configured to schedule the micro-operation to an operation unit in the floating-point operation unit;

A mask arithmetic logic unit, configured to process the mask operation of the micro-operation, the data bit width of the mask arithmetic logic unit being the first bit width;

A vector arithmetic logic unit, configured to process the vector operation of the micro-operation, the mask arithmetic logic unit and the vector arithmetic logic unit both being operation units inside the floating-point operation unit;

The processor further includes:

A mask physical register, configured to store mask data of the mask operation, the read/write data bit width of the mask physical register being the first bit width;

And a vector physical register, configured to store the source operands and destination operand of the vector operation.

26. A chip comprising the processor of any one of claims 13-25.

27. An electronic device comprising a processor according to any of claims 13-25, or a chip according to claim 26.