CN111813450A

CN111813450A - Computing method, device and related products

Info

Publication number: CN111813450A
Application number: CN201910294130.3A
Authority: CN
Inventors: Not disclosed
Original assignee: Shanghai Cambricon Information Technology Co Ltd
Current assignee: Shanghai Cambricon Information Technology Co Ltd
Priority date: 2019-04-12
Filing date: 2019-04-12
Publication date: 2020-10-23

Abstract

The disclosure relates to a computing method, a computing device and a related product. The board card includes a storage device, an interface device, a control device and a machine learning chip; the machine learning chip is connected to the storage device, the control device and the interface device respectively; the storage device is used for storing data; the interface device is used for data transmission between the machine learning chip and external equipment; the control device is used for monitoring the state of the machine learning chip.

Description

Computing method, device and related products

Technical Field

The present disclosure relates to the field of computer technology, and in particular to a matrix mirroring instruction processing method, apparatus and related products.

Background

With the continuous development of science and technology, machine learning, and neural network algorithms in particular, is used more and more widely, and has been applied successfully in fields such as image recognition, speech recognition and natural language processing. However, as the complexity of neural network algorithms increases, the types and quantity of data operations involved keep growing. In the related art, mirroring processing of matrix data is inefficient and slow.

Summary of the Invention

In view of this, the present disclosure proposes a matrix mirroring instruction processing method, apparatus and related products, so as to improve the efficiency and speed of mirroring processing on matrices.

According to a first aspect of the present disclosure, there is provided a matrix mirroring instruction processing apparatus, the apparatus comprising:

a control module, configured to parse a received matrix mirroring instruction to obtain the operation code and operation field of the matrix mirroring instruction, to determine, according to the operation code and the operation field, the matrix to be mirrored and the target address required to execute the matrix mirroring instruction, and to determine the mirroring strategy required for the mirroring processing;

a processing module, configured to perform mirroring processing on the matrix to be mirrored according to the mirroring strategy to obtain a mirrored matrix, and to store the mirrored matrix at the target address,

wherein the operation code indicates that the processing performed by the matrix mirroring instruction on matrix data is mirroring processing, and the operation field includes the address of the matrix to be mirrored and the target address.

According to a second aspect of the present disclosure, there is provided a machine learning computing device, the device comprising:

one or more matrix mirroring instruction processing apparatuses according to the first aspect, configured to obtain the matrix to be mirrored and control information from other processing devices, execute the specified machine learning operation, and transmit the execution result to the other processing devices through an I/O interface;

when the machine learning computing device includes a plurality of the matrix mirroring instruction processing apparatuses, the plurality of matrix mirroring instruction processing apparatuses can be connected to one another through a specific structure and transmit data;

wherein the plurality of matrix mirroring instruction processing apparatuses are interconnected and transmit data through a Peripheral Component Interconnect Express (PCIE) bus to support larger-scale machine learning operations; the plurality of matrix mirroring instruction processing apparatuses share the same control system or have their own control systems; the plurality of matrix mirroring instruction processing apparatuses share memory or have their own memories; and the plurality of matrix mirroring instruction processing apparatuses may be interconnected in any interconnection topology.

According to a third aspect of the present disclosure, there is provided a combined processing device, the device comprising:

the machine learning computing device according to the second aspect, a general interconnection interface, and other processing devices;

the machine learning computing device interacts with the other processing devices to jointly complete the computing operation specified by the user.

According to a fourth aspect of the present disclosure, there is provided a machine learning chip, the machine learning chip including the machine learning computing device according to the second aspect or the combined processing device according to the third aspect.

According to a fifth aspect of the present disclosure, there is provided a machine learning chip packaging structure, the machine learning chip packaging structure including the machine learning chip according to the fourth aspect.

According to a sixth aspect of the present disclosure, there is provided a board card, the board card including the machine learning chip packaging structure according to the fifth aspect.

According to a seventh aspect of the present disclosure, there is provided an electronic device, the electronic device including the machine learning chip according to the fourth aspect or the board card according to the sixth aspect.

According to an eighth aspect of the present disclosure, there is provided a matrix mirroring instruction processing method, the method being applied to a matrix mirroring instruction processing apparatus and comprising:

parsing a received matrix mirroring instruction to obtain the operation code and operation field of the matrix mirroring instruction, determining, according to the operation code and the operation field, the matrix to be mirrored and the target address required to execute the matrix mirroring instruction, and determining the mirroring strategy required for the mirroring processing;

performing mirroring processing on the matrix to be mirrored according to the mirroring strategy to obtain a mirrored matrix, and storing the mirrored matrix at the target address,

wherein the operation code indicates that the processing performed by the matrix mirroring instruction on the matrix is mirroring processing, and the operation field includes the address of the matrix to be mirrored and the target address.

In some embodiments, the electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, a headset, mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.

In some embodiments, the vehicle includes an airplane, a ship and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric light, a gas stove, and a range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound instrument and/or an electrocardiograph.

In the matrix mirroring instruction processing method, apparatus and related products provided by the embodiments of the present disclosure, the apparatus includes a control module and a processing module. The control module is configured to parse a received matrix mirroring instruction, obtain the operation code and operation field of the matrix mirroring instruction, and determine, according to the operation code and the operation field, the matrix to be mirrored and the target address required to execute the matrix mirroring instruction. The processing module is configured to perform mirroring processing on the matrix to be mirrored according to the mirroring strategy to obtain a mirrored matrix, and to store the mirrored matrix at the target address. The matrix mirroring instruction processing method, apparatus and related products provided by the embodiments of the present disclosure are widely applicable, and mirroring a matrix according to a matrix mirroring instruction is efficient and fast.

Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.

FIG. 1a and FIG. 1b show block diagrams of a combined processing device according to an embodiment of the present disclosure.

FIG. 2 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure.

FIG. 3 shows a block diagram of a matrix mirroring instruction processing apparatus according to an embodiment of the present disclosure.

FIG. 4 shows a block diagram of a matrix mirroring instruction processing apparatus according to an embodiment of the present disclosure.

FIG. 5 shows a schematic diagram of an application scenario of a matrix mirroring instruction processing apparatus according to an embodiment of the present disclosure.

FIG. 6 shows a flowchart of a matrix mirroring instruction processing method according to an embodiment of the present disclosure.

Detailed Description

Various exemplary embodiments, features and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements that have the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.

The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

In addition, numerous specific details are given in the following detailed description in order to better explain the present disclosure. Those skilled in the art will understand that the present disclosure can be practiced without certain of these specific details. In some instances, methods, means, components and circuits well known to those skilled in the art are not described in detail, so as to highlight the subject matter of the present disclosure.

As neural network algorithms are used more and more widely in fields such as image recognition, speech recognition and natural language processing, their complexity keeps increasing, and the types and quantity of data operations involved keep growing. A matrix is a common form of data in neural network algorithms and consists of numbers and/or characters. The processing of matrices in neural network algorithms includes mirroring processing. In the related art, mirroring processing of matrices is inefficient and slow.

The present disclosure provides a machine learning computing device that can perform operations related to neural network algorithms. The machine learning computing device may include one or more matrix mirroring instruction processing apparatuses, each of which mirrors a matrix according to a received matrix mirroring instruction, and which are used to obtain the matrix to be mirrored and control information from other processing devices and to execute the specified machine learning operation. The machine learning computing device can obtain matrix mirroring instructions from other machine learning computing devices or from non-machine-learning computing devices, and transmit the execution result to peripheral devices (which may also be called other processing devices) through an I/O interface. Peripheral devices include, for example, cameras, monitors, mice, keyboards, network cards, WiFi interfaces and servers. When more than one matrix mirroring instruction processing apparatus is included, the matrix mirroring instruction processing apparatuses can be linked to one another and transmit data through a specific structure, for example, interconnected through a PCIE bus, to support larger-scale neural network operations. In this case, they may share the same control system or have their own independent control systems; they may share memory, or each accelerator may have its own memory. In addition, they may be interconnected in any interconnection topology.

The machine learning computing device has high compatibility and can be connected to various types of servers through a PCIE interface.

FIG. 1a shows a block diagram of a combined processing device according to an embodiment of the present disclosure. As shown in FIG. 1a, the combined processing device includes the above machine learning computing device, a general interconnection interface and other processing devices. The machine learning computing device interacts with the other processing devices to jointly complete the operation specified by the user.

The other processing devices include one or more types of general-purpose or special-purpose processors such as a central processing unit (CPU), a graphics processing unit (GPU), and a neural network processor. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning computing device and external data and control, which includes data transfer, and perform basic control of the machine learning computing device such as starting and stopping it; the other processing devices can also cooperate with the machine learning computing device to complete computing tasks.

The general interconnection interface is used to transfer data and control instructions between the machine learning computing device and the other processing devices. The machine learning computing device obtains the required input data from the other processing devices and writes it into the on-chip storage device of the machine learning computing device; it can obtain control instructions from the other processing devices and write them into the on-chip control cache of the machine learning computing device; it can also read data from the storage module of the machine learning computing device and transmit it to the other processing devices.

FIG. 1b shows a block diagram of a combined processing device according to an embodiment of the present disclosure. In a possible implementation, as shown in FIG. 1b, the combined processing device may further include a storage device, which is connected to the machine learning computing device and to the other processing devices respectively. The storage device is used to store data of the machine learning computing device and the other processing devices, and is especially suitable for data to be computed that cannot be held entirely in the internal storage of the machine learning computing device or the other processing devices.

The combined processing device can serve as the system on chip (SoC) of devices such as mobile phones, robots, drones and video surveillance equipment, effectively reducing the core area of the control portion, increasing the processing speed and reducing the overall power consumption. In this case, the general interconnection interface of the combined processing device is connected to certain components of the device, for example a camera, a monitor, a mouse, a keyboard, a network card or a WiFi interface.

The present disclosure provides a machine learning chip, which includes the above machine learning computing device or combined processing device.

The present disclosure provides a machine learning chip packaging structure, which includes the above machine learning chip.

The present disclosure provides a board card. FIG. 2 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure. As shown in FIG. 2, the board card includes the above machine learning chip packaging structure or the above machine learning chip. In addition to the machine learning chip 389, the board card may include other supporting components, including but not limited to a storage device 390, an interface device 391 and a control device 392.

The storage device 390 is connected through a bus to the machine learning chip 389 (or to the machine learning chip inside the machine learning chip packaging structure) and is used for storing data. The storage device 390 may include multiple groups of storage units 393. Each group of storage units 393 is connected to the machine learning chip 389 through a bus. It can be understood that each group of storage units 393 may be DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory).

DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read out on both the rising and falling edges of the clock pulse, so DDR is twice as fast as standard SDRAM.

In one embodiment, the storage device 390 may include 4 groups of storage units 393. Each group of storage units 393 may include multiple DDR4 chips. In one embodiment, the machine learning chip 389 may internally include four 72-bit DDR4 controllers; of the 72 bits of each controller, 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units 393, the theoretical data transmission bandwidth can reach 25600MB/s.
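The 25600MB/s figure follows from the transfer rate and the data width. The following is a minimal arithmetic sketch, assuming that DDR4-3200 denotes 3200 mega-transfers per second and that only the 64 data bits of each 72-bit controller carry payload; the function name is illustrative and does not come from the disclosure.

```python
def ddr_bandwidth_mb_per_s(mega_transfers_per_s: int, data_bits: int) -> float:
    """Theoretical peak bandwidth in MB/s for one DDR controller."""
    bytes_per_transfer = data_bits / 8
    return mega_transfers_per_s * bytes_per_transfer


# 72-bit controller: 64 data bits + 8 ECC bits; the ECC bits carry no payload.
per_controller = ddr_bandwidth_mb_per_s(3200, 64)
print(per_controller)  # 25600.0
```

The value is per controller; the text does not state an aggregate figure for the four controllers, so none is computed here.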

In one embodiment, each group of storage units 393 includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for the DDR is provided in the machine learning chip 389 to control the data transmission and data storage of each storage unit 393.

The interface device 391 is electrically connected to the machine learning chip 389 (or to the machine learning chip inside the machine learning chip packaging structure). The interface device 391 is used for data transmission between the machine learning chip 389 and an external device (such as a server or a computer). For example, in one embodiment, the interface device 391 may be a standard PCIE interface: the data to be processed is transferred from the server to the machine learning chip 389 through the standard PCIE interface. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000MB/s. In another embodiment, the interface device 391 may be another kind of interface; the present disclosure does not limit the specific form of such other interfaces, as long as the interface device can perform the transfer function. In addition, the computation result of the machine learning chip is transmitted back to the external device (such as a server) by the interface device.
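The 16000MB/s figure corresponds to the raw signalling rate of the link. The following short sketch assumes the PCIe 3.0 parameters of 8 GT/s per lane and 128b/130b line encoding, which come from the PCIe standard rather than from this text; the helper name is illustrative.

```python
def pcie3_bandwidth_mb_per_s(lanes: int, include_encoding: bool = False) -> float:
    """Theoretical one-direction PCIe 3.0 bandwidth in MB/s."""
    giga_transfers_per_s = 8  # PCIe 3.0 per-lane signalling rate
    # One bit per transfer per lane; divide by 8 to convert bits to bytes.
    bandwidth = giga_transfers_per_s * 1000 * lanes / 8
    if include_encoding:
        bandwidth *= 128 / 130  # 128b/130b line-encoding overhead
    return bandwidth


print(pcie3_bandwidth_mb_per_s(16))               # 16000.0
print(round(pcie3_bandwidth_mb_per_s(16, True)))  # 15754
```

The first value matches the figure quoted above; the second shows the slightly lower payload rate once encoding overhead is counted.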

The control device 392 is electrically connected to the machine learning chip 389. The control device 392 is used to monitor the state of the machine learning chip 389. Specifically, the machine learning chip 389 and the control device 392 may be electrically connected through an SPI interface. The control device 392 may include a microcontroller unit (MCU). The machine learning chip 389 may include multiple processing chips, multiple processing cores or multiple processing circuits, and can drive multiple loads. The machine learning chip 389 can therefore be in different working states such as multi-load and light-load. The control device can regulate the working states of the multiple processing chips, multiple processing cores and/or multiple processing circuits in the machine learning chip.

The present disclosure provides an electronic device, which includes the above machine learning chip or board card.

The electronic device may include a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, a headset, mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.

The vehicle may include an airplane, a ship and/or a car. The household appliance may include a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric light, a gas stove, and a range hood. The medical device may include a nuclear magnetic resonance instrument, a B-mode ultrasound instrument and/or an electrocardiograph.

FIG. 3 shows a block diagram of a matrix mirroring instruction processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 3, the apparatus includes a control module 11 and a processing module 12.

The control module 11 is configured to parse a received matrix mirroring instruction, obtain the operation code and operation field of the matrix mirroring instruction, determine, according to the operation code and the operation field, the matrix to be mirrored and the target address required to execute the matrix mirroring instruction, and determine the mirroring strategy required for the mirroring processing. The operation code indicates that the processing performed by the matrix mirroring instruction on matrix data is mirroring processing, and the operation field includes the address of the matrix to be mirrored and the target address.

The processing module 12 performs mirroring processing on the matrix to be mirrored according to the mirroring strategy to obtain a mirrored matrix, and stores the mirrored matrix at the target address.
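The division of labour between the two modules can be sketched in software as follows. This is only an illustration under stated assumptions: memory is modelled as a Python dictionary keyed by address, the mnemonic "MIRROR" and the strategy names "reverse_rows" and "reverse_cols" are hypothetical, and the strategy is assumed to travel with the instruction. None of these details is specified by the disclosure.

```python
from dataclasses import dataclass


@dataclass
class MirrorInstruction:
    opcode: str     # hypothetical mnemonic, e.g. "MIRROR"
    src_addr: int   # address of the matrix to be mirrored
    dst_addr: int   # target address for the mirrored matrix
    strategy: str   # hypothetical: "reverse_rows" or "reverse_cols"


class ControlModule:
    """Parses the instruction and fetches what the processing module needs."""

    def __init__(self, memory):
        self.memory = memory

    def parse(self, instr):
        if instr.opcode != "MIRROR":
            raise ValueError(f"unsupported opcode: {instr.opcode}")
        matrix = self.memory[instr.src_addr]
        return matrix, instr.dst_addr, instr.strategy


class ProcessingModule:
    """Mirrors the matrix and stores the result at the target address."""

    def __init__(self, memory):
        self.memory = memory

    def execute(self, matrix, dst_addr, strategy):
        if strategy == "reverse_rows":
            mirrored = [row[:] for row in reversed(matrix)]
        elif strategy == "reverse_cols":
            mirrored = [row[::-1] for row in matrix]
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        self.memory[dst_addr] = mirrored
        return mirrored


memory = {0x100: [[1, 2], [3, 4]]}
control = ControlModule(memory)
processing = ProcessingModule(memory)
operands = control.parse(MirrorInstruction("MIRROR", 0x100, 0x200, "reverse_rows"))
processing.execute(*operands)
print(memory[0x200])  # [[3, 4], [1, 2]]
```

The source matrix at address 0x100 is left unchanged; only the target address receives the mirrored copy.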

In this embodiment, the matrix to be mirrored may be a data set formed by a plurality of numbers and/or characters arranged in an array. Mirroring processing is a transformation of the matrix in which the matrix to be mirrored is flipped about a specific flip line (in a two-dimensional plane) or a specific flip plane (in three-dimensional space) to obtain the mirrored matrix. For example, if the matrix to be mirrored lies in a two-dimensional plane, the mirroring strategy may include at least one of flipping the matrix to be mirrored along its horizontal direction and flipping the matrix to be mirrored along its vertical direction. If the matrix to be mirrored lies in three-dimensional space, the mirroring strategy may include at least one of flipping the matrix to be mirrored about its horizontal plane, flipping it about its vertical plane, and flipping it about the plane perpendicular to both the horizontal plane and the vertical plane. The mirroring strategy may include the parameters required for the mirroring processing, such as the flip line and the flip plane. A matrix mirroring instruction may mirror the matrix one or more times, which is not limited in the present disclosure.
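One way to picture these strategies is as reversals along an axis of a nested array. The sketch below is an assumption-laden illustration: data is held in nested Python lists, axis 0 is the outermost level, and the mapping from an axis number to the "horizontal" or "vertical" line or plane named above is left open because the text does not fix it. The function names are hypothetical.

```python
def mirror(data, axis):
    """Reverse a nested-list array along the given axis (0 = outermost)."""
    if axis == 0:
        # Reversed shallow copy: the inner lists are shared with the input.
        return data[::-1]
    return [mirror(sub, axis - 1) for sub in data]


def apply_strategy(data, axes):
    """Apply one or more mirroring steps in sequence."""
    for axis in axes:
        data = mirror(data, axis)
    return data


cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # shape 2 x 2 x 2
print(mirror(cube, 0))  # [[[5, 6], [7, 8]], [[1, 2], [3, 4]]]
print(mirror(cube, 1))  # [[[3, 4], [1, 2]], [[7, 8], [5, 6]]]
print(mirror(cube, 2))  # [[[2, 1], [4, 3]], [[6, 5], [8, 7]]]
```

Mirroring twice about the same axis restores the input, and `apply_strategy` models an instruction that performs several mirroring steps.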

For example, assume that the matrix to be mirrored is [[1,4,7],[2,5,8],[3,6,9]]. If the mirroring strategy determined from the matrix mirroring instruction is "horizontal mirroring", then after the apparatus performs horizontal mirroring on the matrix to be mirrored, the mirrored matrix [[3,6,9],[2,5,8],[1,4,7]] is obtained. If the mirroring strategy determined from the matrix mirroring instruction is "vertical mirroring", then after the apparatus performs vertical mirroring on the matrix to be mirrored, the mirrored matrix [[9,6,3],[8,5,2],[7,4,1]] is obtained.
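The two results above can be reproduced with plain list reversals. This is a sketch under an assumed reading in which the nested brackets are taken as a list of rows: the first result equals the input with its row order reversed, and the second result equals the first with each row additionally reversed (equivalently, the input reversed along both axes). Whether the text intends exactly this reading is not stated, and the function names are illustrative rather than part of the instruction set.

```python
def reverse_row_order(matrix):
    """Return a copy of the matrix with its rows in reverse order."""
    return [row[:] for row in reversed(matrix)]


def reverse_within_rows(matrix):
    """Return a copy of the matrix with the elements of each row reversed."""
    return [row[::-1] for row in matrix]


m = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

first = reverse_row_order(m)
print(first)   # [[3, 6, 9], [2, 5, 8], [1, 4, 7]]

second = reverse_within_rows(first)
print(second)  # [[9, 6, 3], [8, 5, 2], [7, 4, 1]]
```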

In this embodiment, the control module may obtain the matrix to be mirrored from the address of the matrix to be mirrored. The address of the matrix to be mirrored may be a physical address, such as the first address at which the matrix to be mirrored is stored, or a logical address or a linear address. The control module may store the matrix to be mirrored in the target address. The target address may be a physical address, such as the first address at which the mirrored matrix is stored, or a logical address or a linear address. The present disclosure does not limit how the address of the matrix to be mirrored and the target address are represented. The control module may obtain the matrix mirroring instruction and the matrix to be mirrored through a data input/output unit, which may be one or more data I/O interfaces or I/O pins.

In this embodiment, a matrix mirroring instruction may include an operation code and an operation field. The operation code may be the part of an instruction, or the field (usually represented by a code), that specifies the operation to be performed; it is the instruction sequence number and tells the apparatus executing the instruction which instruction specifically needs to be executed. The operation field may be the source of all data required to execute the corresponding instruction; this data includes the matrix to be mirrored and the corresponding mirroring strategy, or the addresses at which the matrix to be mirrored and the corresponding mirroring strategy are stored, and so on. For example, the operation field may include the address of the matrix to be mirrored and the target address.
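As an illustration of how an operation code and an operation field might share one instruction word, the sketch below packs and unpacks a 64-bit integer. The layout (an 8-bit operation code followed by two 28-bit addresses), the opcode value 0x2A and all names are hypothetical; the disclosure leaves the instruction format to the implementer.

```python
OPCODE_BITS = 8
ADDR_BITS = 28
MIRROR_OPCODE = 0x2A  # hypothetical opcode value for matrix mirroring


def encode(opcode, src_addr, dst_addr):
    """Pack opcode and operation field into one integer instruction word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode out of range")
    for addr in (src_addr, dst_addr):
        if not 0 <= addr < (1 << ADDR_BITS):
            raise ValueError("address out of range")
    return (opcode << (2 * ADDR_BITS)) | (src_addr << ADDR_BITS) | dst_addr


def decode(word):
    """Split an instruction word into its opcode and operation field."""
    mask = (1 << ADDR_BITS) - 1
    opcode = word >> (2 * ADDR_BITS)
    operation_field = {
        "src_addr": (word >> ADDR_BITS) & mask,  # matrix to be mirrored
        "dst_addr": word & mask,                 # target address
    }
    return opcode, operation_field


word = encode(MIRROR_OPCODE, 0x100, 0x200)
opcode, operation_field = decode(word)
print(hex(opcode), operation_field)
```

A decoder of this kind would let the control module check the operation code first and then read the two addresses from the operation field.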

It should be understood that those skilled in the art may set the instruction format of the matrix mirroring instruction, and the opcode and operation field it contains, as required; the present disclosure does not limit this.

In this embodiment, the apparatus may include one or more control modules and one or more processing modules. The number of control modules and processing modules may be set according to actual needs, which is not limited in the present disclosure. When the apparatus includes one control module, that control module may receive the matrix mirroring instruction and control one or more processing modules to perform the mirroring. When the apparatus includes multiple control modules, each control module may receive a matrix mirroring instruction and control its corresponding one or more processing modules to perform the mirroring.

The matrix mirroring instruction processing apparatus provided by the embodiments of the present disclosure includes a control module and a processing module. The control module parses the received matrix mirroring instruction, obtains its opcode and operation field, determines from the opcode and operation field the matrix to be mirrored and the target address required to execute the instruction, and determines the mirroring strategy required for the mirroring. The processing module mirrors the matrix to be mirrored according to the mirroring strategy, obtains the mirrored matrix, and stores the mirrored matrix at the target address. The apparatus has a wide range of applications, and it mirrors matrices according to matrix mirroring instructions with high processing efficiency and high processing speed.

In a possible implementation, the operation field may further include the input shape of the matrix to be mirrored. In that case, the processing module 12 may further be configured to mirror the matrix to be mirrored according to the input shape and the mirroring strategy to obtain the mirrored matrix.

In this implementation, the input shape of the matrix to be mirrored makes it easier to mirror the matrix, and the shape of the mirrored matrix can also be determined from the input shape. The shape of a matrix may be expressed as the number of numbers and/or characters in its rows and columns. For example, matrix 1 to be mirrored is [[0,1,1],[0,1,-1]]; its shape is 2×3, that is, matrix 1 to be mirrored has 2 rows and 3 columns and consists of 6 numbers.

In a possible implementation, a default input shape of the matrix to be mirrored may be preset. When the operation field does not contain the input shape of the matrix to be mirrored, the default input shape may be used as the input shape of the matrix to be mirrored for the current matrix mirroring instruction. The present disclosure does not limit this.

In a possible implementation, the operation field may further include the output shape of the mirrored matrix. In that case, the processing module 12 is further configured to mirror the matrix to be mirrored according to the output shape and the mirroring strategy to obtain the mirrored matrix.

In this implementation, the output shape may be the shape of the mirrored matrix. For example, if the mirrored matrix is [[1,0],[0,1],[-1,0]], its shape is 3×2, that is, the mirrored matrix has 3 rows and 2 columns and consists of 6 numbers.

In a possible implementation, a default output shape of the mirrored matrix may be preset. When the operation field does not contain the output shape of the mirrored matrix, the default output shape may be used as the output shape of the mirrored matrix for the current matrix mirroring instruction. The present disclosure does not limit this.

In a possible implementation, the operation field may also be used to indicate the mirroring strategy.

In a possible implementation, the opcode may also be used to indicate the mirroring strategy.

In a possible implementation, the mirroring strategy may be determined from the opcode or the operation field of the matrix mirroring instruction. A default mirroring strategy for the matrix to be mirrored may also be preset. When the operation field does not contain a mirroring strategy for the matrix to be mirrored, the default mirroring strategy may be used as the mirroring strategy for the current matrix mirroring instruction.
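The fallback to preset defaults for the input shape, output shape, and mirroring strategy can be sketched as follows. This is only an illustrative sketch: the class `OperationField`, the function `resolve_defaults`, the default values, and the strategy name "horizontal" are all hypothetical, since the disclosure does not specify concrete defaults.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Shape = Tuple[int, int]  # (rows, columns)

# Hypothetical preset defaults; the disclosure gives no concrete values.
DEFAULT_INPUT_SHAPE: Shape = (2, 3)
DEFAULT_OUTPUT_SHAPE: Shape = (2, 3)
DEFAULT_STRATEGY = "horizontal"


@dataclass
class OperationField:
    src: int                           # address of the matrix to be mirrored
    dst: int                           # target address
    src_shape: Optional[Shape] = None  # input shape, may be omitted
    dst_shape: Optional[Shape] = None  # output shape, may be omitted
    strategy: Optional[str] = None     # mirroring strategy, may be omitted


def resolve_defaults(op: OperationField) -> OperationField:
    """Return a copy of `op` in which every omitted entry takes its preset default."""
    return OperationField(
        src=op.src,
        dst=op.dst,
        src_shape=op.src_shape if op.src_shape is not None else DEFAULT_INPUT_SHAPE,
        dst_shape=op.dst_shape if op.dst_shape is not None else DEFAULT_OUTPUT_SHAPE,
        strategy=op.strategy if op.strategy is not None else DEFAULT_STRATEGY,
    )
```

Entries that the instruction does supply are left unchanged; only the missing ones are filled in.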

FIG. 4 shows a block diagram of a matrix mirroring instruction processing apparatus according to an embodiment of the present disclosure. In a possible implementation, as shown in FIG. 4, the apparatus may further include a storage module 13 for storing the matrix to be mirrored.

In this implementation, the storage module may include one or more of a memory, a cache, and a register, and the cache may include a scratchpad cache. The matrix to be mirrored may be stored in the memory, the cache, and/or the register of the storage module as required, which is not limited in the present disclosure.

In a possible implementation, the apparatus may further include a direct memory access module for reading data from, or storing data in, the storage module.

In a possible implementation, as shown in FIG. 4, the control module 11 may include an instruction storage sub-module 111, an instruction processing sub-module 112, and a queue storage sub-module 113.

The instruction storage sub-module 111 stores the matrix mirroring instruction.

The instruction processing sub-module 112 parses the matrix mirroring instruction to obtain its opcode and operation field.

The queue storage sub-module 113 stores an instruction queue. The instruction queue includes a plurality of instructions to be executed, arranged in execution order, and these may include the matrix mirroring instruction. They may also include other computation instructions related to the matrix mirroring instruction.

In this implementation, the instruction queue may be obtained by ordering the instructions to be executed according to, for example, their receive time and priority level, so that the instructions can be executed in sequence according to the queue.
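One way to order pending instructions by priority level and receive time is sketched below. The function name `build_instruction_queue`, the tuple layout, and the convention that a smaller priority value means "more urgent" are assumptions of this sketch, not details given in the disclosure.

```python
import heapq
from typing import List, Tuple


def build_instruction_queue(pending: List[Tuple[str, int, int]]) -> List[str]:
    """Arrange instructions in execution order.

    Each entry of `pending` is (instruction_text, receive_time, priority).
    Instructions are ordered by priority first (smaller value first, an
    assumed convention) and then by receive time (earlier first).
    """
    # The running index breaks ties so that equal keys keep their input order.
    heap = [
        (priority, receive_time, index, text)
        for index, (text, receive_time, priority) in enumerate(pending)
    ]
    heapq.heapify(heap)

    ordered: List[str] = []
    while heap:
        _, _, _, text = heapq.heappop(heap)
        ordered.append(text)
    return ordered
```

For instance, an instruction received later but with a more urgent priority is placed ahead of earlier, less urgent ones.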

In a possible implementation, as shown in FIG. 4, the control module 11 may further include a dependency processing sub-module 114.

When the dependency processing sub-module 114 determines that a first instruction to be executed among the plurality of instructions to be executed has a dependency on a zeroth instruction to be executed that precedes it, the sub-module may cache the first instruction to be executed in the instruction storage sub-module 111 and, after the zeroth instruction to be executed has finished executing, fetch the first instruction to be executed from the instruction storage sub-module 111 and send it to the processing module 12. The first and zeroth instructions to be executed are both instructions among the plurality of instructions to be executed.

A dependency between the first instruction to be executed and the preceding zeroth instruction to be executed exists when the first storage address interval, which stores the data required by the first instruction, overlaps the zeroth storage address interval, which stores the data required by the zeroth instruction. Conversely, the first and zeroth instructions to be executed have no dependency when the first storage address interval and the zeroth storage address interval do not overlap.

In this way, based on the dependencies between instructions to be executed, a later instruction is executed only after the earlier instruction it depends on has finished executing, which ensures the accuracy of the operation result.
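The overlap test that defines a dependency can be sketched as below. Representing each storage address interval as a half-open range [start, end) and the helper names `intervals_overlap` and `issue_decision` are assumptions made for illustration.

```python
from typing import Tuple

Interval = Tuple[int, int]  # assumed half-open address range [start, end)


def intervals_overlap(first: Interval, zeroth: Interval) -> bool:
    """Return True when the two address intervals share at least one address."""
    return first[0] < zeroth[1] and zeroth[0] < first[1]


def issue_decision(first: Interval, zeroth: Interval, zeroth_done: bool) -> str:
    """Decide what to do with the first instruction to be executed.

    "hold"  : a dependency exists and the zeroth instruction is still running,
              so the first instruction stays cached.
    "issue" : the first instruction may be sent to the processing module.
    """
    if intervals_overlap(first, zeroth) and not zeroth_done:
        return "hold"
    return "issue"
```

With half-open ranges, two intervals that merely touch at a boundary (one ends where the other begins) are treated as non-overlapping.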

In a possible implementation, the instruction format of the matrix mirroring instruction may be:

Rotate2 type dst src src_shape dst_shape

Here Rotate2 is the opcode, and type, dst, src, src_shape, and dst_shape are the operation fields. Rotate2 indicates that the instruction is a matrix mirroring instruction. type is the mirroring strategy. dst is the target address. src is the address of the matrix to be mirrored. src_shape is the input shape. dst_shape is the output shape.

Alternatively, the instruction format may be:

Rotate2_type dst src src_shape dst_shape

Here Rotate2_type is the opcode, and dst, src, src_shape, and dst_shape are the operation fields. The Rotate2 part of Rotate2_type indicates that the instruction is a matrix mirroring instruction, and the type part of Rotate2_type is the mirroring strategy. dst is the target address. src is the address of the matrix to be mirrored. src_shape is the input shape. dst_shape is the output shape.
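A minimal parser that accepts both textual formats and reduces them to the same set of fields might look like the following. The class `MirrorInstruction` and the function `parse_mirror_instruction` are hypothetical names, and treating the instruction as whitespace-separated text is an assumption; the disclosure does not fix a concrete encoding.

```python
from dataclasses import dataclass


@dataclass
class MirrorInstruction:
    strategy: str   # mirroring strategy ("type")
    dst: str        # target address
    src: str        # address of the matrix to be mirrored
    src_shape: str  # input shape
    dst_shape: str  # output shape


def parse_mirror_instruction(text: str) -> MirrorInstruction:
    """Parse either 'Rotate2 type dst src src_shape dst_shape'
    or 'Rotate2_type dst src src_shape dst_shape'."""
    tokens = text.split()
    if not tokens:
        raise ValueError("empty instruction")
    opcode = tokens[0]

    if opcode == "Rotate2":
        # First format: the strategy is the first operation field.
        if len(tokens) != 6:
            raise ValueError("expected 5 operation fields after Rotate2")
        strategy, dst, src, src_shape, dst_shape = tokens[1:]
    elif opcode.startswith("Rotate2_"):
        # Second format: the strategy is carried in the opcode suffix.
        if len(tokens) != 5:
            raise ValueError("expected 4 operation fields after Rotate2_type")
        strategy = opcode[len("Rotate2_"):]
        dst, src, src_shape, dst_shape = tokens[1:]
    else:
        raise ValueError(f"not a matrix mirroring instruction: {opcode}")

    if not strategy:
        raise ValueError("missing mirroring strategy")
    return MirrorInstruction(strategy, dst, src, src_shape, dst_shape)
```

Under these assumptions, the two spellings of the same instruction parse to equal `MirrorInstruction` values.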

It should be understood that those skilled in the art may set the opcode of the matrix mirroring instruction, and the positions of the opcode and the operation field in the instruction format, as required; the present disclosure does not limit this.

In a possible implementation, the apparatus may be provided in one or more of a graphics processing unit (GPU), a central processing unit (CPU), and an embedded neural-network processing unit (NPU).

It should be noted that although the matrix mirroring instruction processing apparatus has been described above using the foregoing embodiments as examples, those skilled in the art will understand that the present disclosure is not limited to them. In fact, users may flexibly configure each module according to personal preference and/or the actual application scenario, as long as the result conforms to the technical solution of the present disclosure.

Application example

The following takes "using the matrix mirroring instruction processing apparatus to mirror a matrix to be mirrored" as an exemplary application scenario and gives an application example according to an embodiment of the present disclosure, to make the processing flow of the apparatus easier to understand. Those skilled in the art should understand that the following application example is provided only to facilitate understanding of the embodiments of the present disclosure and should not be regarded as limiting them.

FIG. 5 shows a schematic diagram of an application scenario of a matrix mirroring instruction processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 5, the apparatus processes a matrix mirroring instruction as follows.

Example 1

On receiving matrix mirroring instruction 1 (Rotate2_type 200 100 S1 S2), the control module 11 parses it and obtains its opcode and operation field. The opcode of matrix mirroring instruction 1 is Rotate2_type. From the opcode it can be determined that the instruction is a matrix mirroring instruction and that the mirroring strategy is type. From the operation field it can be determined that the address of the matrix to be mirrored is 100, the input shape is S1, the target address is 200, and the output shape is S2. The control module 11 then obtains matrix 1 to be mirrored, whose input shape is S1, from address 100.

The processing module 12 mirrors matrix 1 to be mirrored according to the mirroring strategy, obtains mirrored matrix 1', and stores mirrored matrix 1' at target address 200.

Besides the form Rotate2_type 200 100 S1 S2 given above, matrix mirroring instruction 1 may also be written as Rotate2 type 200 100 S1 S2. The two use different instruction formats but denote the same processing, and the matrix mirroring instruction processing apparatus handles them in a similar way, so the details are not repeated here.

For details of the above processing, see the related description above.

In this way, the matrix mirroring instruction processing apparatus can mirror a matrix quickly and efficiently according to a matrix mirroring instruction.

FIG. 6 shows a flowchart of a matrix mirroring instruction processing method according to an embodiment of the present disclosure. As shown in FIG. 6, the method is applied to the matrix mirroring instruction processing apparatus described above and includes step S51 and step S52.

In step S51, the received matrix mirroring instruction is parsed to obtain its opcode and operation field; the matrix to be mirrored and the target address required to execute the instruction are determined according to the opcode and operation field; and the mirroring strategy required for the mirroring is determined. The opcode indicates that the processing performed on the matrix by the matrix mirroring instruction is mirroring, and the operation field includes the address of the matrix to be mirrored and the target address.

In step S52, the matrix to be mirrored is mirrored according to the mirroring strategy to obtain the mirrored matrix, and the mirrored matrix is stored at the target address.
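Steps S51 and S52 can be illustrated with a small software model. Everything concrete here is an assumption made for illustration: memory is modelled as a dictionary from address to matrix, the strategy names "horizontal" and "vertical" stand in for whatever mirroring strategies an implementation defines, and the shape fields are carried in the instruction text but not interpreted.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[int]]

# Hypothetical mirroring strategies, keyed by the name used in the opcode suffix.
STRATEGIES: Dict[str, Callable[[Matrix], Matrix]] = {
    # Reverse the elements of every row (left-right flip).
    "horizontal": lambda m: [row[::-1] for row in m],
    # Reverse the order of the rows (up-down flip).
    "vertical": lambda m: [row[:] for row in m[::-1]],
}


def step_s51(instruction: str, memory: Dict[int, Matrix]) -> Tuple[str, Matrix, int]:
    """Parse the instruction; return (strategy, matrix to be mirrored, target address)."""
    opcode, dst, src = instruction.split()[:3]
    name, _, strategy = opcode.partition("_")
    if name != "Rotate2" or strategy not in STRATEGIES:
        raise ValueError(f"unsupported instruction: {opcode}")
    return strategy, memory[int(src)], int(dst)


def step_s52(strategy: str, matrix: Matrix, dst: int, memory: Dict[int, Matrix]) -> Matrix:
    """Mirror the matrix according to the strategy and store the result at dst."""
    mirrored = STRATEGIES[strategy](matrix)
    memory[dst] = mirrored
    return mirrored
```

In this model the source matrix is left untouched, and the mirrored copy appears only at the target address.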

In a possible implementation, the operation field may further include the input shape of the matrix to be mirrored. In that case, mirroring the matrix to be mirrored according to the mirroring strategy to obtain the mirrored matrix may include: mirroring the matrix to be mirrored according to the input shape and the mirroring strategy to obtain the mirrored matrix.

In a possible implementation, the operation field may further include the output shape of the mirrored matrix. In that case, mirroring the matrix to be mirrored according to the mirroring strategy to obtain the mirrored matrix may include: mirroring the matrix to be mirrored according to the output shape and the mirroring strategy to obtain the mirrored matrix.

In a possible implementation, the method may further include: storing the matrix to be mirrored.

In a possible implementation, parsing the received matrix mirroring instruction to obtain its opcode and operation field may include:

storing the matrix mirroring instruction;

parsing the matrix mirroring instruction to obtain the opcode and operation field of the matrix mirroring instruction;

storing an instruction queue, where the instruction queue includes a plurality of instructions to be executed, arranged in execution order, and the plurality of instructions to be executed may include the matrix mirroring instruction.

In a possible implementation, the method may further include:

when it is determined that a first instruction to be executed among the plurality of instructions to be executed has a dependency on a zeroth instruction to be executed that precedes it, caching the first instruction to be executed and, after it is determined that the zeroth instruction to be executed has finished executing, controlling execution of the first instruction to be executed,

where the dependency between the first instruction to be executed and the preceding zeroth instruction to be executed includes:

the first storage address interval, which stores the data required by the first instruction to be executed, overlapping the zeroth storage address interval, which stores the data required by the zeroth instruction to be executed.

It should be noted that although the matrix mirroring instruction processing method has been described above using the foregoing embodiments as examples, those skilled in the art will understand that the present disclosure is not limited to them. In fact, users may flexibly configure each step according to personal preference and/or the actual application scenario, as long as the result conforms to the technical solution of the present disclosure.

The matrix mirroring instruction processing method provided by the embodiments of the present disclosure has a wide range of applications and mirrors matrices with high processing efficiency and high processing speed.

It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in another order or at the same time. Those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present disclosure.

In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.

In the embodiments provided in the present disclosure, it should be understood that the disclosed systems and apparatuses may be implemented in other ways. For example, the system and apparatus embodiments described above are merely illustrative: the division into devices, apparatuses, and modules is only a division by logical function, and other divisions are possible in an actual implementation. Multiple modules may be combined or integrated into another system or apparatus, and some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, apparatuses, or modules, and may be electrical or take other forms.

Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present disclosure may be integrated into one processing unit, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software program module.

If an integrated module is implemented in the form of a software program module and sold or used as a stand-alone product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.

The embodiments of the present application have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present application. The description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

1. A matrix mirroring instruction processing apparatus, the apparatus comprising:

a control module configured to parse a received matrix mirroring instruction, obtain an opcode and an operation field of the matrix mirroring instruction, determine, according to the opcode and the operation field, a matrix to be mirrored and a target address required for executing the matrix mirroring instruction, and determine a mirroring strategy required for the mirroring processing; and

a processing module configured to perform mirroring processing on the matrix to be mirrored according to the mirroring strategy to obtain a mirrored matrix, and to store the mirrored matrix at the target address,

wherein the opcode indicates that the processing performed on the matrix by the matrix mirroring instruction is mirroring processing, and the operation field comprises an address of the matrix to be mirrored and the target address.

2. The apparatus of claim 1, wherein the operation field further comprises an input shape of the matrix to be mirrored,

the processing module being further configured to perform mirroring processing on the matrix to be mirrored according to the input shape and the mirroring strategy to obtain the mirrored matrix.

3. The apparatus of claim 1, wherein the operation field further comprises an output shape of the mirrored matrix,

the processing module being further configured to perform mirroring processing on the matrix to be mirrored according to the output shape and the mirroring strategy to obtain the mirrored matrix.

4. The apparatus of claim 1, wherein the operation field is further used to indicate the mirroring strategy.

5. The apparatus of claim 1, wherein the opcode is further used to indicate the mirroring strategy.

6. The apparatus of claim 1,

wherein the apparatus further comprises: a storage module configured to store the matrix to be mirrored,

wherein the control module comprises:

an instruction storage sub-module configured to store the matrix mirroring instruction;

an instruction processing sub-module configured to parse the matrix mirroring instruction to obtain the opcode and the operation field of the matrix mirroring instruction;

a queue storage sub-module configured to store an instruction queue, the instruction queue comprising a plurality of instructions to be executed that are arranged in execution order, the plurality of instructions to be executed comprising the matrix mirroring instruction,

wherein the control module further comprises:

a dependency processing sub-module configured to, when it is determined that a first instruction to be executed among the plurality of instructions to be executed has a dependency on a zeroth instruction to be executed that precedes the first instruction to be executed, cache the first instruction to be executed in the instruction storage sub-module and, after the zeroth instruction to be executed has finished executing, fetch the first instruction to be executed from the instruction storage sub-module and send it to the processing module,

wherein the dependency between the first instruction to be executed and the zeroth instruction to be executed that precedes it comprises:

a first storage address interval storing the data required by the first instruction to be executed and a zeroth storage address interval storing the data required by the zeroth instruction to be executed having an overlapping region.

7. A machine learning arithmetic device, the device comprising:

one or more matrix mirroring instruction processing devices according to any one of claims 1 to 6, configured to obtain a matrix to be mirrored and control information from another processing device, perform a specified machine learning operation, and transmit an execution result to the other processing device through an I/O interface;

when the machine learning arithmetic device comprises a plurality of matrix mirroring instruction processing devices, the plurality of matrix mirroring instruction processing devices can be connected through a specific structure and transmit data;

the plurality of matrix mirroring instruction processing devices are interconnected and transmit data through a Peripheral Component Interconnect Express (PCIe) bus, so as to support larger-scale machine learning operations; the plurality of matrix mirroring instruction processing devices share one control system or have their own control systems; the plurality of matrix mirroring instruction processing devices share a memory or have their own memories; and the plurality of matrix mirroring instruction processing devices are interconnected in any interconnection topology.

8. A combined processing apparatus, characterized in that the combined processing apparatus comprises:

the machine learning arithmetic device according to claim 7, a universal interconnect interface, and another processing device;

wherein the machine learning arithmetic device interacts with the other processing device to jointly complete a calculation operation designated by the user,

wherein the combined processing apparatus further comprises: a storage device connected to the machine learning arithmetic device and the other processing device, respectively, for storing data of the machine learning arithmetic device and the other processing device.

9. A machine learning chip, the machine learning chip comprising:

the machine learning arithmetic device according to claim 7 or the combined processing device according to claim 8.

10. An electronic device, characterized in that the electronic device comprises:

the machine learning chip of claim 9.

11. A board card, characterized in that the board card comprises: a storage device, an interface device, a control device, and the machine learning chip according to claim 9;

wherein the machine learning chip is connected with the storage device, the control device and the interface device respectively;

the storage device is used for storing data;

the interface device is used for realizing data transmission between the machine learning chip and external equipment;

and the control device is used for monitoring the state of the machine learning chip.

12. A matrix mirroring instruction processing method, applied to a matrix mirroring instruction processing device, the method comprising:

parsing a received matrix mirroring instruction to obtain an operation code and an operation domain of the matrix mirroring instruction, determining, according to the operation code and the operation domain, a matrix to be mirrored and a target address required for executing the matrix mirroring instruction, and determining a mirroring strategy required for the mirror processing;

performing mirror processing on the matrix to be mirrored according to the mirroring strategy to obtain a mirrored matrix, and storing the mirrored matrix at the target address,

wherein the operation code indicates that the processing performed on the matrix by the matrix mirroring instruction is mirror processing, and the operation domain comprises the address of the matrix to be mirrored and the target address.
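The parse, mirror, and store steps of this method can be sketched in Python as follows. The textual instruction format, the `MIRROR` mnemonic, the dictionary standing in for memory, and the two strategy names ("horizontal" and "vertical") are all assumptions made for illustration and are not taken from the claims.

```python
MIRROR_OPCODE = "MIRROR"  # hypothetical mnemonic, not from the source


def parse(instruction: str):
    """Split an instruction into its operation code and operation domain."""
    opcode, *fields = instruction.split()
    # Assumed layout: first field is the source address, second the target.
    src, dst = (int(field, 0) for field in fields[:2])
    return opcode, {"src": src, "dst": dst}


def mirror(matrix, strategy: str):
    """Return a mirrored copy of a matrix given as a list of rows."""
    if strategy == "horizontal":   # reverse each row (left-right flip)
        return [row[::-1] for row in matrix]
    if strategy == "vertical":     # reverse the row order (top-bottom flip)
        return [row[:] for row in matrix[::-1]]
    raise ValueError(f"unknown mirroring strategy: {strategy}")


def execute(instruction: str, memory: dict, default_strategy: str = "horizontal"):
    """Parse the instruction, mirror the source matrix, store the result."""
    opcode, domain = parse(instruction)
    if opcode != MIRROR_OPCODE:
        raise ValueError(f"not a matrix mirroring instruction: {opcode}")
    memory[domain["dst"]] = mirror(memory[domain["src"]], default_strategy)
```

For example, with `memory = {0x100: [[1, 2, 3], [4, 5, 6]]}`, calling `execute("MIRROR 0x100 0x200", memory)` leaves `[[3, 2, 1], [6, 5, 4]]` at address `0x200` and the source matrix unchanged.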

13. The method of claim 12, wherein the operation domain further comprises an input shape of the matrix to be mirrored,

wherein performing mirror processing on the matrix to be mirrored according to the mirroring strategy to obtain the mirrored matrix comprises:

performing mirror processing on the matrix to be mirrored according to the input shape and the mirroring strategy to obtain the mirrored matrix.
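One reason the input shape matters is that a matrix held as a flat run of values cannot be mirrored without knowing its row and column counts. The following Python sketch assumes a row-major flat layout and two strategy names ("horizontal" and "vertical"); neither the layout nor the names come from the claims.

```python
def mirror_flat(data, shape, strategy):
    """Mirror a row-major flat buffer interpreted with the given (rows, cols)."""
    rows, cols = shape
    if len(data) != rows * cols:
        raise ValueError("input shape does not match the amount of data")
    out = [None] * len(data)
    for r in range(rows):
        for c in range(cols):
            if strategy == "horizontal":    # left-right flip within each row
                src = r * cols + (cols - 1 - c)
            elif strategy == "vertical":    # top-bottom flip across rows
                src = (rows - 1 - r) * cols + c
            else:
                raise ValueError(f"unknown mirroring strategy: {strategy}")
            out[r * cols + c] = data[src]
    return out
```

The same six values give different results for different input shapes: as a 2 x 3 matrix, `[1, 2, 3, 4, 5, 6]` mirrors horizontally to `[3, 2, 1, 6, 5, 4]`, while as a 3 x 2 matrix it mirrors to `[2, 1, 4, 3, 6, 5]`.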

14. The method of claim 12, wherein the operation domain further comprises an output shape of the mirrored matrix,

wherein the mirror processing is performed on the matrix to be mirrored according to the output shape and the mirroring strategy to obtain the mirrored matrix.

15. The method of claim 12, wherein the operation domain is further used to indicate the mirroring strategy.

16. The method of claim 12, wherein the operation code is further used to indicate the mirroring strategy.
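Claims 15 and 16 describe two alternative places where the mirroring strategy may be indicated. A Python sketch of both options follows; the mnemonics `MIRROR`, `MIRROR.H`, and `MIRROR.V`, the position of the strategy field, and the fallback default are hypothetical encodings chosen for illustration only.

```python
# Hypothetical opcodes that carry the strategy themselves (claim 16 style).
OPCODE_STRATEGIES = {"MIRROR.H": "horizontal", "MIRROR.V": "vertical"}
KNOWN_STRATEGIES = {"horizontal", "vertical"}


def resolve_strategy(instruction: str, default: str = "horizontal") -> str:
    """Determine the mirroring strategy for a textual instruction."""
    opcode, *domain = instruction.split()
    if opcode in OPCODE_STRATEGIES:
        # The operation code indicates the strategy.
        return OPCODE_STRATEGIES[opcode]
    if opcode != "MIRROR":
        raise ValueError(f"not a matrix mirroring instruction: {opcode}")
    # Assumed layout: source address, target address, optional strategy field.
    if len(domain) >= 3:
        # The operation domain indicates the strategy (claim 15 style).
        if domain[2] not in KNOWN_STRATEGIES:
            raise ValueError(f"unknown mirroring strategy: {domain[2]}")
        return domain[2]
    # Neither the operation code nor the operation domain names a strategy.
    return default
```

With this encoding, `resolve_strategy("MIRROR.V 0x100 0x200")` and `resolve_strategy("MIRROR 0x100 0x200 vertical")` both select the vertical strategy, and `resolve_strategy("MIRROR 0x100 0x200")` falls back to the assumed default.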

17. The method of claim 12, wherein

the method further comprises: storing the matrix to be mirrored,

wherein parsing the received matrix mirroring instruction to obtain the operation code and the operation domain of the matrix mirroring instruction comprises:

storing the matrix mirroring instruction;

parsing the matrix mirroring instruction to obtain the operation code and the operation domain of the matrix mirroring instruction;

storing an instruction queue, the instruction queue comprising a plurality of instructions to be executed arranged in sequence according to an execution order, the plurality of instructions to be executed comprising the matrix mirroring instruction,

wherein the method further comprises:

upon determining that a first to-be-executed instruction among the plurality of to-be-executed instructions has a dependency relationship with a zeroth to-be-executed instruction preceding the first to-be-executed instruction, caching the first to-be-executed instruction, and, after determining that the zeroth to-be-executed instruction has finished executing, controlling execution of the first to-be-executed instruction,