CN116128703A - Graphics processor, chip, and electronic device

Info

Publication number: CN116128703A
Application number: CN202211667582.XA
Authority: CN
Inventors: 王超; 张慧明; 陈勇军; 陆荣
Original assignee: VeriSilicon Microelectronics Shanghai Co Ltd
Current assignee: VeriSilicon Microelectronics Shanghai Co Ltd
Priority date: 2022-12-23
Filing date: 2022-12-23
Publication date: 2023-05-16

Abstract

The application provides a graphics processor, a chip, and an electronic device. The graphics processor comprises at least two data and command distributors and at least two graphics processor cores. Each data and command distributor is connected to at least one graphics processor core, and one data and command distributor is connected to one graphics processor core through one group of data and command transmission lines. The graphics processor is configured to provide at least one virtual graphics processor, each of which includes one data and command distributor and some or all of the graphics processor cores connected to it. By providing virtual graphics processors in this way, the graphics processor can be shared by multiple users.

Description

Graphics processor, chip, and electronic device

Technical Field

The present application belongs to the technical field of processors and relates to a graphics processor, and in particular to a graphics processor, a chip, and an electronic device.

Background

A graphics processing unit (GPU), also known as a display core, visual processor, or display chip, is a microprocessor dedicated to image- and graphics-related computation on personal computers, workstations, game consoles, and some mobile devices (such as tablets and smartphones). The GPU reduces the graphics card's dependence on the central processing unit (CPU) and takes over part of the work originally done by the CPU.

The number of graphics processors in an electronic device is usually limited. Graphics processor virtualization emerged to use these limited resources more efficiently and to better meet user needs. It virtualizes one physical graphics processor into multiple virtual graphics processors so that multiple users can use it at the same time. Each user uses one virtual graphics processor, and each virtual graphics processor can use one or more graphics processor cores. In existing graphics processor virtualization technologies, however, the graphics processor cores used by each virtual graphics processor are often fixed and are difficult to configure flexibly according to actual needs.

Summary of the Invention

The present application provides a graphics processor, a chip, and an electronic device. In the graphics processor, the graphics processor cores included in each virtual graphics processor can be configured according to actual needs.

In a first aspect, an embodiment of the present application provides a graphics processor. The graphics processor includes at least two data and command distributors and at least two graphics processor cores. Each data and command distributor is connected to at least one graphics processor core, and one data and command distributor is connected to one graphics processor core through one group of data and command transmission lines. The graphics processor is configured to provide at least one virtual graphics processor, each of which includes one data and command distributor and some or all of the graphics processor cores connected to it.

In an implementation of the first aspect, the graphics processor is configured, according to a received instruction, to provide n virtual graphics processors, where n is any positive integer less than or equal to N and N is the number of graphics processor cores.

In an implementation of the first aspect, the i-th data and command distributor of the graphics processor is connected to $\mathrm{floor}(N / n_i)$ graphics processor cores, where i and $n_i$ are both positive integers less than or equal to N and floor is the round-down function.

In an implementation of the first aspect, the graphics processor includes N data and command distributors. One of them is connected to N graphics processor cores, and $m_j - m_{j+1}$ of them are each connected to $N / m_j$ graphics processor cores, where $m_j$ and $m_{j+1}$ are adjacent positive divisors of N and $1 \le m_{j+1} < m_j \le N$.

In an implementation of the first aspect, the graphics processor further includes data selectors. A graphics processor core that is connected to at least two data and command distributors is connected to those distributors through a data selector.

In an implementation of the first aspect, there are eight data and command distributors and eight graphics processor cores, and the connections between them comprise one one-to-eight connection, one one-to-four connection, two one-to-two connections, and four one-to-one connections.
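The eight-core configuration can be derived from the divisors of N. The following is a minimal sketch, assuming that each group of m_j − m_{j+1} distributors fans out to N / m_j cores (an assumption that reproduces the eight-core configuration listed above); the function names are illustrative and not part of the application.

```python
def divisors_desc(n):
    """Positive divisors of n, largest first."""
    return [d for d in range(n, 0, -1) if n % d == 0]


def fanout_plan(n_cores):
    """Map fan-out (cores per distributor) -> number of distributors.

    Assumes N distributors for N cores: one distributor reaches all N cores,
    and for adjacent divisors m_j > m_next of N, (m_j - m_next) distributors
    each reach N // m_j cores.
    """
    plan = {n_cores: 1}
    divs = divisors_desc(n_cores)
    for m_j, m_next in zip(divs, divs[1:]):
        fan_out = n_cores // m_j
        plan[fan_out] = plan.get(fan_out, 0) + (m_j - m_next)
    return plan


plan = fanout_plan(8)
# Each distributor-to-core link is one group of transmission lines.
line_groups = sum(fan_out * count for fan_out, count in plan.items())
full_mesh_groups = 8 * 8
print(plan, line_groups, full_mesh_groups)
```

Under these assumptions, the eight-core layout needs 20 groups of transmission lines, compared with 64 for a full connection.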

In an implementation of the first aspect, the number of physical layers of the graphics processor is configured according to the number of data and command transmission lines.

In an implementation of the first aspect, the data and command distributors are fully connected to the graphics processor cores.

In a second aspect, an embodiment of the present application provides a chip that includes input/output pins and the graphics processor of any implementation of the first aspect.

In a third aspect, an embodiment of the present application provides an electronic device that includes a memory and the graphics processor of any implementation of the first aspect.

The graphics processor provided by the embodiments of the present application can provide at least one virtual graphics processor, and the graphics processor cores included in each virtual graphics processor can be configured flexibly according to the needs of the specific application.

In some embodiments of the present application, optimizing how the data and command distributors are wired to the graphics processor cores reduces the number of connections between them. This avoids congestion in the place and route (P&R) stage of chip design and helps reduce chip area. In addition, in some embodiments the number of physical layers of the graphics processor is configured according to the number of data and command transmission lines. In these embodiments, optimizing the wiring between the data and command distributors and the graphics processor cores also reduces the number of physical layers of the graphics processor.

Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of an electronic device.

FIG. 2 is a schematic structural diagram of a graphics processor.

FIG. 3A is a schematic structural diagram of a graphics processor provided by an embodiment of the present application.

FIG. 3B is a schematic diagram of the connection between a data and command distributor and a graphics processor core in an embodiment of the present application.

FIG. 4 is a schematic structural diagram of a graphics processor provided by an embodiment of the present application.

FIG. 5A and FIG. 5B are schematic structural diagrams of a graphics processor provided by an embodiment of the present application.

FIG. 5C is a schematic structural diagram of a graphics processor provided by an embodiment of the present application.

FIG. 6 is a schematic structural diagram of a graphics processor provided by an embodiment of the present application.

FIG. 7 is a schematic structural diagram of a chip provided by an embodiment of the present application.

Description of Reference Numerals

100 Electronic device

110 System processor

120 Graphics processor

121-1 to 121-k Graphics processor cores

122 Configuration command processor

123 Crossbar bus

124-1 to 124-k L2 caches

130 Memory

140 Display screen

300 Graphics processor

310-1 to 310-M Data and command distributors

330-1 to 330-N Graphics processor cores

500 Graphics processor

510-1 to 510-8 Data and command distributors

520-2 to 520-8 Data selectors

530-1 to 530-8 Graphics processor cores

Detailed Description

Embodiments of the present application are described below through specific examples, and those skilled in the art can readily understand other advantages and effects of the present application from the content disclosed in this specification. The present application can also be implemented or applied through other specific embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of the present application. Provided there is no conflict, the following embodiments and the features in them can be combined with one another.

In this application, unless otherwise expressly specified and limited, the terms "mounted", "coupled", "connected", "fixed", and the like are to be understood broadly. A connection may, for example, be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal communication between two elements or an interaction between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in this application according to the specific situation.

The drawings provided in the following embodiments illustrate the basic idea of the present application only schematically. They show only the components relevant to the present application and are not drawn to the number, shape, and size of the components in an actual implementation. In an actual implementation, the form, number, and proportion of the components may vary freely, and the component layout may be more complex.

The following embodiments of the present application provide a graphics processor whose application scenarios include, but are not limited to, electronic devices. The electronic device may be a mobile phone, a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a smart watch, a netbook, a wearable electronic device, an augmented reality (AR) device, a virtual reality (VR) device, a vehicle-mounted device, a smart car, a smart speaker, a robot, smart glasses, or another type of electronic device.

FIG. 1 is a schematic structural diagram of an electronic device 100 in an embodiment of the present application. The electronic device 100 includes a system processor 110 (for example, a CPU), a graphics processor 120, a memory 130, and a display screen 140.

In operation, the system processor 110 boots into an operating system (OS) to provide the various operations of the user system, including user applications, data processing services, communication services, storage services, game services, and other operations. The graphics processor 120 provides graphics processing, rendering, and enhancement operations for the system processor 110. Specifically, referring to FIG. 2, these operations involve components of the graphics processor 120 that include graphics processor cores (for example 121-1, 121-2, …, 121-k, where k is a positive integer), a configuration command processor 122, and a crossbar bus 123. Graphics processing, rendering, and enhancement operations can be performed by one or more functional modules of the graphics processor 120; for example, one functional module may perform one operation. In FIG. 1, the graphics processor 120 is a separate component connected to the system processor 110 through a communication line 150, but in other examples the graphics processor 120 may be integrated into the system processor 110.

The memory 130 may include random access memory (RAM), cache memory devices, or other volatile memory elements used by the system processor 110 or the graphics processor 120. The volatile memory elements used by the graphics processor 120 include caches that may be integrated into the graphics processor 120, for example the level-2 (L2) caches 124-1, 124-2, …, 124-k in FIG. 2. The memory 130 may also include non-volatile memory elements, such as a hard disk drive (HDD), a flash memory device, a solid state drive (SSD), or other memory devices that store the operating system, applications, or other software or firmware of the electronic device 100.

Electronic devices 100 may communicate with one another over one or more communication links, such as one or more network links. A communication link may use metal, glass, optical media, air, space, or some other material as the transmission medium. Example communication links may use various communication interfaces and protocols, such as Internet Protocol (IP), Ethernet, Universal Serial Bus (USB), Bluetooth, WiFi, or other communication signaling or formats, including combinations, improvements, or variants of these. A communication link may be a direct link, may include intermediate networks, systems, or devices, and may include a logical network link carried over multiple physical links.

The electronic device 100 may include software such as an operating system, logs, databases, utilities, drivers, networking software, user applications, data processing applications, game applications, and other software stored on computer-readable media. The software of the electronic device 100 may include one or more platforms hosted by a distributed computing system or a cloud computing service. The software of the electronic device 100 may include logical interface elements, such as software-defined interfaces and application programming interfaces (APIs).

The software of the electronic device 100 may be used to generate the data to be rendered by the graphics processor 120 and to control the graphics processor 120 so that it renders graphics for output to one or more display screens 140.

The system processor 110, the graphics processor 120, the memory 130, and the display screen 140 may communicate through a communication line 150 that couples them. An example communication line 150 may use metal, glass, optical media, air, space, or some other material as the transmission medium. The communication line 150 may use various communication protocols and signaling, such as a computer bus, including combinations or variants of these. The communication line 150 may be a direct link, may include intermediate networks, systems, or devices, and may include a logical network link carried over multiple physical links.

FIG. 2 shows an example of the graphics processor 120 in an embodiment of the present application. As shown in FIG. 2, the graphics processor 120 includes multiple graphics processor cores 121-1, 121-2, …, 121-k, a configuration command processor 122, a crossbar bus 123, and multiple L2 caches 124-1, 124-2, 124-3, …, 124-k. The crossbar bus 123 is connected between the graphics processor cores and the L2 caches. It provides the channel through which the graphics processor cores access the L2 caches and the channel through which the L2 caches return data to the graphics processor cores. The L2 caches are also connected to the external memory 130 through a memory interface (MIF).

Under the architecture provided by the embodiments of the present application, the system processor 110 prepares the tasks and data to be run on the graphics processor 120 and sends them to the graphics processor cores by way of command configuration. Specifically, the configuration command processor 122 receives the commands issued by the system processor 110, parses out the tasks, and issues them directly to the graphics processor cores, which then start executing them. Alternatively, the configuration command processor 122 may send a task to the memory 130 through the crossbar bus 123, and a graphics processor core reads the task from the memory 130 and processes it.

When a graphics processor core executes a task, it reads the task-related external data from the memory 130, processes the data, and writes the results out. A graphics processor core is multi-threaded, that is, one instruction processes a batch of data. To reduce the latency of fetching and storing data and to improve the processing efficiency of the graphics processor cores, a typical design places L2 caches between the graphics processor cores and the memory 130. The L2 caches prefetch and cache large amounts of data, which reduces the time the graphics processor cores spend waiting.

The technical solutions in the embodiments of the present application are described in detail below with reference to the drawings.

FIG. 3A is a schematic structural diagram of a graphics processor 300 in an embodiment of the present application. As shown in FIG. 3A, the graphics processor 300 includes M data and command distributors (dispatchers) 310-1 to 310-M and N graphics processor cores (clusters) 330-1 to 330-N, where M and N are both integers greater than or equal to 2 and each graphics processor core contains one set of graphics processing pipelines. M and N may be equal in some implementations and different in others. Each data and command distributor is connected to at least one and at most N graphics processor cores. A data and command distributor may be connected to a graphics processor core directly, or indirectly through a data selector or similar element, among other ways. One data and command distributor is connected to one graphics processor core through one group of data and command transmission lines; each group contains, for example, 1000 to 2000 lines.

In the embodiments of the present application, the graphics processor 300 provides at least one virtual graphics processor. Each virtual graphics processor includes one data and command distributor and some or all of the graphics processor cores connected to that distributor. For example, in the graphics processor 300 shown in FIG. 3A, the data and command distributor 310-1 together with the graphics processor cores 330-1 and 330-N connected to it can provide one virtual graphics processor, and the data and command distributor 310-2 together with the graphics processor core 330-2 connected to it can provide another.

In some implementations, the graphics processor can provide multiple virtual graphics processors at the same time. Each virtual graphics processor can include multiple graphics processor cores, and each graphics processor core belongs to only one virtual graphics processor. Each user can use one virtual graphics processor, and each graphics processor core can be used by only one user at a time.
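The two constraints described above (a virtual graphics processor may only use cores wired to its distributor, and no core may belong to two virtual graphics processors at once) can be checked mechanically. The following is a minimal sketch; the function name and the dictionary layout are illustrative assumptions, not part of the application.

```python
def validate_partition(links, vgpus):
    """Check a proposed set of virtual GPUs against the wiring.

    links: distributor -> set of cores it is wired to (hypothetical layout)
    vgpus: distributor -> set of cores assigned to that virtual GPU
    Returns the set of all assigned cores, or raises ValueError.
    """
    assigned = set()
    for distributor, cores in vgpus.items():
        if not cores <= links[distributor]:
            raise ValueError(f"{distributor}: uses a core it is not wired to")
        if assigned & cores:
            raise ValueError(f"{distributor}: core already in another virtual GPU")
        assigned |= cores
    return assigned


# Wiring taken from the FIG. 3A example: 310-1 reaches 330-1 and 330-N,
# and 310-2 reaches 330-2.
links = {"310-1": {"330-1", "330-N"}, "310-2": {"330-2"}}
assigned = validate_partition(
    links, {"310-1": {"330-1", "330-N"}, "310-2": {"330-2"}}
)
print(sorted(assigned))
```

Keying the assignment by distributor also enforces that each virtual graphics processor has exactly one distributor.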

Optionally, FIG. 3B is a schematic diagram of the connection between a data and command distributor and a graphics processor core in an embodiment of the present application. Taking the data and command distributor 310-1 as an example, it is connected to double data rate (DDR) synchronous dynamic random-access memory through an Advanced eXtensible Interface (AXI) and through an Advanced High-performance Bus (AHB) host interface (HI) 0. AXI is a bus protocol for a high-performance, high-bandwidth, low-latency on-chip bus; it allows a system on chip (SoC) to achieve better performance with a smaller area and lower power consumption. AHB is a high-performance bus used mainly to connect high-performance modules; its main features include single-clock-edge operation, a non-tri-state implementation, and support for burst transfers.

The graphics processor core 330-1 includes a shader module, a transform feedback (TFB) module, a position primitive assembly (PPA) module, a final primitive assembly (FPA) module, a pixel engine (PE) module, and other modules. The final primitive assembly module writes intermediate results out to memory. The position primitive assembly module performs culling operations such as triangle back-face culling and zero-area culling. The FPA module performs the viewport frustum transformation, and the pixel engine module performs operations such as alpha blending on pixels.

As the above description shows, the graphics processor provided by the embodiments of the present application can provide users with at least one virtual graphics processor, and this virtualization allows the graphics processor to be shared by multiple users.

In an embodiment of the present application, the graphics processor is configured, according to a received instruction, to provide n virtual graphics processors, where n is any positive integer less than or equal to N and N is the number of graphics processor cores, for example 4, 8, or 16. Optionally, the n virtual graphics processors together include all N graphics processor cores, but the present application is not limited to this.

In an embodiment of the present application, the i-th data and command distributor of the graphics processor is connected to $\mathrm{floor}(N / n_i)$ graphics processor cores, where i and $n_i$ are both positive integers less than or equal to N and floor is the round-down function. Referring to FIG. 4 and taking N = 4 as an example, the first data and command distributor 410-1 of the graphics processor 400 is connected to one graphics processor core 430-1 ($n_1 = 3$), the second data and command distributor 410-2 is connected to four graphics processor cores 430-1 to 430-4 ($n_2 = 1$), the third data and command distributor 410-3 is connected to two graphics processor cores 430-1 and 430-3 ($n_3 = 2$), and the fourth data and command distributor 410-4 is connected to two graphics processor cores 430-2 and 430-4 ($n_4 = 2$). In the embodiments of the present application, a data and command distributor may be connected to a graphics processor core directly, or indirectly through a data selector or similar element, among other ways.
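The fan-out values in the FIG. 4 example can be reproduced with integer division. The following is a minimal sketch, assuming that the per-distributor core count is floor(N / n_i), which matches the four values given for FIG. 4; the variable names are illustrative.

```python
N = 4  # number of graphics processor cores in the FIG. 4 example

# n_i for distributors 410-1 to 410-4, as given in the description
n = {1: 3, 2: 1, 3: 2, 4: 2}

# Assumed rule: distributor i is wired to floor(N / n_i) cores
fan_out = {i: N // n_i for i, n_i in n.items()}

for i, cores in fan_out.items():
    print(f"distributor 410-{i}: {cores} core(s)")
```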

The graphics processor 400 shown in FIG. 4 can be configured, according to received instructions, to provide one to four virtual graphics processors. When configured to provide one virtual graphics processor, the user can use the four graphics processor cores 430-1 to 430-4 through the data and command distributor 410-2. When configured to provide two virtual graphics processors, the first user can use the graphics processor cores 430-1 and 430-3 through the data and command distributor 410-2, and the second user can use the graphics processor cores 430-2 and 430-4 through the data and command distributor 410-4. When configured to provide three virtual graphics processors, the first user can use the graphics processor core 430-1 through the data and command distributor 410-1, the second user can use the graphics processor cores 430-2 and 430-4 through the data and command distributor 410-2, and the third user can use the graphics processor core 430-3 through the data and command distributor 410-3. When configured to provide four virtual graphics processors, the first user can use the graphics processor core 430-1 through the data and command distributor 410-1, the second user can use the graphics processor core 430-2 through the data and command distributor 410-2, the third user can use the graphics processor core 430-3 through the data and command distributor 410-3, and the fourth user can use the graphics processor core 430-4 through the data and command distributor 410-4.
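The four configurations can be written down as data and checked against the FIG. 4 wiring. The following is a minimal sketch; the dictionary layout and the `check` helper are illustrative assumptions, while the wiring and assignments are those stated in the description.

```python
ALL_CORES = {"430-1", "430-2", "430-3", "430-4"}

# Wiring of FIG. 4: distributor -> cores it is connected to
LINKS = {
    "410-1": {"430-1"},
    "410-2": set(ALL_CORES),
    "410-3": {"430-1", "430-3"},
    "410-4": {"430-2", "430-4"},
}

# Number of virtual GPUs -> {distributor: cores used by that virtual GPU}
CONFIGS = {
    1: {"410-2": set(ALL_CORES)},
    2: {"410-2": {"430-1", "430-3"}, "410-4": {"430-2", "430-4"}},
    3: {"410-1": {"430-1"}, "410-2": {"430-2", "430-4"}, "410-3": {"430-3"}},
    4: {"410-1": {"430-1"}, "410-2": {"430-2"},
        "410-3": {"430-3"}, "410-4": {"430-4"}},
}


def check(config):
    """Return the cores used, after verifying wiring and exclusivity."""
    used = set()
    for distributor, cores in config.items():
        if not cores <= LINKS[distributor]:
            raise ValueError(f"{distributor} is not wired to {cores - LINKS[distributor]}")
        if used & cores:
            raise ValueError(f"cores {used & cores} assigned twice")
        used |= cores
    return used


results = {count: check(config) for count, config in CONFIGS.items()}
print({count: sorted(cores) for count, cores in results.items()})
```

In each of the four configurations every core is assigned to exactly one virtual graphics processor, so no core is left idle.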

As can be seen from the above description, the connections between the data and command distributors and the graphics processor cores in this embodiment reduce to one one-to-four connection (1×4 groups of lines), two one-to-two connections (2×2 groups of lines), and one one-to-one connection (1×1 group of lines), for a total of 9 groups of lines. Compared with fully connecting the data and command distributors to the graphics processor cores, which would take 4×4 = 16 groups of lines for N = 4, the connection manner of this embodiment requires fewer lines, which helps avoid congestion in the place-and-route (P&R) stage and reduces chip area.

It should be understood that the connection manner between the data and command distributors and the graphics processor cores shown in FIG. 4 for N = 4 is only one feasible manner of the embodiments of the present application, and the present application is not limited thereto. In some implementations, the graphics processor cores to which a data and command distributor is connected may differ from FIG. 4; for example, the data and command distributor 410-1 may be connected to 430-2 instead of 430-1, and the data and command distributor 410-3 may be connected to 430-2 and 430-4 instead of 430-1 and 430-3. In other implementations, the number of graphics processor cores connected to a data and command distributor may differ from FIG. 4; for example, the data and command distributor 410-1 may be connected to 2, 3 or 4 graphics processor cores, and the data and command distributor 410-2 may be connected to 1, 2 or 3 graphics processor cores.

It should be noted that, to improve core utilization, the virtual graphics processors provided at any one time in the above examples together use all 4 graphics processor cores, but the present application is not limited thereto. For example, when the graphics processor 400 is configured to provide only one virtual graphics processor, the user may use the two graphics processor cores 430-1 and 430-3 through the data and command distributor 410-2, while the other two graphics processor cores 430-2 and 430-4 remain idle. As another example, when the graphics processor 400 is configured to provide two virtual graphics processors, the first user may use one graphics processor core 430-1 through the data and command distributor 410-1, and the second user may use the two graphics processor cores 430-2 and 430-4 through the data and command distributor 410-4, while the remaining graphics processor core 430-3 remains idle.

In an embodiment of the present application, the graphics processor includes N data and command distributors, of which one is connected to N graphics processor cores and m_j − m_{j+1} are each connected to N/m_j graphics processor cores, where m_j and m_{j+1} are adjacent positive integers that divide N (consecutive divisors of N) and 1 ≤ m_{j+1} < m_j ≤ N. For example, when N = 4 there are two pairs of values: m_{j+1} = 1 with m_j = 2, and m_{j+1} = 2 with m_j = 4. Accordingly, when the graphics processor provided by this embodiment includes 4 data and command distributors, one of them is connected to 4 graphics processor cores, another is connected to 2 graphics processor cores (m_{j+1} = 1, m_j = 2), and the remaining two are each connected to 1 graphics processor core (m_{j+1} = 2, m_j = 4). In the embodiments of the present application, the manner of connecting a data and command distributor to a graphics processor core includes, but is not limited to, a direct connection between the two, or an indirect connection through a data selector or the like.
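The divisor-based rule can be sketched in a few lines. This sketch reads the fan-out of each group of m_j − m_{j+1} distributors as N/m_j, which is an assumption of the sketch but is what the N = 4 example gives; the function name `divisor_fanouts` is hypothetical.

```python
def divisor_fanouts(n):
    """Return the fan-out (number of wired cores) of each of the n distributors
    under the divisor-based scheme, largest first."""
    divisors = [d for d in range(1, n + 1) if n % d == 0]  # ascending: 1, ..., n
    fanouts = [n]  # one distributor is wired to all n cores
    # Walk adjacent divisor pairs (m_next, m_j) with m_next < m_j.
    for m_next, m_j in zip(divisors, divisors[1:]):
        fanouts += [n // m_j] * (m_j - m_next)  # assumed fan-out: N / m_j
    return fanouts


for n in (4, 8):
    fanouts = divisor_fanouts(n)
    print(n, fanouts, "groups of lines:", sum(fanouts), "full connection:", n * n)
```

For N = 4 this gives fan-outs of 4, 2, 1 and 1, matching the example in the text. The group sizes always add up to N distributors, because the differences between consecutive divisors telescope to N − 1.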

Next, the above connection scheme is described in detail, taking N = 8 and N = 16 as examples. Referring to FIG. 5A, in one example N = 8 and the graphics processor 500 includes 8 data and command distributors. One data and command distributor, 510-1, is directly connected to graphics processor core 530-1 and indirectly connected, through data selectors, to graphics processor cores 530-2 to 530-8. One data and command distributor, 510-5, is indirectly connected through data selectors to the four graphics processor cores 530-5 to 530-8 (m_{j+1} = 1, m_j = 2). Of the two data and command distributors 510-3 and 510-7, the data and command distributor 510-3 is indirectly connected through data selectors to the two graphics processor cores 530-3 and 530-4, and the data and command distributor 510-7 is indirectly connected through data selectors to the two graphics processor cores 530-7 and 530-8 (m_{j+1} = 2, m_j = 4). The four data and command distributors 510-2, 510-4, 510-6 and 510-8 are each indirectly connected through a data selector to the corresponding graphics processor core 530-2, 530-4, 530-6 and 530-8, respectively (m_{j+1} = 4, m_j = 8).

The graphics processor 500 shown in FIG. 5A can be configured to provide 1 to 8 virtual graphics processors. It supports at least 1 user (when configured to provide 1 virtual graphics processor) and at most 8 users (when configured to provide 8 virtual graphics processors). The graphics processor cores of the graphics processor 500 have 22 possible allocations, as listed in Table 1 below. For example, in the 10th case the graphics processor 500 is configured to provide 3 virtual graphics processors to 3 users, two of whom each occupy 3 graphics processor cores while the third occupies 2 graphics processor cores. With reference to FIG. 5A, one allocation of the graphics processor cores is that the first user occupies graphics processor cores 530-1, 530-2 and 530-8, the second user occupies graphics processor cores 530-5, 530-6 and 530-7, and the third user occupies graphics processor cores 530-3 and 530-4.
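The figure of 22 equals the number of ways to split 8 cores into unordered groups when every core is in use, that is, the number of integer partitions of 8. The sketch below enumerates them; the generator name `partitions` is illustrative, and the order in which it yields cases is not meant to reproduce the numbering of Table 1.

```python
def partitions(total, largest=None):
    """Yield the integer partitions of `total` as non-increasing tuples."""
    if largest is None:
        largest = total
    if total == 0:
        yield ()
        return
    for part in range(min(total, largest), 0, -1):
        for rest in partitions(total - part, part):
            yield (part,) + rest


cases = list(partitions(8))
print(len(cases))          # number of core-count allocations for 8 cores
print((3, 3, 2) in cases)  # the 3 + 3 + 2 split used in the example above
```

Each tuple gives the number of cores per virtual graphics processor, so `(3, 3, 2)` corresponds to the three-user example in the text.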

It should be noted that the connection manner between the data and command distributors and the graphics processor cores in the embodiments of the present application is not unique, and other connection manners may be used in other implementations. For example, FIG. 5B shows another connection scheme for N = 8, which has the same effect as the connection scheme shown in FIG. 5A.

Table 1. Allocation of the number of cores of the graphics processor 500

In the above example, the connections between the data and command distributors and the graphics processor cores reduce to one one-to-eight connection (1×8 groups of lines), one one-to-four connection (1×4 groups of lines), two one-to-two connections (2×2 groups of lines), and four one-to-one connections (4×1 groups of lines), for a total of 20 groups of lines. Each group of lines is, for example, a group of 1000 to 2000 data and command lines. Compared with fully connecting the data and command distributors to the graphics processor cores, the connection manner of this example requires fewer lines, which helps avoid congestion in the place-and-route (P&R) stage and reduces chip area.

Referring to FIG. 5C, in another example N = 16 and the graphics processor includes 16 data and command distributors. The 1st data and command distributor is directly connected to the 1st graphics processor core and indirectly connected, through data selectors, to the remaining 15 graphics processor cores. The 9th data and command distributor is indirectly connected through data selectors to 8 graphics processor cores. The 2nd data and command distributor is indirectly connected through data selectors to 5 graphics processor cores. The 7th data and command distributor is indirectly connected through data selectors to 4 graphics processor cores. The 16th data and command distributor is indirectly connected through data selectors to 3 graphics processor cores. The 4th, 11th and 13th data and command distributors are each indirectly connected through data selectors to 2 graphics processor cores. The 3rd, 5th, 6th, 8th, 10th, 12th, 14th and 15th data and command distributors are each indirectly connected through a data selector to 1 graphics processor core. In this example, the connections between the data and command distributors and the graphics processor cores reduce to one one-to-sixteen connection (1×16 groups of lines), one one-to-eight connection (1×8 groups of lines), one one-to-five connection (1×5 groups of lines), one one-to-four connection (1×4 groups of lines), one one-to-three connection (1×3 groups of lines), three one-to-two connections (3×2 groups of lines), and eight one-to-one connections (8×1 groups of lines), for a total of 50 groups of lines. Compared with fully connecting the data and command distributors to the graphics processor cores, the connection manner of this example requires fewer lines, which helps avoid congestion in the place-and-route (P&R) stage and reduces chip area.
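The fan-outs listed for N = 16 (16, 8, 5, 4, 3, three 2s and eight 1s) coincide with floor(16/k) for k = 1 to 16. The sketch below assumes that reading of the floor-based rule, with n_i running once over 1..N; this is an inference from the listed numbers, and the function name `floor_fanouts` is hypothetical. It reproduces the total of 50 groups and compares it with a full N×N connection.

```python
from collections import Counter


def floor_fanouts(n):
    """Fan-outs under the assumed rule floor(n / k) for k = 1..n."""
    return [n // k for k in range(1, n + 1)]


n = 16
fanouts = floor_fanouts(n)
# How many distributors have each fan-out, e.g. three with fan-out 2.
print(sorted(Counter(fanouts).items(), reverse=True))
print("groups of lines:", sum(fanouts))      # sparse scheme
print("full connection:", n * n)             # every distributor to every core
```

Under this assumption the sparse scheme uses 50 groups of lines against 256 for full connection. Which distributor index receives which fan-out (for instance the 9th distributor receiving 8 cores) is a layout choice from FIG. 5C that this sketch does not model.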

In an embodiment of the present application, the graphics processor may further include data selectors. A graphics processor core that is connected to at least two data and command distributors is indirectly connected to those data and command distributors through a data selector. For example, the graphics processor 500 shown in FIG. 5A includes data selectors 520-2 to 520-8. A graphics processor core connected to at least two data and command distributors, such as graphics processor core 530-5, is indirectly connected to the corresponding data and command distributors through a data selector. In the embodiments of the present application, a data selector selects at most one data and command distributor to be connected to the graphics processor core at any one time.
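The "at most one distributor at a time" behaviour can be illustrated with a small software model. This is a toy sketch under stated assumptions: the class `DataSelector`, its method names and the string payloads are hypothetical, and a real data selector is a hardware multiplexer rather than a Python object. The two inputs used below are the distributors that the FIG. 5A description wires to core 530-5.

```python
from typing import Dict, Optional, Sequence


class DataSelector:
    """Toy model of a data selector placed in front of one core."""

    def __init__(self, inputs: Sequence[str]) -> None:
        self.inputs = tuple(inputs)           # distributors wired to this selector
        self.selected: Optional[str] = None   # None: no distributor is connected

    def select(self, distributor: Optional[str]) -> None:
        """Connect exactly one wired distributor, or none at all."""
        if distributor is not None and distributor not in self.inputs:
            raise ValueError(f"{distributor} is not wired to this selector")
        self.selected = distributor

    def forward(self, traffic: Dict[str, str]) -> Optional[str]:
        """Pass on only the traffic of the currently selected distributor."""
        if self.selected is None:
            return None
        return traffic.get(self.selected)


# Core 530-5 is reachable from distributors 510-1 and 510-5 in FIG. 5A.
selector = DataSelector(["510-1", "510-5"])
traffic = {"510-1": "cmd from 510-1", "510-5": "cmd from 510-5"}
selector.select("510-5")
print(selector.forward(traffic))
```

Because `selected` holds a single value, the model cannot connect two distributors to the core simultaneously, which mirrors the constraint stated above.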

Optionally, a graphics processor core that needs to be connected to only one data and command distributor, such as graphics processor core 530-1, may be connected to that data and command distributor without a data selector.
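Which cores need a data selector follows directly from how many distributors are wired to each core. The sketch below transcribes the FIG. 5A wiring from the description and counts that fan-in; the dictionary layout and the function name `fan_in` are illustrative assumptions.

```python
# Distributor -> cores it is wired to, transcribed from the FIG. 5A description.
FIG5A_LINKS = {
    "510-1": [f"530-{k}" for k in range(1, 9)],
    "510-2": ["530-2"],
    "510-3": ["530-3", "530-4"],
    "510-4": ["530-4"],
    "510-5": [f"530-{k}" for k in range(5, 9)],
    "510-6": ["530-6"],
    "510-7": ["530-7", "530-8"],
    "510-8": ["530-8"],
}


def fan_in(links):
    """Count, for each core, how many distributors are wired to it."""
    counts = {}
    for cores in links.values():
        for core in cores:
            counts[core] = counts.get(core, 0) + 1
    return counts


counts = fan_in(FIG5A_LINKS)
needs_selector = sorted(core for core, n in counts.items() if n >= 2)
print(needs_selector)        # cores wired to two or more distributors
print(sum(counts.values()))  # total groups of lines in FIG. 5A
```

Only core 530-1 has a fan-in of 1, so cores 530-2 to 530-8 are the ones that need selectors, in line with the seven data selectors 520-2 to 520-8 named in the text. The fan-ins also sum to the 20 groups of lines counted for this topology.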

It should be understood that the data selector described in the embodiments of the present application covers any device or circuit capable of selecting a specified signal from a group of input signals and outputting it, and is not limited to any particular device or circuit.

In an embodiment of the present application, the data and command distributors are fully connected to the graphics processor cores. Full connection means that each data and command distributor is connected to all graphics processor cores, and each graphics processor core is connected to all data and command distributors. The manner of connecting a data and command distributor to a graphics processor core includes, but is not limited to, a direct connection between the two, or an indirect connection through a data selector or the like. FIG. 6 shows an example of full connection between the data and command distributors and the graphics processor cores for N = 8. In this case, the graphics processor can be configured to provide 1 to 8 virtual graphics processors and can satisfy every allocation of the graphics processor cores.

In an embodiment of the present application, the number of physical layers of the graphics processor is configured according to the number of data and command transmission lines. Specifically, each physical layer of the graphics processor can hold only a limited number of data and command transmission lines, so the fewer the data and command transmission lines, the fewer physical layers the graphics processor needs. Taking N = 8 as an example, when the data and command distributors are fully connected to the graphics processor cores, there are 64 groups of data and command transmission lines and the graphics processor is configured with 8 physical layers. When the connection manner shown in FIG. 5A or FIG. 5B is adopted, there are 20 groups of data and command transmission lines and the graphics processor can be configured with 4 physical layers. The connection manner shown in FIG. 5A or FIG. 5B therefore reduces the number of physical layers of the graphics processor.

The present application also provides a chip. FIG. 7 is a schematic structural diagram of a chip in an embodiment of the present application. The chip includes the graphics processor described in any embodiment of the present application and input/output pins.

The present application also provides an electronic device, which includes the graphics processor described in any embodiment of the present application and a memory communicatively connected to the graphics processor.

In summary, the graphics processor provided by the embodiments of the present application can provide at least one virtual graphics processor, and this virtualization allows the graphics processor to be shared by multiple users. In some embodiments of the present application, optimizing how the data and command distributors are wired to the graphics processor cores reduces the number of lines between them and avoids congestion in the place-and-route stage, which helps reduce chip area and the number of physical layers of the graphics processor. The present application therefore effectively overcomes various shortcomings of the prior art and has high industrial value.

The above embodiments merely illustrate the principles and effects of the present application and are not intended to limit it. Anyone familiar with this technology may modify or change the above embodiments without departing from the spirit and scope of the present application. Accordingly, all equivalent modifications or changes made by a person of ordinary skill in the art without departing from the spirit and technical ideas disclosed in the present application shall still be covered by the claims of the present application.

Claims

1. A graphics processor comprising at least two data and command distributors and at least two graphics processor cores, each of said data and command distributors being coupled to at least one of said graphics processor cores, wherein one of said data and command distributors is coupled to one of said graphics processor cores via a set of data and command transmission lines;

the graphics processor is configured to provide at least one virtual graphics processor, each of the virtual graphics processors including one of the data and command distributors and some or all of the graphics processor cores coupled thereto.

2. The graphics processor of claim 1, wherein the graphics processor is configured to provide n of the virtual graphics processors in accordance with received instructions, where n is any positive integer less than or equal to N, and N is the number of the graphics processor cores.

3. The graphics processor of claim 2, wherein the i-th of the data and command distributors is connected to floor(N/n_i) of the graphics processor cores, wherein i and n_i are positive integers less than or equal to N, and floor is the round-down function.

4. The graphics processor of claim 3, wherein the graphics processor comprises N of the data and command distributors, one of which is connected to N of the graphics processor cores, and m_j − m_{j+1} of which are each connected to N/m_j of the graphics processor cores, wherein m_j and m_{j+1} are adjacent positive integers that divide N, and 1 ≤ m_{j+1} < m_j ≤ N.

5. The graphics processor of claim 4, further comprising a data selector, wherein a graphics processor core connected to at least two of the data and command distributors is connected to those data and command distributors through the data selector.

6. The graphics processor of claim 4, wherein the number of the data and command distributors and the number of the graphics processor cores are both 8, and the connections between the data and command distributors and the graphics processor cores comprise one one-to-eight connection, one one-to-four connection, two one-to-two connections, and four one-to-one connections.

7. The graphics processor of claim 1, wherein the number of layers of a physical layer of the graphics processor is configured according to the number of data and command transmission lines.

8. The graphics processor of claim 1, wherein the data and command distributors are fully connected to the graphics processor cores.

9. A chip comprising the graphics processor of any one of claims 1 to 8 and input-output pins.

10. An electronic device comprising the graphics processor of any one of claims 1 to 8 and a memory.