TWI865333B - Computing platform system and graphics processing unit resource management method thereof

Info

Publication number: TWI865333B
Application number: TW113104507A
Authority: TW
Inventors: 陳志嘉; 宋孟霖
Original assignee: 宏碁股份有限公司
Priority date: 2024-02-05
Filing date: 2024-02-05
Publication date: 2024-12-01
Also published as: TW202533161A

Abstract

A computing platform system and a graphics processing unit (GPU) resource management method thereof are provided. The method includes the following steps. At least one first graphics processing unit among a plurality of graphics processing units is allocated to a user. An idle threshold corresponding to the at least one first graphics processing unit is determined. The idle time of the at least one first graphics processing unit is checked. When the idle time of the at least one first graphics processing unit is greater than the idle threshold, the at least one first graphics processing unit is released.

Description

Computing platform system and its graphics processing unit resource management method

The present invention relates to a computing platform system, and more particularly to a computing platform system and a graphics processing unit resource management method thereof.

Graphics processing units (GPUs) play an important role in computer science and engineering. They provide powerful support for graphics processing and also offer high-performance solutions for a wide range of general-purpose computing tasks. GPUs were originally designed for graphics rendering. As the technology has developed, however, they have evolved into general-purpose parallel processors that can perform many kinds of computationally intensive tasks beyond graphics. For example, GPUs are now widely used in machine learning, deep learning, scientific computing, and other fields that require large amounts of parallel computation.

Computing platforms with multiple graphics processing units currently give users enormous computing resources, and this hardware configuration excels at large-scale, complex computing tasks. Despite their computing power, however, these platforms share a common problem: the graphics processing units allocated to users do not always run under high load and are sometimes entirely idle. This idleness wastes graphics processing unit resources and harms both energy efficiency and computing performance. An idle graphics processing unit wastes energy and hardware, which raises the operating cost of the computing platform and places an unnecessary burden on the environment. The waste also directly limits the scalability of computing projects, because underutilized graphics processing units mean that the available computing resources are not used to their full extent.

In view of this, the present invention proposes a computing platform system and a graphics processing unit resource management method thereof, which can solve the above technical problems.

An embodiment of the present invention provides a graphics processing unit resource management method applicable to a computing platform system that includes a plurality of graphics processing units. The method includes the following steps. At least one first graphics processing unit among the plurality of graphics processing units is allocated to a user. An idle threshold corresponding to the at least one first graphics processing unit is determined. The idle time of the at least one first graphics processing unit is checked. When the idle time of the at least one first graphics processing unit is greater than the idle threshold, the at least one first graphics processing unit is released.

An embodiment of the present invention provides a computing platform system that includes a plurality of graphics processing units, a storage device, and a processor. The processor is coupled to the plurality of graphics processing units and the storage device, and is configured to perform the following operations. At least one first graphics processing unit among the plurality of graphics processing units is allocated to a user. An idle threshold corresponding to the at least one first graphics processing unit is determined. The idle time of the at least one first graphics processing unit is checked. When the idle time of the at least one first graphics processing unit is greater than the idle threshold, the at least one first graphics processing unit is released.

Based on the above, in the embodiments of the present invention, the idle threshold used to decide whether to release graphics processing unit resources is determined flexibly according to the graphics processing units allocated to the user. When the idle time of the graphics processing units allocated to the user is greater than the corresponding idle threshold, the graphics processing unit resources can be released. As a result, the utilization of graphics processing unit resources is effectively improved and their waste is reduced.

Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. Where the same reference numerals appear in different drawings, they denote the same or similar elements. These embodiments are only part of the present invention and do not disclose all of its possible implementations. More precisely, these embodiments are merely examples of the methods and systems within the scope of the claims of the present invention.

FIG. 1 is a block diagram of a computing platform system according to an embodiment of the present invention. Referring to FIG. 1, the computing platform system 10 includes a plurality of graphics processing units 110_1 to 110_N, a storage device 120, and a processor 130. In some embodiments, the computing platform system 10 can be implemented by one or more servers.

The graphics processing units 110_1 to 110_N are efficient hardware resources with strong parallel processing capability, suitable for a variety of computationally intensive tasks. The present invention does not limit the number of graphics processing units 110_1 to 110_N. In addition, the hardware specifications of the graphics processing units 110_1 to 110_N may be the same or different, and the present invention places no limitation on this. Note that graphics processing units with different hardware specifications also differ in computing performance and installation cost.

The storage device 120 stores data and software modules (for example, an operating system, applications, and drivers) for access by the processor 130. It may be, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk, or a combination thereof.

The processor 130 is coupled to the graphics processing units 110_1 to 110_N and the storage device 120. It may be a general-purpose processor, a special-purpose processor, a conventional processor, a digital signal processor, a plurality of microprocessors, one or more microprocessors combined with a digital signal processor core, a controller, a microcontroller, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), any other type of integrated circuit, a state machine, an Advanced RISC Machine (ARM) based processor, or the like. The processor 130 can access and execute the instructions or program code recorded in the storage device 120 to implement the graphics processing unit resource management method of the embodiments of the present invention.

FIG. 2 is a flow chart of a graphics processing unit resource management method according to an embodiment of the present invention. The method of FIG. 2 can be implemented by the components of the computing platform system 10 of FIG. 1. Referring to FIG. 1 and FIG. 2 together, the steps of the method of this embodiment are described below with reference to the components of the computing platform system 10 in FIG. 1.

In step S210, the processor 130 allocates at least one first graphics processing unit among the graphics processing units 110_1 to 110_N to a user. Specifically, the user can create a virtual machine or a container on the computing platform system 10. The container is, for example, a Docker container. The virtual environment provided by the virtual machine or container offers an independent, isolated runtime space in which applications can run. When the user creates a virtual machine or a container on the computing platform system 10, the processor 130 can allocate at least one first graphics processing unit among the graphics processing units 110_1 to 110_N for the user's use. The present invention does not limit the number of first graphics processing units, which depends on the actual application. In other words, the first graphics processing unit referred to herein may be any one or more of the graphics processing units 110_1 to 110_N.

In some embodiments, the processor 130 may allocate the first graphics processing unit to a user according to the type and quantity of GPUs specified by the user. Alternatively, in some embodiments, the processor 130 may determine the type and quantity of the first graphics processing units to allocate according to the computing task the user wants to complete. For example, the processor 130 may allocate the graphics processing unit 110_1 among the graphics processing units 110_1 to 110_N to a user. Alternatively, the processor 130 may allocate the graphics processing units 110_1 and 110_2 among the graphics processing units 110_1 to 110_N to a user.
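The allocation in step S210 can be pictured with a minimal sketch. The text does not specify how the processor tracks which graphics processing units are free, so the `GpuPool` class, its method names, and the model labels below are all hypothetical and only illustrate allocation by user-specified type and quantity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GpuPool:
    """Hypothetical inventory: GPU id -> hardware model, plus current owners."""
    models: Dict[str, str]
    owners: Dict[str, str] = field(default_factory=dict)

    def allocate(self, user: str, model: str, count: int) -> List[str]:
        """Assign `count` free GPUs of `model` to `user`; raise if too few are free."""
        free = [g for g, m in self.models.items()
                if m == model and g not in self.owners]
        if len(free) < count:
            raise RuntimeError(f"only {len(free)} free GPU(s) of model {model!r}")
        chosen = free[:count]
        for gpu_id in chosen:
            self.owners[gpu_id] = user
        return chosen

    def release(self, gpu_ids: List[str]) -> None:
        """Return the given GPUs to the free pool."""
        for gpu_id in gpu_ids:
            self.owners.pop(gpu_id, None)


# Illustrative inventory; the ids mirror the reference numerals, the models are made up.
pool = GpuPool({"110_1": "model_1", "110_2": "model_1", "110_3": "model_2"})
first_gpus = pool.allocate("user_a", "model_1", 2)
```

Here `first_gpus` plays the role of the "at least one first graphics processing unit"; a real platform would more likely delegate this bookkeeping to its scheduler or container runtime.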

In step S220, the processor 130 determines an idle threshold corresponding to the at least one first graphics processing unit. That is, in the embodiments of the present invention, the processor 130 can flexibly determine the idle threshold according to the first graphics processing units allocated to the user. In some embodiments, different graphics processing units 110_1 to 110_N may correspond to different idle thresholds. For example, the graphics processing unit 110_1 may correspond to a first idle threshold and the graphics processing unit 110_2 to a second idle threshold, where the first idle threshold differs from the second idle threshold.

In some embodiments, the processor 130 may determine the idle threshold according to the type of the at least one first graphics processing unit. The type may include a hardware model or a computing performance level. In other words, the processor 130 may determine the idle threshold corresponding to each of the graphics processing units 110_1 to 110_N according to its hardware model or computing performance level. When the at least one first graphics processing unit belongs to a first type, the idle threshold is a first time length. When the at least one first graphics processing unit belongs to a second type, the idle threshold is a second time length.

FIG. 3 is a flow chart of determining an idle threshold according to an embodiment of the present invention. Referring to FIG. 3, in some embodiments, step S220 of FIG. 2 can be implemented as steps S301 to S303.

In step S301, the processor 130 determines whether the type of the at least one first graphics processing unit is the first type or the second type. For example, the processor 130 may determine whether the first graphics processing unit allocated to the user is of a first hardware model or a second hardware model. Alternatively, the processor 130 may determine whether the first graphics processing unit allocated to the user is of a first computing performance level or a second computing performance level.

When the type of the at least one first graphics processing unit is the first type, in step S302 the processor 130 sets the idle threshold to the first time length. When the type is the second type, in step S303 the processor 130 sets the idle threshold to the second time length. In some embodiments, the processor 130 may perform a table lookup based on the type of the at least one first graphics processing unit to determine the time length of the idle threshold. For example, when the first graphics processing unit is of the first type, the processor 130 may set the idle threshold to 48 hours. When the first graphics processing unit is of the second type, the processor 130 may set the idle threshold to 24 hours.
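The table lookup of steps S301 to S303 might look like the following minimal sketch. The type labels `"type_1"` and `"type_2"`, the table name, and the function name are assumptions made for illustration; only the 48-hour and 24-hour values come from the example above.

```python
# Hypothetical lookup table: GPU type -> idle threshold in hours.
# The keys are placeholder labels for a hardware model or performance level.
IDLE_THRESHOLD_HOURS = {
    "type_1": 48,  # first type  -> first time length
    "type_2": 24,  # second type -> second time length
}


def idle_threshold_for_type(gpu_type: str) -> int:
    """Return the idle threshold (hours) configured for a GPU type."""
    try:
        return IDLE_THRESHOLD_HOURS[gpu_type]
    except KeyError:
        # Failing loudly is a design choice of this sketch, not of the source text.
        raise ValueError(f"no idle threshold configured for {gpu_type!r}") from None
```

Keeping the thresholds in a table means an operator can add a new GPU type without touching the decision logic.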

Notably, in some embodiments, the processor 130 may set a shorter idle threshold for GPU types with greater computing power. That is, when the first graphics processing unit has stronger computing power, the processor 130 may set a shorter idle threshold. Conversely, when the first graphics processing unit has weaker computing power, the processor 130 may set a longer idle threshold.

In addition, in some embodiments, the processor 130 may determine the idle threshold according to both the type and the quantity of the at least one first graphics processing unit. The idle threshold decreases as the quantity of first graphics processing units increases. In other words, the more first graphics processing units are allocated, the lower the idle threshold determined by the processor 130.

FIG. 4 is a flow chart of determining an idle threshold according to an embodiment of the present invention. Referring to FIG. 4, in some embodiments, step S220 of FIG. 2 can be implemented as steps S401 to S404.

In step S401, the processor 130 determines the quantity of the at least one first graphics processing unit. The processor 130 may then determine the idle threshold according to the type and quantity of the at least one first graphics processing unit. More specifically, in step S402, the processor 130 determines whether the type of the at least one first graphics processing unit is the first type or the second type.

When the type of the at least one first graphics processing unit is the first type, in step S403 the processor 130 determines the idle threshold according to the first time length and the quantity of the at least one first graphics processing unit. When the type is the second type, in step S404 the processor 130 determines the idle threshold according to the second time length and the quantity of the at least one first graphics processing unit. The first time length corresponding to the first type differs from the second time length corresponding to the second type.

For example, assume that the processor 130 allocates two graphics processing units 110_1 and 110_2 of the first hardware model to the user. The processor 130 may determine, based on the first hardware model, that the initial time length of the idle threshold is 48 hours. Then, since the number of graphics processing units 110_1 and 110_2 is 2, the processor 130 may divide 48 hours by 2 to obtain an idle threshold with a final time length of 24 hours. Alternatively, assume that the processor 130 allocates three graphics processing units 110_3, 110_4, and 110_5 of the second hardware model to the user. The processor 130 may determine, based on the second hardware model, that the initial time length of the idle threshold is 24 hours. Then, since the number of graphics processing units 110_3, 110_4, and 110_5 is 3, the processor 130 may divide 24 hours by 3 to obtain an idle threshold with a final time length of 8 hours.

As another example, assume that the processor 130 allocates one graphics processing unit 110_1 of the first hardware model to the user. The processor 130 may determine, based on the first hardware model, that the initial time length of the idle threshold is 48 hours. Then, since the number of graphics processing units is 1, the processor 130 may divide 48 hours by 1 to obtain an idle threshold with a final time length of 48 hours. As yet another example, assume that the processor 130 allocates six graphics processing units 110_3 to 110_8 of the second hardware model to the user. The processor 130 may determine, based on the second hardware model, that the initial time length of the idle threshold is 24 hours. Then, since the number of graphics processing units 110_3 to 110_8 is 6, the processor 130 may divide 24 hours by 6 to obtain an idle threshold with a final time length of 4 hours.
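The worked examples for steps S401 to S404 divide a per-model initial time length by the number of allocated graphics processing units. A minimal sketch of that arithmetic follows; the model labels and names are hypothetical, and plain division is only the rule used in the examples, since the text requires no more than that the threshold decrease as the quantity grows.

```python
# Hypothetical per-model initial time lengths, in hours.
INITIAL_HOURS = {"model_1": 48.0, "model_2": 24.0}


def idle_threshold_for_allocation(model: str, count: int) -> float:
    """Idle threshold (hours) for `count` allocated GPUs of one hardware model."""
    if count < 1:
        raise ValueError("at least one GPU must be allocated")
    # The initial length comes from the model; dividing by the count shrinks
    # the threshold as more GPUs are held by the same user.
    return INITIAL_HOURS[model] / count


two_of_model_1 = idle_threshold_for_allocation("model_1", 2)    # 48 / 2
one_of_model_1 = idle_threshold_for_allocation("model_1", 1)    # 48 / 1
three_of_model_2 = idle_threshold_for_allocation("model_2", 3)  # 24 / 3
```

Other monotonically decreasing rules, such as a lower bound on the threshold, would fit the same description; the division here is only the simplest instance.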

Returning to FIG. 2, in step S230, the processor 130 checks the idle time of the at least one first graphics processing unit. Then, in step S240, the processor 130 determines whether the idle time of the at least one first graphics processing unit is greater than the idle threshold.

In different embodiments, the processor 130 may check the idle time of the at least one first graphics processing unit periodically or at irregular times. For example, the processor 130 may check the idle time of the first graphics processing units allocated to the user every hour, although the interval is not limited to this. Alternatively, the processor 130 may check the idle time of the first graphics processing units allocated to the user in response to determining that the available GPU resources are insufficient to allocate to a new user.

In some embodiments, the processor 130 may detect the graphics processing unit usage (GPU usage) of the at least one first graphics processing unit. The GPU usage may be a percentage value. When the GPU usage is less than a usage threshold, the processor 130 continues to accumulate the idle time of the at least one first graphics processing unit. When the GPU usage is greater than or equal to the usage threshold, the processor 130 may reduce the idle time of the at least one first graphics processing unit, for example by resetting it to zero or by subtracting a specific value from it.
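The accumulate-or-reduce rule can be sketched as a single update applied at each check. The function and parameter names are hypothetical, and clamping the result at zero when subtracting is an assumption of this sketch rather than something the text states.

```python
from typing import Optional


def update_idle_time(
    idle_hours: float,
    usage_percent: float,
    usage_threshold_percent: float,
    interval_hours: float,
    penalty_hours: Optional[float] = None,
) -> float:
    """Return the new idle time after one usage check.

    Usage below the threshold adds one check interval to the idle time.
    Usage at or above the threshold resets the idle time to zero, or, if
    `penalty_hours` is given, subtracts that value (clamped at zero, which
    is an assumption of this sketch).
    """
    if usage_percent < usage_threshold_percent:
        return idle_hours + interval_hours
    if penalty_hours is None:
        return 0.0
    return max(0.0, idle_hours - penalty_hours)


# Two consecutive low-usage checks, one hour apart, with a 10% usage threshold.
idle = update_idle_time(0.0, 5.0, 10.0, 1.0)
idle = update_idle_time(idle, 5.0, 10.0, 1.0)
```

Subtracting a value instead of resetting makes the policy more tolerant of short bursts of activity, at the cost of releasing resources from a user who is only lightly active.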

For example, FIG. 5A and FIG. 5B are schematic diagrams of checking idle time according to an embodiment of the present invention. Referring first to FIG. 5A, the processor 130 may allocate one or more first graphics processing units to a user at time point T1. The processor 130 may then check the idle time of each first graphics processing unit at intervals of ΔT. In the example of FIG. 5A, at check time point T2 the processor 130 detects that the GPU usage of a given first graphics processing unit is X1%. Since X1% is less than the usage threshold THu, the processor 130 sets the idle time of this first graphics processing unit to ΔT and determines whether the idle time ΔT is greater than the idle threshold. Assume that ΔT is not greater than the idle threshold. At check time point T3, the processor 130 then detects that the GPU usage of the same first graphics processing unit is X2%. Since X2% is less than the usage threshold THu, the processor 130 continues to accumulate the idle time of the first graphics processing unit to 2*ΔT.

Referring now to FIG. 5B, the processor 130 may allocate one or more first graphics processing units to a user at time point T1. The processor 130 may then check the idle time of each first graphics processing unit at intervals of ΔT. In the example of FIG. 5B, at check time point T2 the processor 130 detects that the GPU usage of a given first graphics processing unit is X1%. Since X1% is less than the usage threshold THu, the processor 130 sets the idle time of this first graphics processing unit to ΔT and determines whether the idle time ΔT is greater than the idle threshold. Assume that ΔT is not greater than the idle threshold. At check time point T3, the processor 130 then detects that the GPU usage of the same first graphics processing unit is X3%. Since X3% is greater than the usage threshold THu, the processor 130 resets the idle time of the first graphics processing unit to zero.

Returning to FIG. 2, when the idle time of the at least one first graphics processing unit is greater than the idle threshold (yes in step S240), in step S250 the processor 130 releases the at least one first graphics processing unit. For example, assume that the processor 130 allocates two graphics processing units 110_1 and 110_2 of the first hardware model to the user. As described above, the processor 130 obtains an idle threshold with a final time length of 24 hours. The processor 130 may separately determine whether the idle time of each of the graphics processing units 110_1 and 110_2 is greater than 24 hours. When the idle times of both graphics processing units 110_1 and 110_2 are greater than 24 hours, the processor 130 may release the graphics processing units 110_1 and 110_2. In other words, when the idle time of the at least one first graphics processing unit is greater than the idle threshold, the processor 130 may shut down the user's virtual machine or container to release the first graphics processing units. In this way, the embodiments of the present invention can reclaim idle GPU resources to achieve higher efficiency and more economical resource use.

FIG. 6 is a flow chart of a graphics processing unit resource management method according to an embodiment of the present invention. The method of FIG. 6 can be implemented by the components of the computing platform system 10 of FIG. 1. Referring to FIG. 1 and FIG. 6 together, the steps of the method of this embodiment are described below with reference to the components of the computing platform system 10 in FIG. 1.

In step S610, the processor 130 allocates at least one first graphics processing unit among the plurality of graphics processing units to a user. In step S620, the processor 130 determines an idle threshold corresponding to the at least one first graphics processing unit. In step S630, the processor 130 checks the idle time of the at least one first graphics processing unit. In step S640, the processor 130 determines whether the idle time of each first graphics processing unit is greater than the idle threshold. For details of these steps, refer to the description of the foregoing embodiments; they are not repeated here.

In this embodiment, when the idle time of at least one of the first graphics processing units is not greater than the idle threshold, the processor 130 decides not to release the at least one first graphics processing unit and returns to step S630. In other words, when there is more than one first graphics processing unit, the processor 130 decides not to release any of them as long as the idle time of even one of them is not greater than the idle threshold.
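The decision in step S640 is an all-or-nothing test over the user's allocated graphics processing units. A minimal sketch, with a hypothetical function name and with idle times expressed in hours for illustration:

```python
from typing import Sequence


def should_release(idle_hours: Sequence[float], threshold_hours: float) -> bool:
    """True only if every allocated GPU has been idle longer than the threshold."""
    if not idle_hours:
        # Nothing allocated, so nothing to release (a choice made by this sketch).
        return False
    # Strict comparison: an idle time equal to the threshold does not trigger release.
    return all(t > threshold_hours for t in idle_hours)
```

The test is strict because the text releases only when the idle time is greater than the threshold, so a unit sitting exactly at the threshold keeps the whole allocation alive until the next check.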

On the other hand, when the idle time of every one of the at least one first graphics processing unit is greater than the idle threshold (yes in step S640), in step S650 the processor 130 stores the working environment state of the user's virtual machine or container. Then, in step S660, the processor 130 releases the at least one first graphics processing unit.

Specifically, the processor 130 may record the working environment state of the user's virtual machine or container before shutting it down. In some embodiments, when deciding to release the first graphics processing units, the processor 130 may generate a virtual machine snapshot of the virtual machine, or a container snapshot or container image of the container, to store the working environment state. In this way, the user can later quickly rebuild a similar working environment state on the computing platform system 10.
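For a Docker container, the save-then-release order of steps S650 and S660 could be sketched with the `docker commit` and `docker stop` commands. This is only one possible realization and is not stated in the text: `docker commit` stores the container's filesystem changes as an image, not the memory of running processes or the contents of mounted volumes. The function names, container name, and image tag below are hypothetical.

```python
import subprocess
from typing import List


def snapshot_and_stop_commands(container: str, image_tag: str) -> List[List[str]]:
    """Build the commands in the required order: save state first, then stop."""
    return [
        ["docker", "commit", container, image_tag],  # S650: keep the environment
        ["docker", "stop", container],               # S660: free the GPUs
    ]


def reclaim(container: str, image_tag: str, dry_run: bool = True) -> List[List[str]]:
    """Run (or, with dry_run=True, only return) the snapshot-then-stop commands."""
    commands = snapshot_and_stop_commands(container, image_tag)
    if not dry_run:
        for command in commands:
            # check=True aborts before stopping if the snapshot fails.
            subprocess.run(command, check=True)
    return commands


planned = reclaim("user-a-job", "snapshots/user-a-job:idle", dry_run=True)
```

Running the snapshot first and stopping only if it succeeds means a failed save never costs the user their working environment.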

In summary, in the embodiments of the present invention, the idle threshold used to decide whether to release graphics processing unit resources is determined flexibly according to the graphics processing units allocated to the user. When the idle time of the graphics processing units allocated to the user is greater than the corresponding idle threshold, the graphics processing unit resources can be released. This effectively improves the utilization of graphics processing unit resources and reduces their waste. In addition, before releasing the graphics processing units, the embodiments of the present invention can preserve the user's working environment state, which saves the user the time needed to rebuild the working environment and improves the user experience.
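A flexible idle threshold of this kind might be sketched as below. The claims state only that the threshold depends on the GPU type and decreases as the number of allocated GPUs increases; the type names, the base durations, and the choice of dividing by the count are all assumptions made for illustration.

```python
# Hypothetical base thresholds per GPU type, in seconds (values are illustrative only).
BASE_THRESHOLD_SECONDS = {
    "high_performance": 1800.0,
    "standard": 7200.0,
}


def idle_threshold(gpu_type: str, gpu_count: int) -> float:
    """Return an idle threshold that depends on GPU type and shrinks as the count grows."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    # Dividing by the count is one simple way to make the threshold decrease
    # with the number of allocated GPUs; other decreasing functions would also fit.
    return BASE_THRESHOLD_SECONDS[gpu_type] / gpu_count
```

Under these assumed values, a user holding four "standard" GPUs gets the same threshold as a user holding one "high_performance" GPU, so larger allocations are reclaimed sooner.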

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Any person with ordinary knowledge in the relevant technical field may make minor changes and modifications without departing from the spirit and scope of the present invention. Therefore, the scope of protection of the present invention shall be defined by the appended claims.

10: Computing platform system; 110_1~110_N: Graphics processing units; 120: Storage device; 130: Processor; S210~S250, S301~S303, S401~S404, S610~S660: Steps

FIG. 1 is a block diagram of a computing platform system according to an embodiment of the present invention. FIG. 2 is a flow chart of a graphics processing unit resource management method according to an embodiment of the present invention. FIG. 3 is a flow chart of determining an idle threshold according to an embodiment of the present invention. FIG. 4 is a flow chart of determining an idle threshold according to an embodiment of the present invention. FIG. 5A and FIG. 5B are schematic diagrams of checking idle time according to an embodiment of the present invention. FIG. 6 is a flow chart of a graphics processing unit resource management method according to an embodiment of the present invention.

S210~S250: Steps

Claims

1. A graphics processing unit resource management method, applicable to a computing platform system including a plurality of graphics processing units, the method comprising: allocating at least one first graphics processing unit among the plurality of graphics processing units to a user; determining an idle threshold corresponding to the at least one first graphics processing unit; checking the idle time of the at least one first graphics processing unit; and when the idle time of the at least one first graphics processing unit is greater than the idle threshold, releasing the at least one first graphics processing unit.

2. The graphics processing unit resource management method as described in claim 1, wherein the step of determining the idle threshold corresponding to the at least one first graphics processing unit comprises: determining the idle threshold according to the type of the at least one first graphics processing unit, wherein when the at least one first graphics processing unit belongs to a first type, the idle threshold is a first time length, and when the at least one first graphics processing unit belongs to a second type, the idle threshold is a second time length.

3. The graphics processing unit resource management method as described in claim 2, wherein the type of the at least one first graphics processing unit includes a hardware model or a computing performance level.

4. The graphics processing unit resource management method as described in claim 2, wherein the step of determining the idle threshold according to the type of the at least one first graphics processing unit comprises: determining the number of the at least one first graphics processing unit; and determining the idle threshold according to the type and the number of the at least one first graphics processing unit.

5. The graphics processing unit resource management method as described in claim 4, wherein the idle threshold decreases as the number of the at least one first graphics processing unit increases.

6. The graphics processing unit resource management method as described in claim 1, wherein before the step of releasing the at least one first graphics processing unit when the idle time of the at least one first graphics processing unit is greater than the idle threshold, the method further comprises: when the idle time of the at least one first graphics processing unit is greater than the idle threshold, storing the working environment state of the user's virtual machine or container.

7. The graphics processing unit resource management method as described in claim 1, wherein the step of checking the idle time of the at least one first graphics processing unit comprises: detecting a graphics processing unit utilization rate of the at least one first graphics processing unit; and when the graphics processing unit utilization rate is less than a utilization rate threshold, continuously accumulating the idle time of the at least one first graphics processing unit.

8. The graphics processing unit resource management method as described in claim 7, wherein the step of checking the idle time of the at least one first graphics processing unit further comprises: when the graphics processing unit utilization rate is greater than or equal to the utilization rate threshold, reducing the idle time of the at least one first graphics processing unit to a preset value.

9. The graphics processing unit resource management method as described in claim 1, wherein the number of the at least one first graphics processing unit is greater than 1, and the step of releasing the at least one first graphics processing unit when the idle time of the at least one first graphics processing unit is greater than the idle threshold comprises: when the idle time of each of the at least one first graphics processing unit is greater than the idle threshold, releasing the at least one first graphics processing unit; and when the idle time of at least one of the at least one first graphics processing unit is not greater than the idle threshold, deciding not to release the at least one first graphics processing unit.

10. A computing platform system, comprising: a plurality of graphics processing units; a storage device recording a plurality of instructions; and at least one processor, coupled to the plurality of graphics processing units and the storage device, configured to: allocate at least one first graphics processing unit among the plurality of graphics processing units to a user; determine an idle threshold corresponding to the at least one first graphics processing unit; check the idle time of the at least one first graphics processing unit; and release the at least one first graphics processing unit when the idle time of the at least one first graphics processing unit is greater than the idle threshold.