JP2024545660A

JP2024545660A - In-line interruption of acceleration processing units

Info

Publication number: JP2024545660A
Application number: JP2024535496A
Authority: JP
Inventors: Alexander Fuad Ashkar; Mangesh P. Nijasure; Rakan Z. Khraisha; Manu Rastogi
Original assignee: Advanced Micro Devices Inc
Current assignee: Advanced Micro Devices Inc
Priority date: 2021-12-28
Filing date: 2022-11-18
Publication date: 2024-12-10
Also published as: US12056787B2; KR20240124330A; WO2023129301A1; EP4457616A1; CN118475915A; US20230206379A1

Abstract

A method and system for in-line suspension of an accelerated processing unit (APU) are disclosed. The technique includes receiving a packet containing an operating mode and a command to be executed by the APU; if the operating mode is a suspend start mode, suspending execution of a command received in a previous packet; and executing, by the APU, the command in the received packet. Execution of the suspended command is resumed if the operating mode in a subsequently received packet is a suspend end mode.
[Selected figure] Figure 3

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Patent Application No. 17/564,049, filed December 28, 2021, the contents of which are incorporated herein by reference.

Processors that must process large amounts of data within a limited time can use one or more accelerated processing units (APUs). Conventionally, when an APU is used, the processor controls its operation, which includes sending commands for the APU to execute and receiving command completion acknowledgments from the APU. Typically, the computing resources of an APU are shared by multiple applications running on one or more processors. When an application needs to execute a high-priority, compute-intensive workload, a feature that lets the application reserve the APU's computing resources for its exclusive use is valuable. However, suspending the APU's current workload in favor of another workload and then resuming it typically requires the processor's involvement, and hence communication between the processor and the APU. Such back-and-forth communication between the processor and the APU makes the APU's workload execution time less predictable.

A more detailed understanding can be had from the following description, given by way of example in conjunction with the accompanying drawings, in which:

FIG. 1A is a block diagram of an example device in which one or more features of the present disclosure can be implemented.

FIG. 1B is a block diagram of an example system illustrating an APU usable by the device of FIG. 1A, in which one or more features of the present disclosure can be implemented.

FIG. 2 is a functional block diagram of an example system illustrating in-line suspension of an APU, in which one or more features of the present disclosure can be implemented.

FIG. 3 is a flowchart of an example method for in-line suspension of an APU, in which one or more features of the present disclosure can be implemented.

Systems and methods for in-line suspension of an APU are disclosed. Techniques are disclosed for triggering the suspension, and subsequent resumption, of a workload being processed by an APU by inlining an operating mode with the commands that the processor sends to the APU in packets. The ability to suspend and resume an APU in this manner allows a high-priority, compute-intensive workload to use the APU's computing resources exclusively without processor involvement, which in turn makes workload execution times predictable.

Aspects disclosed herein describe a method for in-line suspension of an APU. The method includes receiving a packet including an operating mode and a command to be executed by the APU; suspending execution of a command received in a previous packet in response to the operating mode being a suspend start mode; and executing, by the APU, the command in the received packet. The method further includes resuming execution of the suspended command in response to the operating mode being a suspend end mode.

Aspects disclosed herein also describe a system for in-line suspension of an APU. The system includes at least one processor and a memory that stores instructions. The instructions, when executed by the at least one processor, cause the system to receive a packet including an operating mode and a command to be executed by the APU; in response to the operating mode being a suspend start mode, suspend execution of a command received in a previous packet; and cause the APU to execute the command in the received packet. The instructions further cause the system to resume execution of the suspended command in response to the operating mode being a suspend end mode.

Further, aspects disclosed herein describe a non-transitory computer-readable storage medium comprising hardware description language instructions that describe an APU adapted to perform a method for in-line suspension of the APU. The method includes receiving a packet including an operating mode and a command to be executed by the APU; suspending execution of a command received in a previous packet in response to the operating mode being a suspend start mode; and executing, by the APU, the command in the received packet. The method further includes resuming execution of the suspended command in response to the operating mode being a suspend end mode.

FIG. 1A is a block diagram of an example device 100A in which one or more features of the present disclosure can be implemented. Device 100A can be, for example, a computer, a gaming device, a handheld device, a set-top box, a television, a mobile phone, or a tablet computer. Device 100A includes a processor 102, an APU 106, a memory 104, a storage 116, an input device 108, and an output device 110. Device 100A can also include an input driver 112 and an output driver 114. In one aspect, device 100A can include additional components not shown in FIG. 1A.

プロセッサ１０２は、中央処理ユニット（ＣＰＵ）又はＣＰＵの１つ以上のコアを含むことができる。ＡＰＵ１０６は、高度並列処理ユニット、グラフィックス処理ユニット（ＧＰＵ）、又は、それらの組み合わせを表すことができる。プロセッサ１０２及びＡＰＵ１０６は、同じダイ上又は別のダイ上に位置し得る。メモリ１０４は、プロセッサ１０２と同じダイ上に位置し得るか、又は、プロセッサ１０２とは別に位置し得る。メモリ１０４は、揮発性又は不揮発性メモリ（例えば、ランダムアクセスメモリ（ＲＡＭ）、ダイナミックＲＡＭ（ＤＲＡＭ）、キャッシュ、又は、これらの組み合わせ）を含むことができる。 The processor 102 may include a central processing unit (CPU) or one or more cores of a CPU. The APU 106 may represent a highly parallel processing unit, a graphics processing unit (GPU), or a combination thereof. The processor 102 and the APU 106 may be located on the same die or on separate dies. The memory 104 may be located on the same die as the processor 102 or may be located separately from the processor 102. The memory 104 may include volatile or non-volatile memory (e.g., random access memory (RAM), dynamic RAM (DRAM), cache, or a combination thereof).

Storage 116 may include fixed or removable storage (e.g., a hard disk drive, solid state drive, optical disk, or flash drive). Input device 108 may represent one or more input devices such as a keyboard, keypad, touch screen, touch pad, detector, microphone, accelerometer, gyroscope, biometric scanner, or network connection (e.g., a wireless local area network card for receiving wireless IEEE 802 signals). Output device 110 may represent one or more output devices such as a display, speaker, printer, haptic feedback device, one or more lights, an antenna, or a network connection (e.g., a wireless local area network card for transmitting wireless IEEE 802 signals).

The input driver 112 communicates with the processor 102 and the input device 108 and permits the processor 102 to receive input from the input device 108. The output driver 114 communicates with the processor 102 and the output device 110 and permits the processor 102 to send output to the output device 110. In one aspect, the input driver 112 and the output driver 114 are optional components, and the device 100A can operate in the same manner if they are not present.

ＡＰＵ１０６は、プロセッサ１０２から計算コマンド及びグラフィックスレンダリングコマンドを受け入れて、それらの計算及びグラフィックスレンダリングコマンドを処理し、及び／又は、ディスプレイ（出力デバイス１１０）に出力を提供するように構成され得る。以下で更に詳細に説明するように、ＡＰＵ１０６は、例えば、単一命令複数データ（ＳＩＭＤ）パラダイムに従って計算を実行するように構成された１つ以上の並列処理ユニットを含むことができる。したがって、様々な機能が、本明細書では、ＡＰＵ１０６によって又はＡＰＵ１０６と併せて実行されるものとして説明されているが、様々な代替例では、ＡＰＵ１０６によって実行されるものとして説明される機能は、ホストプロセッサ（例えば、プロセッサ１０２）によってドライブされず、例えば、グラフィカル出力をディスプレイに提供するように構成することができる同様の能力を有する他のコンピューティングデバイスによって追加的又は代替的に実行され得る。処理システムがＳＩＭＤパラダイムに従って処理タスクを実行することができるかどうかにかかわらず、処理システムは、本明細書で説明される機能を実行するように構成され得る。 The APU 106 may be configured to accept computational and graphics rendering commands from the processor 102, process those computational and graphics rendering commands, and/or provide output to a display (output device 110). As described in more detail below, the APU 106 may include one or more parallel processing units configured to perform computations, for example, according to a single instruction multiple data (SIMD) paradigm. Thus, although various functions are described herein as being performed by or in conjunction with the APU 106, in various alternative examples, the functions described as being performed by the APU 106 may additionally or alternatively be performed by other computing devices having similar capabilities that are not driven by a host processor (e.g., the processor 102) and that may be configured, for example, to provide graphical output to a display. Regardless of whether the processing system is capable of performing processing tasks according to the SIMD paradigm, the processing system may be configured to perform the functions described herein.

図１Ｂは、本開示の１つ以上の特徴を実装することができる図１Ａのデバイスによって使用可能な加速システムを示す例示的なシステム１００Ｂのブロック図である。図１Ｂは、ＡＰＵ１０６上での処理タスクの実行を更に詳細に示す。プロセッサ１０２は、メモリ１０４内で、プロセッサ１０２による実行のための１つ以上のモジュールを維持することができる。モジュールは、オペレーティングシステム１２０、ドライバ１２２、及び、アプリケーション１２６を含む。これらのモジュールは、プロセッサ１０２及びＡＰＵ１０６の動作の様々な特徴を制御することができる。例えば、オペレーティングシステム１２０は、システムコール、すなわち、アプリケーションプログラミングインターフェース（ＡＰＩ）を提供することができ、これは、アプリケーション１２６によって採用され、ハードウェアと直接インターフェースすることができる。ドライバ１２２は、例えば、プロセッサ１０２上で実行されるアプリケーション１２６にＡＰＩを提供して、ＡＰＵ１０６の様々な機能にアクセスすることによって、ＡＰＵ１０６の動作を制御することができる。 FIG. 1B is a block diagram of an exemplary system 100B illustrating an acceleration system usable by the device of FIG. 1A that can implement one or more features of the present disclosure. FIG. 1B illustrates the execution of processing tasks on the APU 106 in further detail. The processor 102 can maintain in the memory 104 one or more modules for execution by the processor 102. The modules include an operating system 120, a driver 122, and an application 126. These modules can control various aspects of the operation of the processor 102 and the APU 106. For example, the operating system 120 can provide system calls, i.e., application programming interfaces (APIs), that can be employed by the application 126 to interface directly with the hardware. The driver 122 can control the operation of the APU 106, for example, by providing the APIs to the application 126 running on the processor 102 to access various functions of the APU 106.

The APU 106 can execute commands related to graphics operations and non-graphics operations, including either parallel or serial processing and either in-order or out-of-order processing. The APU 106 can be used to perform graphics pipeline operations, such as operations that process pixels and/or geometry (e.g., rendering an image to a display (output device 110)), based on commands received from the processor 102. The APU 106 can also perform processing operations unrelated to graphics, such as operations related to processing multi-dimensional data, physics simulations, computational fluid dynamics, or other computational tasks, based on commands received from the processor 102. In alternative embodiments, the APU 106 may perform signal processing operations (e.g., the APU 106 may be embodied in a digital signal processor (DSP)), may perform accelerated operations using a field programmable gate array (FPGA) configured by a bitstream, may perform neural processing operations using a neural processing unit (NPU), or may perform other operations that can be performed more efficiently by an accelerated processing unit (APU) than by the processor 102.

The APU 106 may include workgroup processors (WGPs) 132.1-132.M, and each WGP, e.g., 132.1, may have one or more SIMD units, e.g., 138.1.1-138.1.N, that can perform operations in parallel according to the SIMD paradigm. In the SIMD paradigm, multiple processing elements share a single program control flow unit and program counter and therefore execute the same program, but on different data. In one example, each SIMD unit, e.g., 138.1.1, can run 64 lanes (i.e., threads), where each lane executes the same instruction at the same time as the other lanes in the SIMD unit, but on different data. Lanes can be switched off with predication, for example when not all lanes need to execute a given instruction. Predication can also be used to execute programs with divergent control flow. Specifically, for programs with conditional branches (or other instructions whose control flow depends on calculations performed by individual lanes), predicating off the lanes that correspond to control flow paths not currently being executed, and executing the different control flow paths serially, allows arbitrary control flow. In one aspect, each of the WGPs 132.1-132.M may have a local cache. In another aspect, multiple WGPs may share a cache.
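The predicated execution of a divergent branch described above can be illustrated with a small software model. The following is a minimal sketch, not a description of the actual hardware: the function name `simd_if_else`, the 8-lane width (rather than 64), and the use of Python lists as lane registers are assumptions chosen for readability. The two sides of the branch run one after the other, each under a mask that switches off the lanes whose condition did not select that path.

```python
from typing import Callable, List

LANES = 8  # hypothetical lane count; the example in the text uses 64


def simd_if_else(
    data: List[int],
    cond: Callable[[int], bool],
    then_op: Callable[[int], int],
    else_op: Callable[[int], int],
) -> List[int]:
    """Model a divergent if/else on one SIMD unit using per-lane predication."""
    assert len(data) == LANES
    result = list(data)
    # Each lane evaluates the branch condition on its own data element.
    mask = [cond(x) for x in data]
    # Path 1: only lanes whose mask bit is set execute the "then" side.
    for lane in range(LANES):
        if mask[lane]:
            result[lane] = then_op(data[lane])
    # Path 2: the mask is inverted and the "else" side runs serially afterwards.
    for lane in range(LANES):
        if not mask[lane]:
            result[lane] = else_op(data[lane])
    return result


out = simd_if_else(list(range(8)), lambda x: x % 2 == 0, lambda x: x * 10, lambda x: -x)
print(out)  # [0, -1, 20, -3, 40, -5, 60, -7]
```

Every lane ends up with the result of exactly one path, at the cost of issuing both paths; this is why heavily divergent code uses SIMD lanes less efficiently.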

The basic unit of execution within a WGP, e.g., 132.1, is the work item. Typically, each work item represents a single instantiation of a program that is to be executed in parallel on a particular lane. Work items can be executed simultaneously as a "wavefront" (or "wave") on a single SIMD unit, e.g., 138.1.1. One or more waves may be executed in a workgroup, where each wave includes a collection of work items designated to execute the same program. A workgroup is executed by executing each of the waves that make up the workgroup. The waves can be executed sequentially on a single SIMD unit, or partially or fully in parallel on different SIMD units 138.1.1-138.1.N. Thus, a wave can be thought of as a collection of work items that can be executed simultaneously on a single SIMD unit, e.g., 138.1.1. Accordingly, if commands received from the processor 102 indicate that a particular program is to be parallelized to such a degree that the program cannot execute on a single SIMD unit simultaneously, the program can be broken up into waves that are parallelized on two or more SIMD units (e.g., 138.1.1-138.1.N), serialized on the same SIMD unit (e.g., 138.1.1), or both parallelized and serialized as needed. The scheduler 136 may be configured to perform operations related to launching the various waves on the different WGPs 132.1-132.M and their respective SIMD units.
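Breaking a program up into waves, and spreading those waves over SIMD units so that some run in parallel and the rest are serialized, can be sketched as follows. This is an illustrative model only; the round-robin placement policy and the helper names `split_into_waves` and `assign_waves` are assumptions, not the actual policy of the scheduler 136.

```python
from typing import Dict, List

WAVE_SIZE = 64  # work items per wave, matching the 64-lane example


def split_into_waves(num_work_items: int, wave_size: int = WAVE_SIZE) -> List[range]:
    """Partition work-item IDs into waves of at most wave_size items."""
    return [range(start, min(start + wave_size, num_work_items))
            for start in range(0, num_work_items, wave_size)]


def assign_waves(waves: List[range], num_simd_units: int) -> Dict[int, List[range]]:
    """Place waves on SIMD units round-robin (a hypothetical policy).

    Waves on different units run in parallel; waves that share a unit
    are serialized on that unit.
    """
    schedule: Dict[int, List[range]] = {u: [] for u in range(num_simd_units)}
    for i, wave in enumerate(waves):
        schedule[i % num_simd_units].append(wave)
    return schedule


waves = split_into_waves(300)                   # 4 full waves plus one with 44 items
plan = assign_waves(waves, num_simd_units=2)
print([len(w) for w in waves])                  # [64, 64, 64, 64, 44]
print({u: len(ws) for u, ws in plan.items()})   # {0: 3, 1: 2}
```

With five waves and two SIMD units, unit 0 runs three waves back to back and unit 1 runs two, so the program is both parallelized and serialized.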

The parallelism afforded by the WGPs 132.1-132.M is suitable for graphics-related operations such as operations on pixel values (e.g., filter operations), operations on geometric data (e.g., vertex transformations), and other graphics-related operations. For example, an application 126 executing on the processor 102 may involve computations to be performed by the APU 106. The application 126 may issue processing commands to the APU 106 using an API provided by the driver 122. The processing commands are then provided to the scheduler 136, which translates them into computational tasks that are assigned to the WGPs 132.1-132.M for parallel execution. For example, the scheduler 136 may receive a processing command that includes instructions to be performed on data (e.g., 1024 pixels of an image). In response, the scheduler 136 may divide the data into groups (e.g., each group containing the data required to process 64 pixels) and launch waves in one or more WGPs, each wave being associated with a group of data and the instructions to perform on that data. For example, the scheduler 136 may launch 16 waves (each responsible for processing 64 pixels, since 1024 / 64 = 16) to be executed in the SIMD units 138 of one or more WGPs 132.
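The 1024-pixel example above can be worked through in a short sketch in which each launched wave is paired with its 64-pixel group and the instruction to apply. The `Wave` record, the `launch_waves` helper, and the modulo mapping of waves onto WGPs are hypothetical simplifications made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence

GROUP_SIZE = 64  # pixels per wave, as in the example in the text


@dataclass
class Wave:
    wgp: int               # WGP the wave is launched on (hypothetical mapping)
    instruction: str       # operation to perform on the group
    pixels: Sequence[int]  # the group of data this wave processes


def launch_waves(pixels: Sequence[int], instruction: str, num_wgps: int) -> List[Wave]:
    """Divide the data into 64-pixel groups and create one wave per group."""
    waves = []
    for n, start in enumerate(range(0, len(pixels), GROUP_SIZE)):
        group = pixels[start:start + GROUP_SIZE]
        waves.append(Wave(wgp=n % num_wgps, instruction=instruction, pixels=group))
    return waves


image = list(range(1024))                     # 1024 pixels of an image
waves = launch_waves(image, "filter", num_wgps=4)
print(len(waves))                             # 16
print(len(waves[0].pixels), waves[15].wgp)    # 64 3
```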

FIG. 2 is a functional block diagram of an example system 200 illustrating in-line suspension of an APU, in which one or more features of the present disclosure can be implemented. The system 200 includes a processor 210 (e.g., processor 102 of FIG. 1A), an APU 215 (e.g., APU 106 of FIG. 1B), and a memory 220 (e.g., memory 104 of FIG. 1A). The APU 215 includes a command processor 250, a shader scheduler 230, and a shader 240 having WGPs 240.1-240.N (e.g., WGPs 132 of FIG. 1B). The memory 220 is accessible by the processor 210 and the command processor 250 via memory interfaces 222 and 224, respectively. The processor 210 is configured to execute software modules, such as a user application 212 and a driver 214, through which the application 212 can interface with the command processor 250. Thus, an application 212, such as a computer game or a simulator, can use an application programming interface (API) provided by the driver 214 to send the command processor 250 commands that specify computational tasks to be performed by the shader 240. Such commands are delivered in packets according to a packet format, as described further below.

The command processor 250 is configured to provide an interface between software modules running on the processor 210 and the execution (or processing) engines of the APU 215, such as the shader 240. The command processor 250 may include functional components such as a fetcher 255, a doorbell 260, a graphics command processor 270, a compute command processor 280, and a queue manager 290. As described above, the user application 212 may, via the driver 214, generate packets of commands to be delivered to the command processor 250. One delivery mechanism is to store these packets in an application-specific queue in the memory 220 (via the memory interface 222) and then signal the doorbell 260 of the command processor that one or more new packets are available in that queue. In response to receiving such a signal, the doorbell 260 is configured to trigger the fetcher 255 to read the one or more new packets. The fetcher 255 then reads the packets from the queue in the memory 220 (via the memory interface 224) and pushes them into the packet queues 257, 258 in first-in, first-out (FIFO) order. Packets containing draw (graphics) commands are stored in the queue 257, and packets containing compute dispatch commands are stored in the queue 258. The graphics command processor 270 and the compute command processor 280 are configured to pop packets from the queues 257 and 258, respectively, as the packets arrive. When multiple applications 212 (e.g., running concurrently on the host processor 210) generate their own packets, the packets associated with each application may be stored in a respective queue in the memory 220, and the fetcher 255 is configured to read packets from each of these queues and push them into the packet queues 257, 258.
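The delivery path described above (the driver writes packets to a per-application queue in memory, rings the doorbell, and the fetcher moves the new packets into the graphics or compute packet queue in FIFO order) can be modeled in a few lines. In this sketch the class and method names are hypothetical, and the doorbell triggers the fetch synchronously, which a real command processor need not do.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class Packet:
    app_id: int
    kind: str     # "draw" (graphics) or "dispatch" (compute)
    payload: str


class CommandProcessorFrontEnd:
    """Toy model of doorbell 260, fetcher 255 and packet queues 257/258."""

    def __init__(self) -> None:
        self.memory_queues: Dict[int, List[Packet]] = {}  # per-application queues in memory 220
        self.read_ptr: Dict[int, int] = {}                # next unread packet in each queue
        self.graphics_q: Deque[Packet] = deque()          # packet queue 257
        self.compute_q: Deque[Packet] = deque()           # packet queue 258

    def submit(self, pkt: Packet) -> None:
        """Driver side: write the packet to the app's memory queue, then ring the doorbell."""
        self.memory_queues.setdefault(pkt.app_id, []).append(pkt)
        self.ring_doorbell(pkt.app_id)

    def ring_doorbell(self, app_id: int) -> None:
        """Doorbell: trigger the fetcher to read the new packets from that queue."""
        self.fetch(app_id)

    def fetch(self, app_id: int) -> None:
        queue = self.memory_queues.get(app_id, [])
        for pkt in queue[self.read_ptr.get(app_id, 0):]:
            # FIFO order is preserved; the packet type selects the destination queue.
            (self.graphics_q if pkt.kind == "draw" else self.compute_q).append(pkt)
        self.read_ptr[app_id] = len(queue)


fe = CommandProcessorFrontEnd()
fe.submit(Packet(1, "draw", "tri-A"))
fe.submit(Packet(2, "dispatch", "kernel-X"))
fe.submit(Packet(1, "dispatch", "kernel-Y"))
print([p.payload for p in fe.graphics_q])  # ['tri-A']
print([p.payload for p in fe.compute_q])   # ['kernel-X', 'kernel-Y']
```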

In one aspect, the packet format may include a packet header and one or more commands. As disclosed herein, the packet header encodes an operating mode, which can be a pass-through mode, a suspend start mode, or a suspend end mode. In the pass-through mode, the command processor 250 operates under normal operating conditions; that is, newly arrived commands are processed by the currently available computing resources. For example, if all the WGPs 240.1-240.N are busy processing waves associated with previously received commands, a newly arrived command must wait until one or more of the WGPs become available. In contrast, in the suspend start mode, the command processor 250 is configured to suspend execution of the waves currently being processed and to make all the WGPs available to execute the waves associated with the newly arrived command. This mode remains in effect until the suspend end mode is enabled, at which point execution of the suspended waves is resumed (restored) and the command processor 250 returns to operating under normal operating conditions. How these three operating modes may be handled is further disclosed below.
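The text does not specify how the operating mode is laid out in the packet header. Purely as an assumption for illustration, the sketch below stores the mode in the two least-significant bits of a 32-bit header word; the field position, the numeric mode values, and the helper names are all hypothetical.

```python
from enum import IntEnum


class OpMode(IntEnum):
    """Operating modes carried in the packet header (numeric values are hypothetical)."""
    PASS_THROUGH = 0
    SUSPEND_START = 1
    SUSPEND_END = 2


MODE_MASK = 0b11  # hypothetical: mode stored in header bits [1:0]


def encode_header(mode: OpMode, other_bits: int = 0) -> int:
    """Place the mode in bits [1:0] and the other header fields above it (32-bit word)."""
    return ((other_bits << 2) | int(mode)) & 0xFFFFFFFF


def decode_mode(header: int) -> OpMode:
    """Extract the mode; the unused encoding 3 raises ValueError."""
    return OpMode(header & MODE_MASK)


h = encode_header(OpMode.SUSPEND_START, other_bits=0x5)
print(hex(h), decode_mode(h).name)  # 0x15 SUSPEND_START
```

Keeping the mode in a fixed header field lets the command processor classify a packet before it decodes any of the packet's commands, which is what allows a suspension to begin before those commands are processed.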

When a packet whose header encodes the pass-through mode is received, the command processor 250 decodes the commands in the packet. A command in the packet may be used to set a state or control register associated with a component of the APU 215, or it may be used for synchronization operations. A significant number of the commands in a packet may relate to computational tasks directed to the shader 240, such as draw (graphics) commands and compute dispatch commands. Thus, when the command processor 250 decodes a command, it may act on the command itself (e.g., set a state register according to the command) or send the command to the destination component to act on it. Draw (graphics) commands and compute dispatch commands are processed by the graphics command processor 270 and the compute command processor 280, respectively. These processors 270, 280 convert their respective commands into shader commands. The queue manager 290 stores these shader commands in respective queues and connects these queues to the execution pipes that feed the shader scheduler 230. The shader scheduler 230 assigns the shader commands to available WGPs 240.1-240.N.
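The pass-through handling described above, in which some commands are acted on directly while draw and dispatch commands become shader commands that wait for a free WGP, can be sketched as follows. The command names, the register dictionary, and the one-shader-command-per-WGP occupancy model are simplifying assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional


@dataclass
class Command:
    op: str            # hypothetical opcodes: "set_reg", "sync", "draw", "dispatch"
    arg: object = None


@dataclass
class PassThroughCP:
    """Toy model of command processor 250 handling pass-through packets."""
    num_wgps: int
    registers: Dict[str, int] = field(default_factory=dict)
    pending: Deque[str] = field(default_factory=deque)  # shader commands awaiting a WGP
    wgps: List[Optional[str]] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        self.wgps = [None] * self.num_wgps  # None means the WGP is free

    def decode(self, cmd: Command) -> None:
        if cmd.op == "set_reg":
            name, value = cmd.arg
            self.registers[name] = value    # acted on by the command processor itself
        elif cmd.op in ("draw", "dispatch"):
            # Graphics CP 270 or compute CP 280 converts it into a shader command.
            prefix = "gfx" if cmd.op == "draw" else "cs"
            self.pending.append(f"{prefix}:{cmd.arg}")
            self.schedule()
        # "sync" and other commands are not modeled here.

    def schedule(self) -> None:
        """Shader scheduler 230: use only the WGPs that are currently free."""
        for i, owner in enumerate(self.wgps):
            if owner is None and self.pending:
                self.wgps[i] = self.pending.popleft()


cp = PassThroughCP(num_wgps=2)
cp.decode(Command("set_reg", ("MODE", 7)))
cp.decode(Command("dispatch", "k0"))
cp.decode(Command("draw", "d0"))
cp.decode(Command("dispatch", "k1"))  # both WGPs are busy, so k1 waits
print(cp.wgps, list(cp.pending))      # ['cs:k0', 'gfx:d0'] ['cs:k1']
```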

Thus, in the pass-through mode, commands are processed by the command processor 250 based on the currently available computing resources; that is, the computational tasks defined by these commands are scheduled (by the shader scheduler 230) on the currently available WGPs of the shader 240. However, when a packet whose header encodes the suspend start mode is received, the command processor 250 initiates suspension of the waves currently being processed on the WGPs 240.1-240.N of the shader 240 before the commands in the packet are processed. Similarly, when a packet whose header encodes the suspend end mode is received, the commands in the packet are processed, and the command processor 250 then ends the suspension by resuming (restoring) the suspended waves so that their processing continues. Commands received during the suspend phase (i.e., the phase that begins with a packet encoding the suspend start mode and ends with a packet encoding the suspend end mode) have all of the APU's computing resources at their disposal and can therefore be scheduled exclusively on all the WGPs 240.1-240.N of the shader 240. During the suspend phase, the APU exclusively processes commands in packets fetched from the queue in the memory 220 that is associated with the application that generated those packets (according to aspects disclosed herein, the application that initiated the suspend mode). The APU does not service (e.g., does not fetch) packets stored in queues in the memory 220 that are associated with other applications until the suspend phase ends.
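The exclusive servicing of one application's queue during the suspend phase can be sketched with a toy fetcher. The round-robin arbitration used outside the suspend phase, the `exclusive_app` attribute, and the other names are assumptions for illustration; the text states only that other applications' queues are not serviced until the suspend phase ends.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class Fetcher:
    """Toy fetcher servicing per-application packet queues in memory 220."""

    def __init__(self, queues: Dict[str, List[str]]) -> None:
        self.queues: Dict[str, Deque[str]] = {app: deque(p) for app, p in queues.items()}
        self.exclusive_app: Optional[str] = None  # set for the duration of a suspend phase
        self.rr = 0                               # round-robin position (hypothetical policy)

    def next_packet(self) -> Optional[Tuple[str, str]]:
        if self.exclusive_app is not None:
            # Suspend phase: only the application that started it is serviced.
            q = self.queues[self.exclusive_app]
            return (self.exclusive_app, q.popleft()) if q else None
        apps = list(self.queues)
        for k in range(len(apps)):
            app = apps[(self.rr + k) % len(apps)]
            if self.queues[app]:
                self.rr = (self.rr + k + 1) % len(apps)
                return app, self.queues[app].popleft()
        return None


f = Fetcher({"A": ["a0", "a1", "a2"], "B": ["b0", "b1"]})
print(f.next_packet())                   # ('A', 'a0')
f.exclusive_app = "A"                    # A's packet encoded the suspend start mode
print(f.next_packet(), f.next_packet())  # ('A', 'a1') ('A', 'a2')
print(f.next_packet())                   # None: B is not serviced during the phase
f.exclusive_app = None                   # suspend end mode processed
print(f.next_packet())                   # ('B', 'b0')
```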

Thus, as described above, decoding the suspend start mode from a packet's header triggers a suspend operation. Specifically, the command processor 250 signals the queue manager 290 to stop connecting any new queues to the execution pipes that feed the shader scheduler 230. In addition, the queue manager 290 is signaled to switch off, pause, or stop any queues currently connected to the execution pipes. In one aspect, based on information in the packet's header, the suspension may be performed by suspending the currently running waves, by draining those waves (letting them run to completion), or by a combination thereof. Once the suspend operation is complete, all shader resources are free: the shader's WGPs are idle and therefore available to be scheduled with the computational tasks defined by the commands received during the suspend phase. The suspend phase continues until the suspend end mode is decoded from the header of a subsequent packet, at which point the suspended waves are resumed (restored), as described above.
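The suspend operation above can be sketched as follows. The text leaves open how the header selects between suspending and draining; here a hypothetical `drain_threshold` stands in for that header information (waves with at most that much remaining work are drained, the rest are suspended), which also covers the combined case. All names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Wave:
    name: str
    remaining: int  # remaining work, in arbitrary units


@dataclass
class QueueManager:
    """Toy queue manager 290."""
    connected: List[str] = field(default_factory=list)  # queues feeding the execution pipes
    paused: List[str] = field(default_factory=list)
    accepting_new: bool = True

    def begin_suspend(self) -> None:
        self.accepting_new = False          # stop connecting new queues
        self.paused.extend(self.connected)  # switch off the connected queues
        self.connected.clear()

    def end_suspend(self) -> None:
        self.connected.extend(self.paused)  # reconnect the paused queues
        self.paused.clear()
        self.accepting_new = True


def suspend_waves(running: List[Wave], drain_threshold: float) -> List[Wave]:
    """Free every WGP and return the waves that must be resumed later."""
    saved: List[Wave] = []
    for w in running:
        if w.remaining <= drain_threshold:
            w.remaining = 0   # drain: the wave runs to completion
        else:
            saved.append(w)   # suspend: keep the wave (its state could be saved via CWSR)
    running.clear()           # all WGPs are now idle
    return saved


qm = QueueManager(connected=["q0", "q1"])
qm.begin_suspend()
running = [Wave("w0", 2), Wave("w1", 50)]
saved = suspend_waves(running, drain_threshold=5)
print(qm.paused, [w.name for w in saved], running)  # ['q0', 'q1'] ['w1'] []
```

A threshold of 0 approximates pure suspension and `float("inf")` pure draining; which waves are worth draining rather than saving is a design trade-off between suspension latency and save/restore overhead.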

A currently executing wave can be suspended using a procedure called the Compute Wave Save Restore (CWSR) procedure, which allows waves to be suspended and later restored. In the CWSR procedure, the command processor 250 instructs the WGPs 240.1-240.N that are currently executing waves to save the waves' state to memory and remove them from execution. The command processor 250 then triggers a hardware machine to save the wave replay list to a stack in memory. To resume (restore) the waves, the command processor 250 pushes the stack back to the hardware execution units. The replayed waves then restore their state and resume operation where they previously stopped.
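The save and restore halves of CWSR can be sketched as operations on a stack of replay lists. This is a minimal software analogy, not the hardware mechanism. `WaveContext`, `Wgp`, `cwsr_save`, and `cwsr_restore` are hypothetical names, a dictionary of registers stands in for a wave's architectural state, and placing restored waves round-robin across WGPs is an assumption.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass
class WaveContext:
    wave_id: int
    pc: int                    # program counter where the wave stopped
    registers: Dict[str, int]  # stand-in for the wave's saved state


@dataclass
class Wgp:
    waves: List[WaveContext] = field(default_factory=list)


def cwsr_save(wgps: List[Wgp], memory_stack: List[List[WaveContext]]) -> None:
    replay_list: List[WaveContext] = []
    for wgp in wgps:
        for ctx in wgp.waves:
            # Each wave saves a snapshot of its state to memory...
            replay_list.append(replace(ctx, registers=dict(ctx.registers)))
        wgp.waves = []  # ...and is removed from execution.
    memory_stack.append(replay_list)  # the replay list is saved to a stack in memory


def cwsr_restore(wgps: List[Wgp], memory_stack: List[List[WaveContext]]) -> None:
    replay_list = memory_stack.pop()  # push the saved waves back to the execution units
    for i, ctx in enumerate(replay_list):
        # Assumed placement policy: round-robin across WGPs.
        wgps[i % len(wgps)].waves.append(ctx)  # the replayed wave resumes at ctx.pc
```

Using a stack means that the most recent save is the one restored first. This fits a single suspend phase that is bracketed by one suspend-start packet and one suspend-end packet.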

FIG. 3 is a flowchart of an exemplary method 300 for in-line suspension of the APU 215, in which one or more features of the present disclosure may be implemented. During normal operation, the processor 210 sends commands associated with computational tasks to the APU 215 in one or more packets. Accordingly, the method 300 begins with receiving a packet in step 310. In step 320, the header of the received packet is decoded to determine the operating mode. If the operating mode is not the suspend-start mode, the commands in the received packet are executed in step 340 based on the currently available computing resources of the APU 215. In that case, the waves associated with these commands may have to share the WGPs 240.1-240.N with other (currently executing) waves that belong to commands from previously received packets. If, however, the operating mode is the suspend-start mode, shader processing is suspended in step 330. That is, the waves currently executing on the WGPs 240.1-240.N are suspended to make all the computing resources of the shader 240 available. Once the suspension is complete, the shader 240 executes the commands in the received packet exclusively in step 340. If the operating mode is determined in step 350 to be the suspend-end mode, the suspended waves are resumed (restored) in step 360. Once the suspended waves have been resumed (restored) (360), the suspend phase (which spans steps 330 through 360) ends. From then on, commands in subsequent packets are executed while sharing computing resources with the resumed (restored) waves. This continues until a packet encoding the suspend-start operating mode is received and triggers another suspend phase.
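The steps of method 300 can be traced with a short simulation. This sketch rests on several assumptions. The mode is assumed to sit in the two low bits of an integer header, and the `Mode` values, `MODE_MASK`, `decode_mode`, and `method_300` are all hypothetical, since the source does not give a header layout. A suspend-start packet received while a suspend phase is already active is simply ignored, because the source does not describe nesting.

```python
from enum import IntEnum
from typing import Iterable, List, Tuple


class Mode(IntEnum):
    # Hypothetical encoding of the operating mode in the packet header.
    PASS_THROUGH = 0
    SUSPEND_START = 1
    SUSPEND_END = 2


MODE_MASK = 0b11  # assumption: the mode occupies the two low header bits


def decode_mode(header: int) -> Mode:
    """Step 320: extract the operating mode (raises ValueError for the unused value 3)."""
    return Mode(header & MODE_MASK)


def method_300(packets: Iterable[Tuple[int, str]]) -> List[str]:
    """Trace the flowchart of FIG. 3 for a stream of (header, command) packets."""
    trace: List[str] = []
    in_suspend_phase = False
    for header, command in packets:          # step 310: receive a packet
        mode = decode_mode(header)           # step 320: decode its header
        if mode is Mode.SUSPEND_START and not in_suspend_phase:
            trace.append("330: suspend running waves")
            in_suspend_phase = True
        sharing = "exclusive" if in_suspend_phase else "shared"
        trace.append(f"340: execute {command} ({sharing})")
        if mode is Mode.SUSPEND_END and in_suspend_phase:   # step 350
            trace.append("360: resume suspended waves")
            in_suspend_phase = False
    return trace
```

For the headers 0, 1, 0, 2, 0 carrying commands `a` to `e`, the trace runs `a` shared, suspends, runs `b`, `c`, and `d` exclusively, resumes after `d`, and runs `e` shared. The suspend-end packet's own command executes before the resume, as the description states.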

In one aspect, the APU 215 may change the operating mode, for example, based on an event associated with the processing of commands received in one or more packets. During normal operation, the mode may be changed from the pass-through operating mode to the suspend-start operating mode to enter the suspend phase. Alternatively, while operating in the suspend phase, the mode may be changed from the pass-through operating mode to the suspend-end operating mode to exit the suspend phase. For example, an event may occur while the graphics command processor 270 or the compute command processor 280 is processing commands, or while waves that execute shader commands associated with those commands are being processed. Such an event may require taking over all computing resources to execute all or a subset of those commands. In that situation, the command processor 250 may change the operating mode from the pass-through mode to the suspend-start mode to dedicate all the computing resources of the APU to executing that subset of commands. When that subset of commands has finished executing, the command processor 250 may change the operating mode to the suspend-end mode and return to normal operation. Alternatively, an event that requires the suspend phase to be terminated may occur while commands are being processed in the suspend phase. In that case, the command processor 250 can change the operating mode to the suspend-end mode.
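The APU-side override described above amounts to a small state-transition rule. This is a minimal sketch in which the `Event` kinds and the `effective_mode` function are hypothetical, because the source names no concrete events. As in the text, an override only turns pass-through into suspend-start outside a suspend phase, or pass-through into suspend-end inside one. Any other combination leaves the packet's mode unchanged.

```python
from enum import Enum, auto
from typing import Optional


class Mode(Enum):
    PASS_THROUGH = auto()
    SUSPEND_START = auto()
    SUSPEND_END = auto()


class Event(Enum):
    # Hypothetical event kinds raised while commands or waves are processed.
    NEEDS_ALL_RESOURCES = auto()  # a command subset must own the whole APU
    SUBSET_DONE = auto()          # that subset has finished executing
    MUST_END_SUSPEND = auto()     # something requires the suspend phase to end


def effective_mode(packet_mode: Mode, in_suspend_phase: bool,
                   event: Optional[Event]) -> Mode:
    """Mode the command processor acts on after an APU-side override."""
    if event is None:
        return packet_mode
    if (not in_suspend_phase and packet_mode is Mode.PASS_THROUGH
            and event is Event.NEEDS_ALL_RESOURCES):
        return Mode.SUSPEND_START   # pass-through -> suspend-start
    if (in_suspend_phase and packet_mode is Mode.PASS_THROUGH
            and event in (Event.SUBSET_DONE, Event.MUST_END_SUSPEND)):
        return Mode.SUSPEND_END     # pass-through -> suspend-end
    return packet_mode              # the event does not affect the mode
```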

It should be understood that many variations are possible based on the disclosure herein. Although features and elements are described above in particular combinations, each feature or element can be used alone without the other features and elements, or in various combinations with or without other features and elements.

The provided methods can be implemented in a general purpose computer, processor, or processor core. Suitable processors include, by way of example, general purpose processors, special purpose processors, conventional processors, digital signal processors (DSPs), multiple microprocessors, one or more microprocessors associated with a DSP core, controllers, microcontrollers, application specific integrated circuits (ASICs), field programmable gate array (FPGA) circuits, any other type of integrated circuit (IC), and/or state machines. Such processors can be manufactured by configuring a manufacturing process using the results of processed hardware description language (HDL) instructions and other intermediate data, including netlists (such instructions being capable of storage on a computer-readable medium). The results of such processing can be mask works, which are then used in a semiconductor manufacturing process to manufacture a processor that implements aspects of the embodiments.

The methods or flowcharts provided herein may be implemented in a computer program, software, or firmware embodied in a non-transitory computer-readable storage medium for execution by a general purpose computer or processor. Examples of non-transitory computer-readable storage media include read only memory (ROM), random access memory (RAM), registers, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs).

Claims

1. A method for in-line suspension of an accelerated processing unit (APU), comprising:
receiving a packet containing an operational mode and a command for the APU to execute;
suspending, in response to the operational mode being a suspend-start mode, execution of a command received in a previous packet; and
executing, by the APU, the command in the received packet.

2. The method of claim 1, further comprising resuming execution of the suspended command in response to the operational mode being a suspend-end mode.

3. The method of claim 1, wherein suspending execution of the command received in the previous packet comprises disconnecting, from each execution pipe, a queue including shader commands associated with the command received in the previous packet.

4. The method of claim 1, wherein suspending execution of the command received in the previous packet comprises suspending, based on information contained in the received packet, a wave currently being executed by the APU, the wave executing a shader command associated with the command received in the previous packet.

5. The method of claim 1, wherein suspending execution of the command received in the previous packet comprises employing a Compute Wave Save Restore (CWSR) procedure to suspend a wave associated with the command received in the previous packet.

6. The method of claim 1, wherein suspending execution of the command received in the previous packet comprises draining, based on information contained in the received packet, a wave currently being executed by the APU, the wave executing a shader command associated with the command received in the previous packet.

7. The method of claim 1, wherein the operational mode is changed by the APU based on an event associated with processing a command in the received packet or in a previously received packet.

8. The method of claim 1, wherein the APU is one of a graphics processing unit, a digital signal processor, a field programmable gate array processor, or a neural processing unit.

9. The method of claim 1, wherein the executing is performed by an execution engine of the APU.

10. The method of claim 8, wherein the execution engine is a shader.

11. The method of claim 1, wherein the received packet is fetched by the APU from a queue in memory, the queue being associated with an application that generated the received packet.

12. A system for in-line suspension of an APU, comprising:
at least one processor; and
a memory storing instructions that, when executed by the at least one processor, cause the system to:
receive a packet containing an operational mode and a command for the APU to execute;
suspend, in response to the operational mode being a suspend-start mode, execution of a command received in a previous packet; and
execute, by the APU, the command in the received packet.

13. The system of claim 12, wherein the instructions further cause the system to resume execution of the suspended command in response to the operational mode being a suspend-end mode.

14. The system of claim 12, wherein suspending execution of the command received in the previous packet comprises disconnecting, from each execution pipe, a queue including shader commands associated with the command received in the previous packet.

15. The system of claim 12, wherein suspending execution of the command received in the previous packet comprises suspending, based on information contained in the received packet, a wave currently being executed by the APU, the wave executing a shader command associated with the command received in the previous packet.

16. The system of claim 12, wherein suspending execution of the command received in the previous packet comprises employing a Compute Wave Save Restore (CWSR) procedure to suspend a wave associated with the command received in the previous packet.

17. The system of claim 12, wherein suspending execution of the command received in the previous packet comprises draining, based on information contained in the received packet, a wave currently being executed by the APU, the wave executing a shader command associated with the command received in the previous packet.

18. The system of claim 12, wherein the operational mode is changed by the APU based on an event associated with processing a command in the received packet or in a previously received packet.

19. A computer-readable storage medium comprising hardware description language instructions describing an accelerated processing unit (APU) configured to perform an in-line suspension of the APU, wherein the APU is capable of:
receiving a packet containing an operational mode and a command for the APU to execute;
suspending, in response to the operational mode being a suspend-start mode, execution of a command received in a previous packet; and
executing the command in the received packet.

20. The computer-readable storage medium of claim 19, wherein the APU is further capable of resuming execution of the suspended command in response to the operational mode being a suspend-end mode.

21. The computer-readable storage medium of claim 19, wherein suspending execution of the command received in the previous packet comprises disconnecting, from each execution pipe, a queue including shader commands associated with the command received in the previous packet.

22. The computer-readable storage medium of claim 19, wherein suspending execution of the command received in the previous packet comprises suspending, based on information contained in the received packet, a wave currently being executed by the APU, the wave executing a shader command associated with the command received in the previous packet.

23. The computer-readable storage medium of claim 19, wherein suspending execution of the command received in the previous packet comprises employing a Compute Wave Save Restore (CWSR) procedure to suspend a wave associated with the command received in the previous packet.

24. The computer-readable storage medium of claim 19, wherein suspending execution of the command received in the previous packet comprises draining, based on information contained in the received packet, a wave currently being executed by the APU, the wave executing a shader command associated with the command received in the previous packet.

25. The computer-readable storage medium of claim 19, wherein the operational mode is changed by the APU based on an event associated with processing a command in the received packet or in a previously received packet.