TWI869788B - Processing circuit and computation scheduling method of artificial intelligence model - Google Patents
Processing circuit and computation scheduling method of artificial intelligence model
- Publication number
- TWI869788B (application TW112108662A)
- Authority
- TW
- Taiwan
- Prior art keywords
- sub
- intermediate data
- operator
- tensor
- memory
- Prior art date
Abstract
Description
The present invention relates to artificial intelligence models, and more particularly to a processing circuit and a computation scheduling method for an artificial intelligence model.
In a system-on-chip (SoC), the total memory bandwidth is fixed and is shared by multiple modules. When one module occupies too much memory bandwidth, other modules are blocked from accessing the memory, and overall system performance degrades. As one module in a system-on-chip, an artificial intelligence (AI) model often needs to process large amounts of data and therefore demands substantial memory bandwidth; reducing the bandwidth requirement of AI models has thus become an important topic.
In view of the deficiencies of the prior art, one object of the present invention is to provide a processing circuit and a computation scheduling method for an artificial intelligence model to address those deficiencies.
An embodiment of the present invention provides a processing circuit for an artificial intelligence model. The processing circuit is coupled to an external memory and includes a memory, a memory management circuit, and a computation circuit. The memory management circuit reads a tensor from the external memory and stores the tensor into the memory. The computation circuit is configured to: perform a first type of operation on a first sub-tensor of the tensor to generate first intermediate data; perform the first type of operation on a second sub-tensor of the tensor to generate second intermediate data; perform a second type of operation on the first intermediate data and the second intermediate data to generate third intermediate data; perform the first type of operation on a third sub-tensor of the tensor to generate fourth intermediate data; and perform the second type of operation on the first intermediate data, the second intermediate data, and the fourth intermediate data to generate fifth intermediate data.
Another embodiment of the present invention provides a processing circuit for an artificial intelligence model. The processing circuit is coupled to an external memory and includes a memory. The processing circuit performs the following operations: reading a tensor and a plurality of kernel parameters from the external memory and storing the tensor and the kernel parameters into the memory, the tensor including a first sub-tensor and a second sub-tensor, and the kernel parameters including a vector kernel parameter; performing a first vector operation on the first sub-tensor with reference to a first part of the vector kernel parameter to generate first intermediate data; and performing a second vector operation on the second sub-tensor with reference to a second part of the vector kernel parameter to generate second intermediate data. The first part of the vector kernel parameter is not equal to the second part of the vector kernel parameter.
Another embodiment of the present invention provides a computation scheduling method for an artificial intelligence model. The artificial intelligence model includes a first operator and a second operator. The computation scheduling method includes: dividing a tensor into H sub-tensors, H being an integer greater than 1; dividing the first operator into H first sub-operators; dividing the second operator into H second sub-operators; determining a dependency relationship among the H first sub-operators and the H second sub-operators; ordering the H first sub-operators and the H second sub-operators according to the dependency relationship to obtain an operation order; and, according to the operation order, determining when a processing circuit that executes the artificial intelligence model deletes target data from a memory included in the processing circuit, the target data being output data of one of the H first sub-operators and the H second sub-operators.
The technical means embodied in the embodiments of the present invention can remedy at least one of the shortcomings of the prior art; compared with the prior art, the present invention can therefore reduce memory usage and/or the bandwidth demand on memory.
The features, implementations, and effects of the present invention are described in detail below by way of embodiments in conjunction with the drawings.
The technical terms in the following description are interpreted according to the customary usage in this technical field; where this specification explains or defines a term, the interpretation of that term is based on the explanation or definition given in this specification.
The disclosure of the present invention includes a processing circuit for an artificial intelligence model and a computation scheduling method. Because some of the components included in the processing circuit of the present invention may individually be known components, details of such known components are omitted below, provided that this does not affect the full disclosure and enablement of the apparatus invention.
FIG. 1 shows an example of an AI network, which can be regarded as a simple AI model or as part of a more complex AI model. The AI network 100 operates on input data Din to produce output data Dout. The AI network 100 of FIG. 1 contains three operators: a subtraction operator 110 ("SUB"), a convolution operator 120 ("CONV"), and an addition operator 130 ("ADD"). The subtraction operator 110 performs a subtraction operation on tensor TS1 (i.e., the input data Din) to produce tensor TS2. The convolution operator 120 performs a convolution operation on tensor TS2 to produce tensor TS3. The addition operator 130 performs an addition operation on tensor TS3 to produce tensor TS4 (i.e., the output data Dout). In the example of FIG. 1, the sizes (dimension information) of tensors TS1, TS2, TS3, and TS4 are all [1,3,224,224].
FIG. 2 is a flowchart of an embodiment of the computation scheduling method of the artificial intelligence model of the present invention. The flow of FIG. 2 is executed by a chip development tool (e.g., a computer) and includes the following steps.
Step S210: Split a tensor into H sub-tensors (also called tiles), where H may be any dimension of the tensor (H is an integer greater than 1). More specifically, this step determines the value of H based on one of the dimensions of the output tensor of the last operator of the AI network 100, and then divides the tensor into H sub-tensors. Taking the AI network 100 of FIG. 1 as an example, because the size of the output tensor (i.e., tensor TS4) of the last operator (the addition operator 130) is [1,3,224,224], H may be 3 or 224. The details of tensor splitting are described below in connection with FIG. 3.
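For illustration only, the tiling of step S210 can be sketched in a few lines of Python; the choice of NumPy, the axis index, and H = 3 below are assumptions made to match the [1,3,224,224] example and are not taken from the patented implementation.

```python
import numpy as np

# Hypothetical example: split a [1, 3, 224, 224] tensor into H = 3 tiles
# along its second dimension (axis 1), mirroring step S210.
TS1 = np.random.rand(1, 3, 224, 224).astype(np.float32)
H = 3        # chosen from a dimension of the last operator's output tensor
axis = 1     # the dimension along which the tensor is tiled

sub_tensors = np.split(TS1, H, axis=axis)   # TS1_i1, TS1_i2, TS1_i3
for i, tile in enumerate(sub_tensors, start=1):
    print(f"TS1_i{i} shape: {tile.shape}")  # each is (1, 1, 224, 224)
```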
Step S220: Split each operator into H sub-operators. This step is described below together with step S210 in connection with FIG. 3.
Step S230: Determine the dependency relationships among the sub-operators. This step is described below in connection with FIG. 5.
Step S240: Sort the sub-operators according to the dependency relationships among them to obtain an operation order. This step is described below in connection with FIG. 7.
Step S250: Determine, according to the operation order, when the electronic device that executes the artificial intelligence model (more specifically, the processing circuit of the electronic device) deletes target data from memory, the target data being the output data of one of the sub-operators (i.e., intermediate data of the AI network 100). This step is described below in connection with FIG. 10, FIG. 11, FIG. 12A, and FIG. 12B.
Please refer to FIG. 3, which shows the result of splitting the tensors and operators of FIG. 1. In the embodiment of FIG. 3, the dimension on which the tensor-splitting operation (i.e., step S210) is based is the second dimension of tensor TS4 (i.e., H = 3). Accordingly, the subtraction operator 110 is split into subtraction sub-operator 110_1 ("SUB1"), subtraction sub-operator 110_2 ("SUB2"), and subtraction sub-operator 110_3 ("SUB3"); the convolution operator 120 is split into convolution sub-operator 120_1 ("CONV1"), convolution sub-operator 120_2 ("CONV2"), and convolution sub-operator 120_3 ("CONV3"); and the addition operator 130 is split into addition sub-operator 130_1 ("ADD1"), addition sub-operator 130_2 ("ADD2"), and addition sub-operator 130_3 ("ADD3"). Tensor TS1 is split into sub-tensors TS1_i1, TS1_i2, and TS1_i3 (the input sub-tensors of subtraction sub-operators 110_1, 110_2, and 110_3, respectively, each of size [1,1,224,224] and corresponding to the same dimension of tensor TS1, e.g., the second dimension). Tensor TS4 is split into sub-tensors TS3_o1, TS3_o2, and TS3_o3 (the output sub-tensors of addition sub-operators 130_1, 130_2, and 130_3, respectively, each of size [1,1,224,224] and corresponding to the same dimension of tensor TS4). Tensor TS3 is split into sub-tensors TS3_i1, TS3_i2, and TS3_i3 (the input sub-tensors of addition sub-operators 130_1, 130_2, and 130_3, respectively, each of size [1,1,224,224] and corresponding to the same dimension of tensor TS3). The individual output sub-tensors of convolution sub-operators 120_1, 120_2, and 120_3 (i.e., sub-tensors TS2_o1, TS2_o2, and TS2_o3) are equal to sub-tensors TS3_i1, TS3_i2, and TS3_i3, respectively.
Note that, because of receptive-field (visual field) enlargement, the sub-tensor TS1_o1 (TS1_o2 or TS1_o3) output by subtraction sub-operator 110_1 (110_2 or 110_3) is not equal to the input sub-tensor TS2_i1 (TS2_i2 or TS2_i3) of convolution sub-operator 120_1 (120_2 or 120_3); more specifically, sub-tensors TS1_o1, TS1_o2, and TS1_o3 are all of size [1,1,224,224], whereas sub-tensors TS2_i1 and TS2_i3 are both of size [1,2,224,224] and sub-tensor TS2_i2 is of size [1,3,224,224].
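The enlarged input ranges follow from the convolution's receptive field. The following minimal sketch shows the bookkeeping; the 3×3 kernel, stride 1, and padding 1 are assumptions chosen because they reproduce the tile sizes of FIG. 3, not parameters disclosed by the patent.

```python
def conv_input_range(out_start, out_len, kernel=3, stride=1, pad=1, in_len=3):
    """Return the input-row range a convolution tile needs (its receptive field).

    out_start/out_len describe the output rows produced by one sub-operator;
    the returned (start, stop) is clipped to the valid input rows [0, in_len).
    """
    start = out_start * stride - pad
    stop = (out_start + out_len - 1) * stride - pad + kernel  # exclusive
    return max(start, 0), min(stop, in_len)

# Three output tiles of one row each (H = 3), as in FIG. 3.
for tile, out_row in enumerate(range(3), start=1):
    lo, hi = conv_input_range(out_row, 1)
    print(f"CONV{tile}: needs input rows [{lo}, {hi}) -> {hi - lo} row(s)")
# CONV1 needs 2 rows, CONV2 needs 3 rows, CONV3 needs 2 rows, matching the
# input sub-tensor sizes [1,2,224,224], [1,3,224,224], and [1,2,224,224].
```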
As can be seen from FIG. 3, because sub-tensors TS1_o1, TS1_o2, and TS1_o3 correspond to sub-tensors TS1_i1, TS1_i2, and TS1_i3, respectively, and sub-tensors TS1_i1, TS1_i2, and TS1_i3 correspond to the same dimension of tensor TS1 (e.g., the second dimension), sub-tensors TS1_o1, TS1_o2, and TS1_o3 also correspond to the same dimension of tensor TS1.
The flow of FIG. 2 can effectively manage the lifetime of target data in the memory of the electronic device, which helps reduce memory usage and/or the bandwidth demand on memory. Details are described below in connection with FIG. 9, FIG. 10, FIG. 11, FIG. 12A, and FIG. 12B.
Based on the overlapping relationships among the sub-tensors split from the same tensor (see FIG. 3), the topological graph of connections among sub-operators shown in FIG. 4 can be obtained. Specifically, the topological graph is obtained from the overlap between the input sub-tensor of each sub-operator split from an operator and the output sub-tensors of the sub-operators of the source operator on which it depends. As shown in the figure, the input sub-tensor TS2_i1 of convolution sub-operator 120_1 contains sub-tensors TS1_o1 and TS1_o2; in other words, convolution sub-operator 120_1 cannot start until both subtraction sub-operator 110_1 and subtraction sub-operator 110_2 have finished. Likewise, convolution sub-operator 120_2 cannot start until subtraction sub-operators 110_1, 110_2, and 110_3 have all finished, and convolution sub-operator 120_3 cannot start until subtraction sub-operators 110_2 and 110_3 have finished. Addition sub-operators 130_1, 130_2, and 130_3 cannot start until convolution sub-operators 120_1, 120_2, and 120_3, respectively, have finished.
Please refer to FIG. 5, which is a detailed flow of an embodiment of step S230 of FIG. 2 and includes the following steps. The details of FIG. 5 are described below in connection with FIG. 4.
Step S510: Determine a target sub-operator. For example, convolution sub-operator 120_1 is selected as the target sub-operator.
Step S520: Determine the source sub-operator(s) of the target sub-operator. Continuing the example above, because the sources of the input sub-tensor TS2_i1 of convolution sub-operator 120_1 include sub-tensors TS1_o1 and TS1_o2, the source sub-operators of convolution sub-operator 120_1 are subtraction sub-operators 110_1 and 110_2 (i.e., the output sub-tensor TS1_o1 of subtraction sub-operator 110_1 and the output sub-tensor TS1_o2 of subtraction sub-operator 110_2 form the input sub-tensor of convolution sub-operator 120_1). Similarly, the source sub-operators of convolution sub-operator 120_2 are subtraction sub-operators 110_1, 110_2, and 110_3; the source sub-operators of convolution sub-operator 120_3 are subtraction sub-operators 110_2 and 110_3; and the source sub-operator of addition sub-operator 130_1 is convolution sub-operator 120_1.
Step S530: Determine that the target sub-operator depends on the source sub-operator; that is, the source sub-operator is a dependency sub-operator of the target sub-operator (a sub-operator on which the target depends). For example, subtraction sub-operators 110_1, 110_2, and 110_3 are dependency sub-operators of convolution sub-operator 120_2.
By taking each sub-operator of FIG. 4 in turn as the target sub-operator and repeating the flow of FIG. 5, the dependency relationships among the sub-operators can be determined, as shown in FIG. 6. Convolution sub-operator 120_1 depends on subtraction sub-operators 110_1 and 110_2. Convolution sub-operator 120_2 depends on subtraction sub-operators 110_1, 110_2, and 110_3. Convolution sub-operator 120_3 depends on subtraction sub-operators 110_2 and 110_3. Addition sub-operators 130_1, 130_2, and 130_3 depend on convolution sub-operators 120_1, 120_2, and 120_3, respectively.
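For illustration, the dependency table of FIG. 6 can be derived from overlapping row ranges as in the following sketch; representing each sub-tensor by a half-open row range of the split dimension is an assumption made for the example.

```python
# Hypothetical sketch: each sub-operator is described by the row range of the
# split dimension that it consumes (input) or produces (output); a target
# sub-operator depends on every source sub-operator whose output rows overlap
# the target's input rows (steps S510-S530).
def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]        # half-open ranges [start, stop)

sub_outputs = {"SUB1": (0, 1), "SUB2": (1, 2), "SUB3": (2, 3)}     # TS1_o1..TS1_o3
conv_inputs = {"CONV1": (0, 2), "CONV2": (0, 3), "CONV3": (1, 3)}  # TS2_i1..TS2_i3

dependencies = {
    conv: [sub for sub, out in sub_outputs.items() if overlaps(inp, out)]
    for conv, inp in conv_inputs.items()
}
print(dependencies)
# {'CONV1': ['SUB1', 'SUB2'], 'CONV2': ['SUB1', 'SUB2', 'SUB3'], 'CONV3': ['SUB2', 'SUB3']}
```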
Please refer to FIG. 7, which is a detailed flow of an embodiment of step S240 of FIG. 2 and includes the following steps. The flow of FIG. 7 is based on a depth-first search algorithm.
Step S710: Find a sub-operator whose indegree is 0, and mark that sub-operator as the target sub-operator and as visited. A sub-operator with indegree 0 is a sub-operator that no other sub-operator depends on. Taking FIG. 6 as an example, addition sub-operators 130_1, 130_2, and 130_3 are the sub-operators with indegree 0, i.e., the top-level sub-operators.
Step S720: Determine whether a sub-operator with indegree 0 was found. If so, perform step S730; if not, perform step S795. The following description uses addition sub-operator 130_1 as an example.
Step S730: Find an unvisited dependency sub-operator of the target sub-operator. As shown in FIG. 6, because convolution sub-operator 120_1 is a dependency sub-operator of addition sub-operator 130_1 (i.e., addition sub-operator 130_1 depends on convolution sub-operator 120_1), step S730 finds convolution sub-operator 120_1.
Step S740: Determine whether a dependency sub-operator was found. If so, perform step S750; if not, perform step S760.
Step S750: Mark the dependency sub-operator as the target sub-operator and as visited, then perform step S730. Continuing the example, in step S750 convolution sub-operator 120_1 is marked as the target sub-operator and as visited, and when step S730 is performed again, a dependency sub-operator of convolution sub-operator 120_1 is found (assume subtraction sub-operator 110_1 is found). Steps S730 and S740 are then performed again; this time, because subtraction sub-operator 110_1 does not depend on any sub-operator (i.e., it has no dependency sub-operator, so the result of step S740 is negative), the flow proceeds to step S760.
Step S760: Add the target sub-operator to queue 800. Continuing the example, subtraction sub-operator 110_1 is now added to queue 800. Please refer to FIG. 8A and FIG. 8B, which show how the contents of queue 800 change (FIG. 8B continues from FIG. 8A). As shown in the first row of FIG. 8A, queue 800 at this point contains only subtraction sub-operator 110_1 ("SUB1").
Step S770: Determine whether the target sub-operator is a top-level sub-operator (i.e., a sub-operator with indegree 0). If so, perform step S710; if not, perform step S780. Continuing the example, because subtraction sub-operator 110_1 is not a top-level sub-operator, the result of step S770 is negative.
Step S780: Determine an upper-level sub-operator that depends on the target sub-operator (i.e., go back up one level), and mark that upper-level sub-operator as the target sub-operator. Continuing the example, the flow now returns to convolution sub-operator 120_1.
Step S790: Determine whether there is any dependency sub-operator that has not yet been marked. Continuing the example, because among the dependency sub-operators of the target sub-operator (convolution sub-operator 120_1), namely subtraction sub-operators 110_1 and 110_2, there is still an unmarked sub-operator (i.e., subtraction sub-operator 110_2), the result of step S790 is affirmative; the flow then performs the following steps: step S730 (subtraction sub-operator 110_2 is found) → step S740 (result is affirmative) → step S750 (subtraction sub-operator 110_2 is marked as visited) → step S730 (no dependency sub-operator of subtraction sub-operator 110_2 is found) → step S740 (result is negative) → step S760 (subtraction sub-operator 110_2 is added to queue 800, as shown in the second row of FIG. 8A) → step S770 (result is negative) → step S780 (convolution sub-operator 120_1 is marked as the target sub-operator) → step S790. This time, because all dependency sub-operators of the target sub-operator (i.e., convolution sub-operator 120_1), namely subtraction sub-operators 110_1 and 110_2, have been visited, the result of step S790 is negative, and convolution sub-operator 120_1 is therefore added to queue 800 in the following step S760 (as shown in the third row of FIG. 8A). After steps S770, S780 (addition sub-operator 130_1 is marked as the target sub-operator), S790, and S760 (addition sub-operator 130_1 is added to queue 800) are performed, the result of step S770 is affirmative (because addition sub-operator 130_1 is a top-level sub-operator), and the flow returns to step S710 to select the next sub-operator with indegree 0 (e.g., addition sub-operator 130_2).
Steps S710 to S790 described above are repeated (the process of adding all sub-operators of FIG. 6 to queue 800 is shown in FIG. 8A and FIG. 8B and is not repeated here) until all sub-operators with indegree 0 have been visited (i.e., the result of step S720 is negative and the flow proceeds to step S795).
Step S795: Take all sub-operators out of queue 800 in order. Taking FIG. 8B as an example, the order in which the sub-operators are taken out of queue 800 (i.e., the operation order of the sub-operators) is: SUB1 → SUB2 → CONV1 → … → CONV3 → ADD3 (i.e., the reverse of the order in which the sub-operators were added to the queue).
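A compact way to view the ordering of FIG. 7 is a post-order depth-first traversal of the dependency table of FIG. 6. The recursive sketch below is an illustrative simplification of the iterative flow S710-S795; it reproduces the operation order quoted above.

```python
# Dependency table of FIG. 6: each sub-operator maps to the sub-operators it
# depends on (its dependency sub-operators).
deps = {
    "ADD1": ["CONV1"], "ADD2": ["CONV2"], "ADD3": ["CONV3"],
    "CONV1": ["SUB1", "SUB2"],
    "CONV2": ["SUB1", "SUB2", "SUB3"],
    "CONV3": ["SUB2", "SUB3"],
    "SUB1": [], "SUB2": [], "SUB3": [],
}

order, visited = [], set()

def visit(op):
    """Append op to the order only after all of its dependencies (post-order DFS)."""
    if op in visited:
        return
    visited.add(op)
    for dep in deps[op]:
        visit(dep)
    order.append(op)

# Start from the sub-operators with indegree 0 (ADD1, ADD2, ADD3).
for top in ("ADD1", "ADD2", "ADD3"):
    visit(top)

print(" -> ".join(order))
# SUB1 -> SUB2 -> CONV1 -> ADD1 -> SUB3 -> CONV2 -> ADD2 -> CONV3 -> ADD3
```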
FIG. 9 is a functional block diagram of an embodiment of an electronic device of the present invention. The electronic device 900 includes a chip 901 and an external memory 902 (e.g., a dynamic random access memory (DRAM)). The chip 901 and the external memory 902 are coupled or electrically connected to each other. The chip 901 includes a processing circuit 910 and a processor 920. The processing circuit 910 and the processor 920 are coupled or electrically connected to each other.
The processor 920 controls the processing circuit 910 to jointly realize the functions of the chip 901. The processor 920 may be a circuit or electronic component with program-execution capability, such as a central processing unit, a microprocessor, a micro processing unit, a digital signal processor, an application specific integrated circuit (ASIC), or an equivalent circuit.
The processing circuit 910 may be an intelligence processing unit (IPU) or a neural-network processing unit (NPU). The processing circuit 910 includes a computation circuit 912 (including, but not limited to, a convolution engine and a vector engine), a buffer circuit 914 (including, but not limited to, multiple registers), a memory management circuit 916 (e.g., a direct memory access (DMA) circuit), and a memory 918 (e.g., a static random access memory (SRAM)). The buffer circuit 914 stores the data that the computation circuit 912 needs when performing convolution or vector operations. The memory 918 can store the output sub-tensors of the sub-operators of FIG. 3.
The external memory 902 stores the input data Din, the kernel parameters Kp, and the output data Dout. The memory management circuit 916 reads the input data Din and the kernel parameters Kp from the external memory 902 and stores them into the memory 918, reads at least part of the input data Din and at least part of the kernel parameters Kp from the memory 918 and stores them into the buffer circuit 914, and stores the output data Dout generated by the computation circuit 912 into the external memory 902.
The details of step S250 of FIG. 2 are described below in connection with FIG. 9, FIG. 10, FIG. 11, FIG. 12A, and FIG. 12B. FIG. 10 shows a life-span list of the output sub-tensors of some of the sub-operators of FIG. 3. The horizontal axis corresponds to the operation order of the sub-operators described above (it does not necessarily correspond to actual lengths of time); more specifically, the output sub-tensor TS1_o1 of subtraction sub-operator 110_1 is generated at the point where the operation order is 0 and ends at the point where the operation order is 5 (i.e., it is no longer used by any other sub-operator). Note that sub-tensors TS3_o1, TS3_o2, and TS3_o3 start to be generated at the points where the operation order is 3, 6, and 8, respectively; however, because sub-tensors TS3_o1, TS3_o2, and TS3_o3 are not deleted from the memory 918 early (each being part of the output data Dout), the life-span list of FIG. 10 does not show these three sub-tensors.
The details of step S250 of FIG. 2 include allocating the memory 918 according to the life-span list of FIG. 10; the flow of allocating the memory 918 is shown in FIG. 11. FIG. 12A and FIG. 12B are schematic diagrams of an embodiment of the active list of the present invention. Please refer to FIG. 10, FIG. 11, FIG. 12A, and FIG. 12B for the following description. The active list shows the activity of the sub-tensors in the memory 918 (more specifically, the points at which a sub-tensor is stored into and deleted from the memory 918). FIG. 11 includes the following steps.
Step S1110: Build a life-span list. An example of a life-span list is shown in FIG. 10.
Step S1120: Search the life-span list to find the sub-tensors that are active in the current life cycle. For example, sub-tensor TS1_o1 becomes active at the point where the operation order is 0, and the active period of sub-tensor TS1_o1 is from the point where the operation order is 0 to the point where the operation order is 5.
Step S1130: Add the active sub-tensor to the active list. As shown in FIG. 12A, sub-tensor TS1_o1 is added to the active list when the life cycle is 0.
Step S1140: Allocate memory for the active sub-tensor, i.e., arrange a corresponding storage space in the memory 918. Continuing the example, as shown in FIG. 12A, part of the memory 918 is allocated to sub-tensor TS1_o1 when the life cycle is 0.
Step S1150: Delete from the active list the sub-tensors that are no longer active. For example, because in FIG. 10 sub-tensor TS2_o1 is no longer active after the point where the operation order is 3, sub-tensor TS2_o1 is deleted when the life cycle is 3 in FIG. 12A.
Step S1160: Release the memory corresponding to the sub-tensors that are no longer active. In response to the deletion of sub-tensors from the active list in the previous step, this step releases the corresponding storage space in the memory 918, so that the memory 918 can be used more promptly and flexibly.
Step S1170: Increase the life cycle by 1.
If no sub-tensor becomes inactive in the current life cycle, steps S1150 and S1160 are skipped and step S1170 is performed directly.
Step S1180: Determine whether the life cycle has ended (i.e., determine whether the operation order of FIG. 10 has ended). If so, the flow of FIG. 11 ends; if not, step S1120 is performed to continue finding active sub-tensors.
As described above, the active lists of FIG. 12A and FIG. 12B can be obtained from the life-span list of FIG. 10 and the flow of FIG. 11. The life cycles of FIG. 12A and FIG. 12B correspond to the operation order of FIG. 10. For example, in FIG. 10, sub-tensor TS1_o1 is generated at the point where the operation order is 0 and ends at the point where the operation order is 5; therefore, in FIG. 12A the life cycle of sub-tensor TS1_o1 is 0 to 4. Similarly, in FIG. 10 sub-tensor TS2_o2 exists between the point where the operation order is 5 and the point where the operation order is 6; therefore, in FIG. 12B sub-tensor TS2_o2 exists only in life cycle 5. In this way, a developer or designer of the chip 901 can design or manage the memory 918 according to the active lists of FIG. 12A and FIG. 12B; as a result, the bandwidth demand on the external memory 902 can be reduced without enlarging the memory 918 (to save cost, while improving the overall performance of the external memory 902), or the memory 918 can be further reduced without increasing the memory bandwidth of the external memory 902.
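For illustration, the life spans of FIG. 10 (and the points at which step S250 frees each intermediate sub-tensor) can be derived from the operation order and the consumer relationships of FIG. 6, as in the following sketch; the dictionary-based representation is an assumption made for the example.

```python
# Hypothetical sketch of step S250: each sub-operator's output stays in memory
# until the last sub-operator that consumes it has run, and is freed afterwards.
order = ["SUB1", "SUB2", "CONV1", "ADD1", "SUB3", "CONV2", "ADD2", "CONV3", "ADD3"]
consumers = {                      # who reads each sub-operator's output (from FIG. 6)
    "SUB1": ["CONV1", "CONV2"], "SUB2": ["CONV1", "CONV2", "CONV3"],
    "SUB3": ["CONV2", "CONV3"],
    "CONV1": ["ADD1"], "CONV2": ["ADD2"], "CONV3": ["ADD3"],
    "ADD1": [], "ADD2": [], "ADD3": [],   # network outputs: never freed early
}

step = {op: i for i, op in enumerate(order)}
for op in order:
    born = step[op]
    if consumers[op]:
        dies = max(step[c] for c in consumers[op])
        print(f"{op} output: produced at step {born}, freed after step {dies}")
    else:
        print(f"{op} output: produced at step {born}, kept (part of Dout)")
# e.g. the SUB1 output (TS1_o1) is produced at step 0 and freed after step 5,
# matching the life span of TS1_o1 in FIG. 10.
```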
By comparison, if the operators and tensors of FIG. 1 were not split, the chip developer would have to pre-allocate one storage block of the memory 918 to tensor TS2 (whose total amount of data equals the sum of the data amounts of sub-tensors TS1_o1, TS1_o2, and TS1_o3), and that storage block could not be released until the convolution operator 120 finished. In addition, if the sub-operations obtained by splitting the operators were not put into an operation order, the same sub-tensor (e.g., sub-tensor TS1_i1) would have to be recomputed multiple times, increasing the amount of data handled by the whole AI model and thus raising cost (because the demand on the memory 918 increases) or degrading performance (because the bandwidth demand on the external memory 902 increases). In practice, because the number of operators and the sizes of the tensors are both very large, the effect achieved by the present invention is quite significant.
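For a rough sense of the scale involved, the following sketch compares keeping the whole tensor TS2 resident with keeping one sub-tensor at a time; the 16-bit element width is an assumption, since the patent does not specify the data type.

```python
# Assumed 16-bit elements; adjust bytes_per_elem for the actual data type.
bytes_per_elem = 2
full_tensor = 1 * 3 * 224 * 224 * bytes_per_elem   # whole TS2 resident at once
one_tile = 1 * 1 * 224 * 224 * bytes_per_elem      # one sub-tensor at a time
print(full_tensor, one_tile)   # 301056 vs. 100352 bytes (~294 KB vs. ~98 KB)
```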
The present invention can be extended to AI models that contain more operators. Please refer to FIG. 13, which is a schematic diagram of another AI model. The left side of FIG. 13 shows that the AI model contains N operators (operator 1, operator 2, …, operator N), and the right side of FIG. 13 shows that one operator is split into H sub-operators. The arrows indicate the dependencies during execution of the sub-operators. For example, the first sub-operator of operator 2 cannot be executed until the third sub-operator of operator 1 has finished.
FIG. 14 is a schematic diagram of another AI model. The AI network 1400 contains an addition operator 1410 and a convolution operator 1420, and the tensor size is [1,56,56,224]. After splitting (as shown in the lower half of FIG. 14), the addition operator 1410 and the convolution operator 1420 are each split into 56 sub-operators ("ADD1" to "ADD56" and "CONV1" to "CONV56"). The size of the sub-tensor output by each addition sub-operator ("ADD1" to "ADD56") is [1,1,56,224]. However, not all of the convolution sub-operators ("CONV1" to "CONV56") have input sub-tensors of the same size. More specifically, the size of the input sub-tensor of the first convolution sub-operator ("CONV1") is [1,2,56,224], while the size of the input sub-tensor of the second convolution sub-operator ("CONV2") is [1,3,56,224] (i.e., the receptive-field enlargement discussed above).
FIG. 15A and FIG. 15B form a flowchart of an embodiment of a method of executing the artificial intelligence model of the present invention. FIG. 15A and FIG. 15B include the following steps.
Step S1505: The memory management circuit 916 reads a tensor (i.e., the input data Din) and a plurality of kernel parameters Kp from the external memory 902, and stores the tensor and the kernel parameters Kp into the memory 918.
Step S1510: The processing circuit 910 (more specifically, the computation circuit 912) performs a first type of operation on a first sub-tensor of the tensor to generate first intermediate data, and the memory management circuit 916 stores the first intermediate data into the memory 918. Taking FIG. 4 as an example, the first sub-tensor may be the input sub-tensor of subtraction sub-operator 110_1 (i.e., sub-tensor TS1_i1), the first type of operation may be a subtraction operation (a kind of vector operation), and the first intermediate data may be the output sub-tensor of subtraction sub-operator 110_1 (i.e., sub-tensor TS1_o1). Taking FIG. 14 as an example, the first sub-tensor may be the input sub-tensor of an addition sub-operator (e.g., "ADD1"), the first type of operation may be an addition operation (a kind of vector operation), and the first intermediate data may be the output sub-tensor of that addition sub-operator (e.g., "ADD1").
Step S1520: The processing circuit 910 (more specifically, the computation circuit 912) performs the first type of operation on a second sub-tensor of the tensor to generate second intermediate data, and the memory management circuit 916 stores the second intermediate data into the memory 918. Taking FIG. 4 as an example, the second sub-tensor may be the input sub-tensor of subtraction sub-operator 110_2 (i.e., sub-tensor TS1_i2), the first type of operation may be a subtraction operation, and the second intermediate data may be the output sub-tensor of subtraction sub-operator 110_2 (i.e., sub-tensor TS1_o2). Taking FIG. 14 as an example, the second sub-tensor may be the input sub-tensor of an addition sub-operator (e.g., "ADD2"), the first type of operation may be an addition operation, and the second intermediate data may be the output sub-tensor of that addition sub-operator (e.g., "ADD2").
Step S1530: The processing circuit 910 (more specifically, the computation circuit 912) performs a second type of operation on the first intermediate data and the second intermediate data to generate third intermediate data, and the memory management circuit 916 stores the third intermediate data into the memory 918. Taking FIG. 4 and FIG. 14 as examples, the second type of operation may be a convolution operation (e.g., convolution sub-operator 120_1 or "CONV1"), and the third intermediate data may be the result of that convolution operation (e.g., sub-tensor TS2_o1 of FIG. 4).
Step S1540: The memory management circuit 916 deletes the third intermediate data from the memory 918. As shown in FIG. 10, because the operations after the point where the operation order is 3 no longer use sub-tensor TS2_o1 (i.e., sub-tensor TS2_o1 becomes inactive starting from life cycle 3 of FIG. 12A), sub-tensor TS2_o1 can be deleted from the memory 918 to release part of the memory 918.
Step S1550: The processing circuit 910 (more specifically, the computation circuit 912) performs the first type of operation on a third sub-tensor of the tensor to generate fourth intermediate data, and the memory management circuit 916 stores the fourth intermediate data into the memory 918. Taking FIG. 4 as an example, the third sub-tensor may be the input sub-tensor of subtraction sub-operator 110_3 (i.e., sub-tensor TS1_i3), the first type of operation may be a subtraction operation, and the fourth intermediate data may be the output sub-tensor of subtraction sub-operator 110_3 (i.e., sub-tensor TS1_o3). Taking FIG. 14 as an example, the third sub-tensor may be the input sub-tensor of an addition sub-operator (e.g., "ADD3"), the first type of operation may be an addition operation, and the fourth intermediate data may be the output sub-tensor of that addition sub-operator (e.g., "ADD3").
Step S1560: The processing circuit 910 (more specifically, the computation circuit 912) performs the second type of operation on the first intermediate data, the second intermediate data, and the fourth intermediate data to generate fifth intermediate data, and the memory management circuit 916 stores the fifth intermediate data into the memory 918. Taking FIG. 4 and FIG. 14 as examples, the second type of operation may be a convolution operation (e.g., convolution sub-operator 120_2 or "CONV2"), and the fifth intermediate data may be the result of that convolution operation (e.g., sub-tensor TS2_o2 of FIG. 4).
Step S1570: The memory management circuit 916 deletes the first intermediate data from the memory 918. As shown in FIG. 10, because the operations after the point where the operation order is 5 no longer use sub-tensor TS1_o1 (i.e., sub-tensor TS1_o1 becomes inactive starting from life cycle 5 of FIG. 12B), sub-tensor TS1_o1 can be deleted from the memory 918 to release part of the memory 918.
Step S1580: The memory management circuit 916 deletes the fifth intermediate data from the memory 918. As shown in FIG. 10, because the operations after the point where the operation order is 6 no longer use sub-tensor TS2_o2 (i.e., sub-tensor TS2_o2 becomes inactive starting from life cycle 6 of FIG. 12B), sub-tensor TS2_o2 can be deleted from the memory 918 to release part of the memory 918.
As shown in the discussion of FIG. 15A and FIG. 15B, the developer of the chip 901 can arrange in advance, according to the usage state of the memory 918, the instructions executed by the processing circuit 910 (more specifically, the memory management circuit 916) (in some embodiments, these instructions are provided to the processing circuit 910 by the processor 920), so as to make full use of the memory 918.
Please refer to FIG. 16, which is a schematic diagram of an embodiment of the contents stored in the buffer circuit 914 and the memory 918 of the present invention. In the example of FIG. 16, the kernel parameters Kp stored in the memory 918 include a subtraction kernel parameter Kp_s (a kind of vector kernel parameter), a convolution kernel parameter Kp_c, and an addition kernel parameter Kp_a (a kind of vector kernel parameter).
The subtraction kernel parameter Kp_s includes sub-parameters Kp_s1, Kp_s2, and Kp_s3. The subtraction kernel parameter Kp_s is the parameter needed when the subtraction operator 110 performs a subtraction operation on tensor TS1, and sub-parameters Kp_s1, Kp_s2, and Kp_s3 may correspond to subtraction sub-operators 110_1, 110_2, and 110_3 of FIG. 4, respectively. That is, the computation circuit 912 refers to sub-parameter Kp_s1 (Kp_s2 or Kp_s3) when executing subtraction sub-operator 110_1 (110_2 or 110_3) to operate on sub-tensor TS1_i1 (TS1_i2 or TS1_i3). In some embodiments, sub-parameter Kp_s1 is equal to neither sub-parameter Kp_s2 nor sub-parameter Kp_s3, and sub-parameter Kp_s2 is not equal to sub-parameter Kp_s3.
The addition kernel parameter Kp_a includes sub-parameters Kp_a1, Kp_a2, and Kp_a3. The addition kernel parameter Kp_a is the parameter needed when the addition operator 130 performs an addition operation on tensor TS3, and sub-parameters Kp_a1, Kp_a2, and Kp_a3 may correspond to addition sub-operators 130_1, 130_2, and 130_3 of FIG. 4, respectively. That is, the computation circuit 912 refers to sub-parameter Kp_a1 (Kp_a2 or Kp_a3) when executing addition sub-operator 130_1 (130_2 or 130_3) to operate on sub-tensor TS3_i1 (TS3_i2 or TS3_i3). In some embodiments, sub-parameter Kp_a1 is equal to neither sub-parameter Kp_a2 nor sub-parameter Kp_a3, and sub-parameter Kp_a2 is not equal to sub-parameter Kp_a3.
The convolution kernel parameter Kp_c includes sub-parameters Kp_c1, Kp_c2, and Kp_c3. Unlike the subtraction and addition operations, although the operators and tensors have been split, convolution sub-operators 120_1, 120_2, and 120_3 still need to refer to the complete convolution kernel parameter Kp_c when performing their convolution operations.
Please refer to FIG. 17, which is a detailed flow of step S1510 or step S1520 of FIG. 15A and includes the following steps. Please also refer to FIG. 16 for the following description.
Step S1710: The memory management circuit 916 reads a target part of the vector kernel parameter from the memory 918 and stores the target part into the buffer circuit 914. More specifically, for step S1510 the target part may be sub-parameter Kp_s1, and for step S1520 the target part may be sub-parameter Kp_s2. As shown in FIG. 16, the buffer circuit 914 stores sub-parameter Kp_s1 and/or sub-parameter Kp_s2 and other data (e.g., the sub-tensor to be operated on). Because the first type of operation (e.g., a vector operation) of steps S1510 and S1520 needs only a part of the subtraction kernel parameter Kp_s (i.e., the target part), the buffer circuit 914 does not need to store the complete subtraction kernel parameter Kp_s at this time, which saves storage space. In some embodiments, the memory management circuit 916 stores sub-parameter Kp_s1 (Kp_s2) into the buffer circuit 914 before step S1510 (S1520), and deletes sub-parameter Kp_s1 (Kp_s2) from the buffer circuit 914 after step S1510 (S1520) is completed, to save storage space.
Step S1720: The computation circuit 912 performs a target vector operation on a target sub-tensor with reference to the target part of the vector kernel parameter to generate target intermediate data. More specifically, for step S1510 the target sub-tensor may be sub-tensor TS1_i1, the target vector operation may be subtraction sub-operator 110_1, and the target intermediate data may be sub-tensor TS1_o1. For step S1520, the target sub-tensor may be sub-tensor TS1_i2, the target vector operation may be subtraction sub-operator 110_2, and the target intermediate data may be sub-tensor TS1_o2. For FIG. 14, the target part may be sub-parameter Kp_a1, the target sub-tensor may be the input sub-tensor of an addition sub-operator (e.g., "ADD1"), the target vector operation may be that addition sub-operator, and the first intermediate data may be the output sub-tensor of that addition sub-operator.
As described above, because the original tensor has been split into multiple sub-tensors, a vector operation on a sub-tensor only needs to refer to part of the kernel parameters.
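The following sketch illustrates this point; it assumes the vector kernel parameter is a per-row value that can be sliced along the same dimension as the tensor, which is an assumption made for the example rather than the patent's actual parameter layout.

```python
import numpy as np

# Assumed layout: one parameter value per row of the split dimension (3 rows).
Kp_s = np.array([0.1, 0.2, 0.3], dtype=np.float32)   # full vector kernel parameter
TS1 = np.random.rand(1, 3, 224, 224).astype(np.float32)

for i in range(3):                                    # SUB1, SUB2, SUB3
    tile = TS1[:, i:i + 1, :, :]                      # TS1_i1 / TS1_i2 / TS1_i3
    kp_slice = Kp_s[i:i + 1]                          # only Kp_s1 / Kp_s2 / Kp_s3
    out_tile = tile - kp_slice.reshape(1, -1, 1, 1)   # vector op uses just its slice
    print(f"SUB{i + 1}: tile {tile.shape}, parameter slice {kp_slice.shape}")
```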
A person having ordinary skill in the art can understand the details of step S1550 of FIG. 15B from the description of FIG. 17, so they are not repeated here. For example, for step S1550 the target part may be sub-parameter Kp_s3, the target sub-tensor may be sub-tensor TS1_i3, the target vector operation may be subtraction sub-operator 110_3, and the target intermediate data may be sub-tensor TS1_o3.
Please refer to FIG. 18, which is a detailed flow of step S1530 of FIG. 15A and includes the following steps. Please also refer to FIG. 19 for the following description.
Step S1810: The memory management circuit 916 reads the convolution kernel parameter Kp_c from the memory 918 and stores the convolution kernel parameter Kp_c into the buffer circuit 914. As shown in FIG. 19, before the convolution operation starts, the buffer circuit 914 stores the convolution kernel parameter Kp_c and other data (e.g., the sub-tensors to be operated on). Because the second type of operation (e.g., a convolution operation) of step S1530 needs the complete convolution kernel parameter Kp_c, the buffer circuit 914 must store the complete convolution kernel parameter Kp_c at this time.
Step S1820: The computation circuit 912 performs the second type of operation on the first intermediate data and the second intermediate data with reference to the convolution kernel parameter Kp_c to generate the third intermediate data. Performing a convolution operation on a tensor with reference to the convolution kernel parameter Kp_c is well known to those having ordinary skill in the art and is not described here. In some embodiments, the memory management circuit 916 deletes the convolution kernel parameter Kp_c from the buffer circuit 914 after step S1820 ends.
Please refer to FIG. 20, which is a detailed flow of step S1560 of FIG. 15B and includes the following steps. Please also refer to FIG. 19 for the following description.
Step S2010: Step S2010 is similar to step S1810 and is not described again. Note that if the memory management circuit 916 did not delete the convolution kernel parameter Kp_c from the buffer circuit 914 after step S1530 ended, step S2010 can be skipped.
Step S2020: The computation circuit 912 performs the second type of operation on the first intermediate data, the second intermediate data, and the fourth intermediate data with reference to the convolution kernel parameter Kp_c to generate the fifth intermediate data. Step S2020 is similar to step S1820 and is not described again.
Please refer to FIG. 21, which is a functional block diagram of an embodiment of the memory management circuit 916 of the present invention. The memory management circuit 916 includes at least two channels (channel 916a and channel 916b); each channel can operate independently, and multiple channels can operate simultaneously. Based on this characteristic, the present invention further divides the sub-operators and sub-tensors into several smaller blocks, so that the processing circuit 910 can compute the AI model in a multistage-pipeline manner, thereby improving the performance of the chip 901.
Please refer to FIG. 22, which is a schematic diagram of the multistage pipeline of the present invention. FIG. 22 uses subtraction sub-operator 110_1, subtraction sub-operator 110_2, and convolution sub-operator 120_1 as examples. In the example of FIG. 22, subtraction sub-operator 110_1 is further divided into operation block 110_1a ("SUB1a") and operation block 110_1b ("SUB1b"), and sub-tensor TS1_i1 is further divided into data block TS1_i1a and data block TS1_i1b; subtraction sub-operator 110_2 is further divided into operation block 110_2a ("SUB2a") and operation block 110_2b ("SUB2b"), and sub-tensor TS1_i2 is further divided into data block TS1_i2a and data block TS1_i2b; convolution sub-operator 120_1 is further divided into operation block 120_1a ("CONV1a") and operation block 120_1b ("CONV1b"), and sub-tensor TS2_i1 is further divided into data block TS2_i1a and data block TS2_i1b. In this way, while channel 916a performs the operations related to operation block 110_1a (between time T0 and time T3), channel 916b can perform the operations related to operation block 110_1b substantially at the same time (between time T1 and time T4). Compared with a single-stage pipeline (i.e., not using multiple channels simultaneously to compute the AI model), using two channels simultaneously saves roughly half of the time. Similarly, using N channels simultaneously takes only about 1/N of the processing time of a single-stage pipeline.
Please refer to FIG. 23, which is a flowchart of an embodiment of the multistage-pipeline operation of the present invention and includes the following steps.
Step S2310: The memory management circuit 916 uses a first channel (e.g., channel 916a) to read a first data block (e.g., data block TS1_i1a) of the first sub-tensor (e.g., sub-tensor TS1_i1) from the memory 918, and stores the first data block into the buffer circuit 914. For example, step S2310 may correspond to the period between time T0 and time T1 of FIG. 22 (i.e., the "SUB1a load" operation).
Step S2320: The computation circuit 912 performs the first type of operation (e.g., a subtraction operation) on the first data block (e.g., data block TS1_i1a) to generate a first part of the first intermediate data (e.g., part of sub-tensor TS1_o1). For example, step S2320 may correspond to the period between time T1 and time T2 of FIG. 22 (i.e., the "SUB1a compute" operation).
Step S2330: The memory management circuit 916 uses a second channel (e.g., channel 916b) to read a second data block (e.g., data block TS1_i1b) of the first sub-tensor from the memory 918, and stores the second data block into the buffer circuit 914. For example, step S2330 may correspond to the period between time T1 and time T2 of FIG. 22 (i.e., the "SUB1b load" operation). In other words, step S2320 and step S2330 are performed at least partially simultaneously.
Step S2340: The memory management circuit 916 uses the first channel (e.g., channel 916a) to store the first part of the first intermediate data (e.g., part of sub-tensor TS1_o1) into the memory 918. For example, step S2340 may correspond to the period between time T2 and time T3 of FIG. 22 (i.e., the "SUB1a store" operation).
Step S2350: The computation circuit 912 performs the first type of operation (e.g., a subtraction operation) on the second data block (e.g., data block TS1_i1b) to generate a second part of the first intermediate data (e.g., part of sub-tensor TS1_o1). For example, step S2350 may correspond to the period between time T2 and time T3 of FIG. 22 (i.e., the "SUB1b compute" operation). In other words, step S2340 and step S2350 are performed at least partially simultaneously.
Step S2360: The memory management circuit 916 uses the second channel (e.g., channel 916b) to store the second part of the first intermediate data (e.g., part of sub-tensor TS1_o1) into the memory 918. For example, step S2360 may correspond to the period between time T3 and time T4 of FIG. 22 (i.e., the "SUB1b store" operation).
A person having ordinary skill in the art can understand, from the description of FIG. 23, the other operations after time T4 of FIG. 22, so they are not described here.
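For illustration, the two-channel load/compute/store overlap of FIG. 22 and FIG. 23 can be mimicked in software with two worker threads standing in for channels 916a and 916b; the sketch below is only an analogue of the hardware pipeline, and the timing constants are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load(block):                       # stand-in for a DMA "load" on one channel
    time.sleep(0.01)
    return f"{block} loaded"

def compute(data):                     # stand-in for the "compute" phase
    time.sleep(0.01)
    return data.replace("loaded", "computed")

def store(result):                     # stand-in for a DMA "store" on one channel
    time.sleep(0.01)
    print(result, "-> stored")

blocks = ["SUB1a", "SUB1b", "SUB2a", "SUB2b", "CONV1a", "CONV1b"]

# Two single-worker executors model the two independent channels; the compute
# of block i overlaps the load of block i+1, as between T1 and T2 in FIG. 22.
with ThreadPoolExecutor(max_workers=1) as ch_a, ThreadPoolExecutor(max_workers=1) as ch_b:
    channels = [ch_a, ch_b]
    pending = channels[0].submit(load, blocks[0])
    for i in range(len(blocks)):
        nxt = channels[(i + 1) % 2].submit(load, blocks[i + 1]) if i + 1 < len(blocks) else None
        result = compute(pending.result())      # compute block i while block i+1 loads
        channels[i % 2].submit(store, result)   # store block i on its own channel
        pending = nxt
```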
Although embodiments of the present invention are described above, these embodiments are not intended to limit the present invention. A person having ordinary skill in the art may apply variations to the technical features of the present invention according to the explicit or implicit contents of the present invention, and all such variations may fall within the scope of patent protection sought by the present invention. In other words, the scope of patent protection of the present invention shall be defined by the claims of this specification.
100, 1400: AI network
Din: input data
Dout: output data
110, SUB: subtraction operator
120, 1420, CONV: convolution operator
130, 1410, ADD: addition operator
TS1, TS2, TS3, TS4: tensor
110_1, 110_2, 110_3, SUB1, SUB2, SUB3: subtraction sub-operator
120_1, 120_2, 120_3, CONV1, CONV2, CONV3, CONV56: convolution sub-operator
130_1, 130_2, 130_3, ADD1, ADD2, ADD3, ADD56: addition sub-operator
TS1_i1, TS1_i2, TS1_i3, TS3_o1, TS3_o2, TS3_o3, TS3_i1, TS3_i2, TS3_i3, TS2_o1, TS2_o2, TS2_o3, TS1_o1, TS1_o2, TS1_o3, TS2_i1, TS2_i2, TS2_i3: sub-tensor
800: queue
900: electronic device
901: chip
902: external memory
910: processing circuit
920: processor
912: computation circuit
914: buffer circuit
916: memory management circuit
918: memory
Kp: kernel parameters
Kp_s: subtraction kernel parameter
Kp_c: convolution kernel parameter
Kp_a: addition kernel parameter
Kp_s1, Kp_s2, Kp_s3, Kp_a1, Kp_a2, Kp_a3, Kp_c1, Kp_c2, Kp_c3: sub-parameter
916a, 916b: channel
110_1a, 110_1b, 110_2a, 110_2b, 120_1a, 120_1b, SUB1a, SUB1b, SUB2a, SUB2b, CONV1a, CONV1b: operation block
TS1_i1a, TS1_i1b, TS1_i2a, TS1_i2b, TS2_i1a, TS2_i1b: data block
T0, T1, T2, T3, T4: time point
S210, S220, S230, S240, S250, S510, S520, S530, S710, S720, S730, S740, S750, S760, S770, S780, S790, S795, S1110, S1120, S1130, S1140, S1150, S1160, S1170, S1180, S1505, S1510, S1520, S1530, S1540, S1550, S1560, S1570, S1580, S1710, S1720, S1810, S1820, S2010, S2020, S2310, S2320, S2330, S2340, S2350, S2360: step
FIG. 1 shows an example of an AI network;
FIG. 2 is a flowchart of an embodiment of the computation scheduling method of the artificial intelligence model of the present invention;
FIG. 3 shows the result of splitting the tensors and operators of FIG. 1;
FIG. 4 shows a topological graph of the connections between sub-operators;
FIG. 5 is a detailed flow of an embodiment of step S230 of FIG. 2;
FIG. 6 shows the dependency relationships among multiple sub-operators;
FIG. 7 is a detailed flow of an embodiment of step S240 of FIG. 2;
FIG. 8A and FIG. 8B are schematic diagrams of an embodiment of the queue of the present invention;
FIG. 9 is a functional block diagram of an embodiment of the electronic device of the present invention;
FIG. 10 is a schematic diagram of an embodiment of the life-span list of the present invention;
FIG. 11 is a flowchart of allocating memory;
FIG. 12A and FIG. 12B are schematic diagrams of an embodiment of the active list of the present invention;
FIG. 13 is a schematic diagram of another AI model;
FIG. 14 is a schematic diagram of another AI model;
FIG. 15A and FIG. 15B form a flowchart of an embodiment of a method of executing the artificial intelligence model of the present invention;
FIG. 16 is a schematic diagram of an embodiment of the contents stored in the buffer circuit and the memory of the present invention;
FIG. 17 is a detailed flow of step S1510 or step S1520 of FIG. 15A;
FIG. 18 is a detailed flow of step S1530 of FIG. 15A;
FIG. 19 is a schematic diagram of another embodiment of the contents stored in the buffer circuit and the memory of the present invention;
FIG. 20 is a detailed flow of step S1560 of FIG. 15B;
FIG. 21 is a functional block diagram of an embodiment of the memory management circuit of the present invention;
FIG. 22 is a schematic diagram of the multistage pipeline of the present invention; and
FIG. 23 is a flowchart of an embodiment of the multistage-pipeline operation of the present invention.
900: electronic device
901: chip
902: external memory
910: processing circuit
912: computation circuit
914: buffer circuit
916: memory management circuit
918: memory
920: processor
Kp: kernel parameters
Din: input data
Dout: output data
Claims (16)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW112108662A TWI869788B (en) | 2023-03-09 | 2023-03-09 | Processing circuit and computation scheduling method of artificial intelligence model |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW202437087A TW202437087A (en) | 2024-09-16 |
| TWI869788B (en) | 2025-01-11 |
Family
ID=93609508
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW112108662A TWI869788B (en) | 2023-03-09 | 2023-03-09 | Processing circuit and computation scheduling method of artificial intelligence model |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI869788B (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110674936A (en) * | 2019-09-24 | 2020-01-10 | 上海寒武纪信息科技有限公司 | Neural network processing method and device, computer equipment and storage medium |
| EP3888012A1 (en) * | 2018-12-31 | 2021-10-06 | Microsoft Technology Licensing, LLC | Adjusting precision and topology parameters for neural network training based on a performance metric |
| US20220391665A1 (en) * | 2019-09-24 | 2022-12-08 | Anhui Cambricon Information Technology Co., Ltd. | Method for splitting neural network model by using multi-core processor, and related product |
| TW202301109A (en) * | 2021-06-17 | 2023-01-01 | 美商萬國商業機器公司 | Single function to perform combined matrix multiplication and bias add operations |
Also Published As
| Publication number | Publication date |
|---|---|
| TW202437087A (en) | 2024-09-16 |