
TWI774411B - Model compression method and model compression system - Google Patents


Info

Publication number
TWI774411B
TWI774411B
Authority
TW
Taiwan
Prior art keywords: model, output, output data, compressed, similarity
Prior art date
Application number
TW110120608A
Other languages
Chinese (zh)
Other versions
TW202248904A (en)
Inventor
郭王鼎志
Original Assignee
威盛電子股份有限公司 (VIA Technologies, Inc.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 威盛電子股份有限公司 (VIA Technologies, Inc.)
Priority to TW110120608A priority Critical patent/TWI774411B/en
Priority to CN202110882210.8A priority patent/CN113570045A/en
Application granted granted Critical
Publication of TWI774411B publication Critical patent/TWI774411B/en
Publication of TW202248904A publication Critical patent/TW202248904A/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415: Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate


Abstract

A model compression method includes: performing a model pruning operation on an original model with a deep neural network architecture to generate a compressed model, feeding the same test data into the original model and the compressed model, estimating the similarity between first output data obtained from processing the test data by the original model and second output data obtained from processing the test data by the compressed model, and determining how to further adjust the model pruning operation through reinforcement learning that uses the similarity as a reward.

Description

Model compression method and model compression system

The present invention relates to model compression, and in particular to a model compression method and model compression system that adjust a model pruning operation through a reinforcement learning mechanism using similarity as the reward.

One technique in model compression trains a smaller model (the student model) using an existing model (the teacher model). The teacher model usually has a large number of parameters and is difficult to deploy on existing equipment, so this approach is used to train a small model with similar capability and deploy it on mobile devices. In most such methods, however, the parameters of the student model must still be designed manually, so a way to automatically find a suitable student model (that is, the compressed model) is urgently needed.

Therefore, one objective of the present invention is to propose a model compression method and model compression system that perform a model pruning operation through a reinforcement learning mechanism using similarity as the reward.

In one embodiment of the present invention, a model compression method is disclosed. The model compression method includes: performing a model pruning operation on an original model having a deep neural network architecture to generate a compressed model; feeding the same test data into the original model and the compressed model respectively; calculating the similarity between first output data obtained by the original model processing the test data and second output data obtained by the compressed model processing the test data; and, using the similarity as a reward, determining through reinforcement learning how to further adjust the model pruning operation.

In another embodiment of the present invention, a model compression system is disclosed. The model compression system includes a storage device and a processor. The storage device stores a program code. The processor loads and executes the program code to perform the following operations: performing a model pruning operation on an original model having a deep neural network architecture to generate a compressed model; feeding the same test data into the original model and the compressed model respectively; calculating the similarity between first output data obtained by the original model processing the test data and second output data obtained by the compressed model processing the test data; and, using the similarity as a reward, determining through reinforcement learning how to further adjust the model pruning operation.

The model compression method of the present invention uses similarity as the basis for model pruning (model compression), so the user does not need to provide labeled data as test data, which reduces the cost and time of data labeling. In addition, the user does not need to provide test source code and can directly input a model for compression, which effectively promotes the application of model compression. Moreover, the generalized features that remain after compression are less prone to overfitting.

FIG. 1 is a schematic diagram of a model compression system according to an embodiment of the present invention. As shown in FIG. 1, the model compression system 100 includes a processor 102 and a storage device 104. The storage device 104 stores a program code Code_MC; for example, the storage device 104 may be a conventional hard disk, a solid-state drive, a memory, and so on, but the invention is not limited thereto. The processor 102 can load and execute the program code Code_MC to perform the steps of the model compression method shown in FIG. 2.

Please refer to FIG. 2 and FIG. 3 together. FIG. 2 is a flowchart of a model compression method according to an embodiment of the present invention. FIG. 3 is a schematic diagram of the operation of the model compression method shown in FIG. 2. Please note that, provided the same result can be obtained, the steps of the model compression method need not be executed exactly in the order shown in FIG. 2; moreover, the method may be modified to add other steps according to design and/or application requirements. In step 202, a model pruning operation is performed on an original model 302 having a deep neural network architecture to generate a compressed model 304. For example, the original model 302 may be a model trained on a convolutional neural network (CNN) architecture, and the goal of model pruning is to retain only the important weights and delete the weights with little influence. In other words, compared with the original model 302, the compressed model 304 has fewer parameters, which reduces computational cost and storage space, so the compressed model 304 can be deployed on products with limited computing power, such as mobile phones and edge devices. In addition, the model pruning operation of the present invention also aims to make the output of the compressed model 304 as close as possible to the output of the original model 302, as detailed below.
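To illustrate the kind of channel pruning described above, the sketch below keeps only the channels whose weights have the largest L1 norms. The L1-magnitude criterion and the function name are illustrative assumptions for this sketch; the patent itself leaves the pruning criterion to the reinforcement-learning agent.

```python
def prune_channels(channel_weights, keep):
    """Return the indices of the `keep` channels with the largest L1 norm.

    channel_weights: list of per-channel weight lists, an illustrative
    stand-in for a convolution layer's filters. Channels with small L1
    norm are treated as having little influence and are dropped.
    """
    norms = [sum(abs(w) for w in ch) for ch in channel_weights]
    ranked = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:keep])

# A toy layer with 4 channels; keep the 2 with the largest L1 norms.
layer = [[0.1, -0.2], [1.5, 2.0], [0.0, 0.05], [-3.0, 0.5]]
print(prune_channels(layer, 2))  # channels 1 and 3 survive
```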

In step 204, the same test data 308 is fed into the original model 302 and the compressed model 304 respectively for processing. In other words, based on the same test data 308, the output of the original model 302 and the output of the compressed model 304 can be used to evaluate whether the compressed model 304 is similar to the original model 302.

In step 206, the similarity between the output data D1 obtained by the original model 302 processing the test data 308 and the output data D2 obtained by the compressed model 304 processing the test data 308 is calculated. The similarity value therefore indicates whether the output features of the pruned model resemble those of the model before pruning.

In step 208, the similarity calculated in step 206 is used as a reward, and reinforcement learning is used to determine how to further adjust the model pruning operation. For example, the reinforcement-learning agent may use the deep deterministic policy gradient (DDPG) algorithm to decide which action to take, where the action selects the parts to be compressed and thereby adjusts the model pruning operation. In other embodiments, other algorithms, such as the Truncated Natural Policy Gradient (TNPG) algorithm or the Cross Entropy Method (CEM), may also be used to decide the action to take.

For example, suppose the original model 302 provided by the user consists of three convolution layers with channel sizes [32, 64, 128]. The initialized agent 306, based on the model information of the original model 302 (for example, the input size, the kernel size of each layer, the number of floating-point operations of each layer, and so on), proposes initial per-layer compression rates of [60%, 40%, 70%]. The model compression method of the present invention then compresses the original model 302 through reinforcement learning, so that the compressed model 304 has channel sizes [12, 38, 38], yielding a similarity of 0.3. This similarity of 0.3 is fed back to the agent 306, which decides the next compression direction (for example, adjusting which parts to compress) and adjusts the desired per-layer compression rates based on the similarity and the model information. The subsequent model compression operation compresses the original model 302 through the adjusted model pruning operation, so that the compressed model 304 has channel sizes [14, 32, 64], yielding a similarity of 0.4. The above model compression operation is executed iteratively to obtain, through reinforcement learning, a compressed model 304 with higher similarity.
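The per-layer compression rates in the example above map onto channel counts as follows; this is a minimal sketch assuming the kept channel count is floor(channels × (1 − rate)), an assumption chosen because it reproduces the [12, 38, 38] figures in the text.

```python
def apply_rates(channels, rates):
    """Channels kept per layer, assuming floor(c * (1 - rate))."""
    return [int(c * (1 - r)) for c, r in zip(channels, rates)]

channels = [32, 64, 128]            # the three convolution layers of the example
initial_rates = [0.60, 0.40, 0.70]  # initial compression rates from the agent
print(apply_rates(channels, initial_rates))  # [12, 38, 38]
```

After the 0.3 similarity is fed back, the agent would propose new rates, and the same mapping produces the next candidate architecture, such as [14, 32, 64].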

In one implementation of the present invention, the model compression method may implement model pruning with reference to the known AutoML for Model Compression (AMC) framework, but the invention is not limited thereto; those skilled in the art will know various other model compression methods, which are not repeated here. The known model compression framework uses the output of the compressed model as a reward so that reinforcement learning can determine how to further compress the original model. Specifically, the known framework uses accuracy as the reward for the reinforcement-learning agent. To compute accuracy, the user must provide labeled data as the test data fed into the compressed model, so that the accuracy of the compressed model's output can be determined from the information the labels provide. For the user, however, labeling data is time-consuming and labor-intensive. Furthermore, when computing accuracy, the maximum value of the compressed model's output is generally compared against the label, so the accuracy computation ignores every other value in the output. When the input data is hard to classify, the values in the compressed model's output will be very close to one another; using accuracy alone as the agent's reward may then make the model overconfident and lose part of its ability to discriminate features, leading to overfitting-like results or outputs that differ from the original model. Moreover, to know which accuracy algorithm to adopt, the known framework also requires the user to provide test source code.

Compared with the known framework, which uses accuracy as the agent's reward, the model compression method of the present invention uses similarity as the reward for the reinforcement-learning agent and adjusts model pruning through reinforcement learning (for example, the DDPG algorithm). The primary purpose of model pruning (model compression) is to make the compressed model 304 similar to the original model 302 provided by the user; therefore, the model compression method of the present invention compares the similarity between the output data D1 of the original model 302 and the output data D2 of the compressed model 304 as the basis for model pruning (model compression).

In one embodiment of the present invention, the similarity can be obtained by calculating the Pearson correlation coefficient between the output of the original model X and the output of the compressed model Y. For example:

Output matrix of X = [1.0, 2.0, 3.0]

Output matrix of Y = [2.0, 20.0, 38.0]

$$\rho(X, Y) = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}} = 1.0$$
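The worked example can be checked with a few lines of pure Python; this is a sketch of the standard Pearson formula, not the patent's implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Y is an exact linear function of X (Y = 18X - 16), so ρ is 1.0:
print(pearson([1.0, 2.0, 3.0], [2.0, 20.0, 38.0]))
```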

In another embodiment of the present invention, the similarity can be obtained by calculating the cosine similarity between the output of the original model X and the output of the compressed model Y. For example:

Output matrix of X = [1.0, 2.0, 3.0]

Output matrix of Y = [2.0, 20.0, 38.0]

$$\text{Cosine similarity}(X, Y) = \frac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^2}\,\sqrt{\sum_{i} y_i^2}} = 0.9698612260388879$$
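The cosine variant of the same check, again a pure-Python sketch of the standard formula:

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity of two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 20.0, 38.0]))  # ≈ 0.96986
```

Unlike the Pearson coefficient, cosine similarity does not subtract the means, so the same pair of outputs scores below 1.0 here even though they are perfectly linearly related.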

However, the above serves only as an example and is not a limitation of the present invention. In practice, the model compression method of the present invention may adopt other suitable similarity algorithms according to design and/or application requirements, and such design variations also fall within the scope of the present invention.

As mentioned above, the similarity is computed from the respective output data D1 and D2 of the original model 302 and the compressed model 304. In this embodiment, if the original model 302 is a model trained on a convolutional neural network architecture, the model pruning operation (in which the parts to be compressed are selected by the actions taken by the reinforcement-learning agent 306) is applied only to the convolution layers, so the output data D1 and D2 may be the output of any layer located after the convolution layers in the convolutional neural network architecture. FIG. 4 is a schematic diagram of the convolutional neural network architecture of the original model 302 and the compressed model 304 shown in FIG. 3. As shown in the figure, the convolutional neural network architecture 400 includes an input layer 402, convolution layers 404_1–404_N (N≧1), pooling layers 406_1–406_N (N≧1), fully-connected layers 408_1–408_M (M≧1), and an output layer 410. In one embodiment of the present invention, the output data D1 may be the output of a fully-connected layer (for example, 408_i, 1≦i≦M) of the original model 302, and the output data D2 may be the output of the same fully-connected layer (for example, 408_i, 1≦i≦M) of the compressed model 304. In another embodiment of the present invention, the output layer 410 is the last layer and executes a Softmax function so that the probability distribution output by all nodes of the fully-connected layer 408_M sums to 1; in this case, the output data D1 may be the Softmax function output of the last layer of the original model 302, and the output data D2 may be the Softmax function output of the last layer of the compressed model 304. Please note that the convolutional neural network architecture 400 shown in FIG. 4 serves only as an example and is not a limitation of the present invention; in practice, the model compression method of the present invention is also applicable to other neural network architectures, and such design variations also fall within the scope of the present invention.
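The Softmax output mentioned above can be sketched in a few lines. The max-subtraction step is a standard numerical-stability trick, an implementation detail not specified in the patent.

```python
import math

def softmax(logits):
    """Probability distribution over logits; the outputs sum to 1."""
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # three probabilities, ordered like the logits
print(sum(probs))  # 1.0 up to float rounding
```

Comparing D1 and D2 at this layer means comparing two such probability vectors, one from the original model and one from the compressed model, with the Pearson or cosine formulas above.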

In summary, the similarity is computed from the respective output data of the original model and the compressed model, so there is no need to compare the output data of the compressed model against labels on the test data. In other words, whereas the known framework uses accuracy as the agent's reward and therefore requires the user to provide labeled data as test data, the model compression method of the present invention uses similarity as the agent's reward and can use unlabeled test data (that is, the test data 308 is non-labeled data); since the test data 308 need not include labels, the cost and time of data labeling are reduced. In addition, because the model compression method of the present invention computes similarity, the user does not need to provide test source code and can directly input a model for compression, which effectively promotes the application of model compression. Furthermore, because the model compression method of the present invention uses similarity as the agent's reward, the generalized features remaining after compression are less prone to overfitting. The above are only preferred embodiments of the present invention, and all equivalent changes and modifications made according to the scope of the patent application of the present invention shall fall within the coverage of the present invention.

100: model compression system; 102: processor; 104: storage device; 202, 204, 206, 208: steps; 302: original model; 304: compressed model; 306: agent; 308: test data; 400: convolutional neural network architecture; 402: input layer; 404_1, 404_N: convolution layer; 406_1, 406_N: pooling layer; 408_1, 408_M: fully-connected layer; 410: output layer; Code_MC: program code; D1, D2: output data

FIG. 1 is a schematic diagram of a model compression system according to an embodiment of the present invention. FIG. 2 is a flowchart of a model compression method according to an embodiment of the present invention. FIG. 3 is a schematic diagram of the operation of the model compression method shown in FIG. 2. FIG. 4 is a schematic diagram of the convolutional neural network architecture of the original model and the compressed model shown in FIG. 3.

202, 204, 206, 208: steps

Claims (12)

1. A model compression method, comprising: performing a model pruning operation on an original model having a deep neural network architecture to generate a compressed model; feeding the same test data into the original model and the compressed model respectively; calculating a similarity between first output data obtained by the original model processing the test data and second output data obtained by the compressed model processing the test data; and, using the similarity as a reward, determining through reinforcement learning how to further adjust the model pruning operation.

2. The model compression method of claim 1, wherein the test data is unlabeled data.

3. The model compression method of claim 1, wherein the similarity is obtained by calculating a Pearson correlation coefficient between the first output data and the second output data.

4. The model compression method of claim 1, wherein the similarity is obtained by calculating a cosine similarity between the first output data and the second output data.

5. The model compression method of claim 1, wherein the first output data is the output of a fully-connected layer of the original model, and the second output data is the output of a fully-connected layer of the compressed model.

6. The model compression method of claim 1, wherein the first output data is the Softmax function output of the last layer of the original model, and the second output data is the Softmax function output of the last layer of the compressed model.

7. A model compression system, comprising: a storage device for storing a program code; and a processor for loading and executing the program code to perform the following operations: performing a model pruning operation on an original model having a deep neural network architecture to generate a compressed model; feeding the same test data into the original model and the compressed model respectively; calculating a similarity between first output data obtained by the original model processing the test data and second output data obtained by the compressed model processing the test data; and, using the similarity as a reward, determining through reinforcement learning how to further adjust the model pruning operation.

8. The model compression system of claim 7, wherein the test data is unlabeled data.

9. The model compression system of claim 7, wherein the similarity is obtained by calculating a Pearson correlation coefficient between the first output data and the second output data.

10. The model compression system of claim 7, wherein the similarity is obtained by calculating a cosine similarity between the first output data and the second output data.

11. The model compression system of claim 7, wherein the first output data is the output of a fully-connected layer of the original model, and the second output data is the output of a fully-connected layer of the compressed model.

12. The model compression system of claim 7, wherein the first output data is the Softmax function output of the last layer of the original model, and the second output data is the Softmax function output of the last layer of the compressed model.
TW110120608A 2021-06-07 2021-06-07 Model compression method and model compression system TWI774411B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
TW110120608A TWI774411B (en) 2021-06-07 2021-06-07 Model compression method and model compression system
CN202110882210.8A CN113570045A (en) 2021-06-07 2021-08-02 Model compression method and model compression system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW110120608A TWI774411B (en) 2021-06-07 2021-06-07 Model compression method and model compression system

Publications (2)

Publication Number Publication Date
TWI774411B true TWI774411B (en) 2022-08-11
TW202248904A TW202248904A (en) 2022-12-16

Family

ID=78169982

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110120608A TWI774411B (en) 2021-06-07 2021-06-07 Model compression method and model compression system

Country Status (2)

Country Link
CN (1) CN113570045A (en)
TW (1) TWI774411B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115880527A (en) * 2022-11-30 2023-03-31 北京三快在线科技有限公司 Model compression method and device, storage medium and electronic equipment
CN116775800A (en) * 2023-06-26 2023-09-19 中国银行股份有限公司 Method, device, equipment and medium for constructing information extraction model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109348707A (en) * 2016-04-27 2019-02-15 纽拉拉股份有限公司 Method and apparatus for pruning empirical memory for deep neural network-based Q-learning
TW202004569A (en) * 2018-06-03 2020-01-16 耐能智慧股份有限公司 Method for batch normalization layer pruning in deep neural networks
CN111340227A (en) * 2020-05-15 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for compressing business prediction model through reinforcement learning model
US20200272905A1 (en) * 2019-02-26 2020-08-27 GE Precision Healthcare LLC Artificial neural network compression via iterative hybrid reinforcement learning approach


Also Published As

Publication number Publication date
CN113570045A (en) 2021-10-29
TW202248904A (en) 2022-12-16

Similar Documents

Publication Publication Date Title
CN110348562B (en) Quantitative strategy determination method of neural network, image recognition method and device
CN111488986B (en) A model compression method, image processing method and device
CN109002889B (en) Adaptive iterative convolution neural network model compression method
CN109859106B (en) A Self-Attention-Based High-Order Fusion Network for Image Super-Resolution Reconstruction
WO2021135715A1 (en) Image compression method and apparatus
CN113011588B (en) Pruning method, device, equipment and medium of convolutional neural network
US20170061279A1 (en) Updating an artificial neural network using flexible fixed point representation
CN110598839A (en) Convolutional neural network system and method for quantizing convolutional neural network
CN108805259A (en) neural network model training method, device, storage medium and terminal device
CN109934300B (en) Model compression method, device, computer equipment and storage medium
TWI774411B (en) Model compression method and model compression system
CN110363297A (en) Neural metwork training and image processing method, device, equipment and medium
CN110276451A (en) One kind being based on the normalized deep neural network compression method of weight
CN107395211B (en) A data processing method and device based on convolutional neural network model
CN112687266B (en) Speech recognition method, device, computer equipment and storage medium
CN112446461A (en) Neural network model training method and device
CN112906889A (en) Method and system for compressing deep neural network model
KR20210143093A (en) Electronic apparatus and control method thereof
CN112766496A (en) Deep learning model security guarantee compression method and device based on reinforcement learning
CN116188878A (en) Image classification method, device and storage medium based on fine-tuning of neural network structure
CN114861671B (en) Model training method, device, computer equipment and storage medium
CN114677548A (en) A neural network image classification system and method based on resistive memory
CN117151178B (en) A CNN custom network quantization acceleration method for FPGA
CN114330690A (en) Convolutional neural network compression method, device and electronic device
CN112613604A (en) Neural network quantification method and device