
HK40048702B - Method and computer system for compressing neural network, and storage medium

Method and computer system for compressing neural network, and storage medium

Info

Publication number
HK40048702B
Authority
HK
Hong Kong
Prior art keywords
weight coefficients
neural network
uniform
weight
computer
Prior art date
Application number
HK42021038400.4A
Other languages
Chinese (zh)
Other versions
HK40048702A (en)
Inventor
蒋薇
王炜
刘杉
Original Assignee
腾讯美国有限责任公司
Priority date
Filing date
Publication date
Application filed by 腾讯美国有限责任公司
Publication of HK40048702A
Publication of HK40048702B


Description

Method, computer system, and storage medium for compressing a neural network model

Cross-Reference to Related Applications

This application claims priority to U.S. Provisional Patent Application No. 62/964,996, filed January 23, 2020, with the U.S. Patent and Trademark Office, and U.S. Patent Application No. 17/088,061, filed November 3, 2020, with the U.S. Patent and Trademark Office, the entire contents of which are incorporated herein by reference.

Technical Field

The present disclosure relates generally to the field of data processing, and more particularly to neural networks.

Background

The International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) Moving Picture Experts Group (MPEG) (JTC 1/SC 29/WG 11) has been actively searching for potential needs for standardization of future video codec technology for visual analysis and understanding. In 2015, ISO adopted the Compact Descriptors for Visual Search (CDVS) standard as a still-image standard, which extracts feature representations for image similarity matching. The CDVS standard, listed as Part 15 of MPEG 7 and ISO/IEC 15938-15 and finalized in 2018, extracts global and local, hand-designed and deep neural network (DNN)-based feature descriptors for video segments. The success of DNNs in a large range of video applications, such as semantic classification, target detection/recognition, target tracking, and video quality enhancement, poses a strong need for compressing DNN models.

Summary of the Invention

Embodiments of the present disclosure relate to a method, a system, and a computer-readable storage medium for compressing a neural network model, which can compress the neural network model and improve the computational efficiency of the neural network model.

According to one aspect, a method for compressing a neural network model is provided. The method may include reordering at least one index corresponding to a multi-dimensional tensor associated with a neural network; determining a set of weight coefficients associated with the at least one reordered index; and compressing a model of the neural network based on the determined set of weight coefficients.

According to another aspect, a computer system for compressing a neural network model is provided. The computer system may include a reordering module configured to reorder at least one index corresponding to a multi-dimensional tensor associated with a neural network. A set of weight coefficients associated with the at least one reordered index is determined, and the model of the neural network is compressed based on the determined set of weight coefficients.

According to yet another aspect, a non-volatile computer-readable medium for compressing a neural network model is provided. The non-volatile computer-readable medium stores program instructions on at least one tangible storage device, and the program instructions are executable by a processor. The program instructions, when executed by the processor, cause the processor to perform a method that may accordingly include reordering at least one index corresponding to a multi-dimensional tensor associated with a neural network; determining a set of weight coefficients associated with the at least one reordered index; and compressing a model of the neural network based on the determined set of weight coefficients.

With the method, system, and computer-readable storage medium for compressing a neural network model provided by the embodiments of the present disclosure, the efficiency of compressing the learned weight coefficients can be improved, and computation using the optimized weight coefficients can be accelerated accordingly, which enables significant compression of the neural network model and improves the computational efficiency of the neural network model.

Brief Description of the Drawings

These and other objects, features, and advantages will become apparent from the following detailed description of illustrative embodiments, which is to be read in conjunction with the accompanying drawings. The various features of the drawings are not drawn to scale, as the illustrations are intended for clarity in facilitating understanding by those skilled in the art in conjunction with the detailed description. In the drawings:

Figure 1 illustrates a networked computer environment according to at least one embodiment;

Figure 2 is a block diagram of a neural network model compression system according to at least one embodiment;

Figure 3 is an operational flowchart of the steps carried out by a program for compressing a neural network model according to at least one embodiment;

Figure 4 is a block diagram of internal and external components of the computers and servers depicted in Figure 1 according to at least one embodiment;

Figure 5 is a block diagram of an illustrative cloud computing environment including the computer system depicted in Figure 1 according to at least one embodiment; and

Figure 6 is a block diagram of functional layers of the illustrative cloud computing environment of Figure 5 according to at least one embodiment.

Detailed Description

Detailed embodiments of the claimed structures and methods are disclosed herein; however, it is to be understood that the disclosed embodiments are merely illustrative of the claimed structures and methods, which may be embodied in various forms. These structures and methods may, however, be embodied in many different forms and should not be construed as limited to the exemplary embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope to those skilled in the art. In the description, details of well-known features and techniques may be omitted to avoid unnecessarily obscuring the presented embodiments.

Embodiments of the present disclosure relate generally to the field of data processing, and more particularly to neural networks. The exemplary embodiments described below provide a system, a method, and a computer program for compressing a neural network model. Therefore, some embodiments have the capacity to improve the field of computing by allowing improved compression efficiency of the learned weight coefficients, which can significantly reduce the size of a deep neural network model.

As described above, ISO/IEC MPEG (JTC 1/SC 29/WG 11) has been actively searching for potential needs for standardization of future video codec technology for visual analysis and understanding. In 2015, ISO adopted the CDVS standard as a still-image standard, which extracts feature representations for image similarity matching. The CDVS standard, listed as Part 15 of MPEG 7 and ISO/IEC 15938-15 and finalized in 2018, extracts global and local, hand-designed and DNN-based feature descriptors for video segments. The success of DNNs in a large range of video applications, such as semantic classification, target detection/recognition, target tracking, and video quality enhancement, poses a strong need for compressing DNN models.

Therefore, MPEG is actively working on the Coded Representation of Neural Network standard (NNR), which encodes DNN models to save both storage and computation. There are several methods for learning a compact DNN model. The goal is to remove unimportant weight coefficients, under the assumption that the smaller the values of the weight coefficients are, the less important they are. Several network pruning methods have been proposed to pursue this goal explicitly, by adding sparsity-promoting regularization terms to the network training objective or by greedily removing network parameters. From the perspective of compressing DNN models, after a compact network model is learned, the weight coefficients can be further compressed by quantization followed by entropy coding. Such a further compression process can significantly reduce the storage size of a DNN model, which is essential for deploying the model on mobile devices, chips, and the like.
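As a rough illustration of the quantization and entropy-coding stage mentioned above, the sketch below (written in Python/NumPy purely for illustration, not as part of the disclosure) uniformly quantizes a weight array with a fixed step size and estimates the coded size from the entropy of the quantized symbols; an actual NNR codec uses considerably more elaborate tools.

    import numpy as np

    def quantize_and_estimate_bits(weights: np.ndarray, step: float = 0.02):
        """Uniformly quantize weights and estimate the coded size via symbol entropy.

        A simplified stand-in for the quantization + entropy-coding stage described
        above; the step size and dtype choices are illustrative assumptions.
        """
        q = np.round(weights / step).astype(np.int32)            # quantized symbols
        _, counts = np.unique(q, return_counts=True)
        p = counts / counts.sum()
        entropy_bits = float(-(p * np.log2(p)).sum() * q.size)   # ideal entropy-coded size
        dequantized = q.astype(np.float32) * step                # reconstruction used at inference
        return q, dequantized, entropy_bits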

Weight unification regularization can improve the compression efficiency in the later compression process. An iterative network retraining/refinement framework is used to jointly optimize the original training objective and a weight unification loss that includes a compression rate loss, a unification distortion loss, and a computation speed loss, so that the learned network weight coefficients preserve the performance of the original target, are suitable for further compression, and can accelerate computation using the learned weight coefficients. The proposed method can be applied to compress an original pre-trained DNN model. It can also be used as an additional processing module to further compress any pruned DNN model.

Unification regularization can improve the efficiency of further compressing the learned weight coefficients and thereby accelerate computation using the optimized weight coefficients. This can significantly reduce the DNN model size and speed up inference computation. Through the iterative retraining process, the performance of the original training target can be maintained, which allows both compression and computation efficiency. The iterative retraining process also gives the flexibility of introducing different losses at different times, so that the system focuses on different targets during the optimization process. The method, computer system, and computer program disclosed herein can generally be applied to datasets with different data forms. The input/output data are generally 4D tensors, which can be real video segments, images, or extracted feature maps.

Aspects are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer-readable media according to the various embodiments. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

Referring now to Figure 1, a functional block diagram of a networked computer environment illustrates a neural network model compression system 100 (hereinafter referred to as the "system") for compressing a neural network model. It should be understood that Figure 1 provides only an illustration of one implementation and does not imply any limitations with regard to the environments in which different embodiments may be implemented. Many modifications to the depicted environment may be made based on design and implementation requirements.

The system 100 may include a computer 102 and a server computer 114. The computer 102 may communicate with the server computer 114 via a communication network 110 (hereinafter referred to as the "network"). The computer 102 includes a processor 104 and a software program 108 stored on a data storage device 106, and is capable of interfacing with a user and communicating with the server computer 114. As will be discussed below with reference to Figure 4, the computer 102 may include internal components 800A and external components 900A, and the server computer 114 may include internal components 800B and external components 900B, respectively. The computer 102 may be, for example, a mobile device, a telephone, a personal digital assistant, a netbook, a laptop computer, a tablet computer, a desktop computer, or any type of computing device capable of running a program, accessing a network, and accessing a database.

As discussed below in conjunction with Figures 5 and 6, the server computer 114 may also operate in a cloud computing service model, such as Software as a Service (SaaS), Platform as a Service (PaaS), or Infrastructure as a Service (IaaS). The server computer 114 may also be located in a cloud computing deployment model, such as a private cloud, a community cloud, a public cloud, or a hybrid cloud.

The server computer 114 used for compressing a neural network model is capable of running a Neural Network Model Compression Program 116 (hereinafter referred to as the "program") that interacts with a database 112. The neural network model compression program method is explained in more detail below in conjunction with Figure 3. In one embodiment, the computer 102 may operate as an input device including a user interface, while the program 116 runs primarily on the server computer 114. In an alternative embodiment, the program 116 may run primarily on at least one computer 102, while the server computer 114 is used for processing and storing data used by the program 116. It should be noted that the program 116 may be a standalone program or may be integrated into a larger neural network model compression program.

It should be noted, however, that in some instances processing of the program 116 may be shared between the computer 102 and the server computer 114 in any ratio. In another embodiment, the program 116 may operate on more than one computer, server computer, or some combination of computers and server computers, for example, a plurality of computers 102 communicating with a single server computer 114 through the network 110. In another embodiment, for example, the program 116 may operate on a plurality of server computers 114 that communicate with a plurality of client computers through the network 110. Alternatively, the program may run on a network server that communicates with a server and a plurality of client computers through a network.

The network 110 may include wired connections, wireless connections, fiber optic connections, or some combination thereof. In general, the network 110 may be any combination of connections and protocols that supports communication between the computer 102 and the server computer 114. The network 110 may include various types of networks, such as, for example, a Local Area Network (LAN), a Wide Area Network (WAN) such as the Internet, a telecommunications network such as the Public Switched Telephone Network (PSTN), a wireless network, a public switched network, a satellite network, a cellular network (e.g., a fifth generation (5G) network, a Long-Term Evolution (LTE) network, a third generation (3G) network, a Code Division Multiple Access (CDMA) network, etc.), a Public Land Mobile Network (PLMN), a Metropolitan Area Network (MAN), a private network, an ad hoc network, an intranet, a fiber-optic-based network, or the like, and/or a combination of these or other types of networks.

The number and arrangement of devices and networks shown in Figure 1 are provided as an example. In practice, there may be additional devices and/or networks, fewer devices and/or networks, different devices and/or networks, or devices and/or networks arranged differently from those shown in Figure 1. Furthermore, two or more devices shown in Figure 1 may be implemented within a single device, or a single device shown in Figure 1 may be implemented as multiple, distributed devices. Additionally or alternatively, a set of devices (e.g., at least one device) of the system 100 may perform at least one function described as being performed by another set of devices of the system 100.

Referring now to Figure 2, a neural network model compression system 200 is described. The neural network model compression system 200 can be used as a framework of an iterative learning process. The neural network model compression system 200 may include a unification index order and method selection module 202, a weight unification module 204, a network forward computation module 206, a compute target loss module 208, a compute gradient module 210, and a back-propagation and weight update module 212.

Let a dataset be given in which a target y is assigned to each input x. Let Θ = {w} denote the set of weight coefficients of a DNN. The goal of neural network training is to learn an optimal set of weight coefficients Θ* so that a target loss is minimized. For example, in previous network pruning methods, the target loss has two parts, an empirical data loss LD(Θ) and a sparsity-promoting regularization loss LR(Θ):

L(Θ) = LD(Θ) + λR·LR(Θ),

where λR ≥ 0 is a hyperparameter balancing the contributions of the data loss and the regularization loss.

The sparsity-promoting regularization loss places regularization over the entire set of weight coefficients, and the resulting sparse weights have only a weak relationship with inference efficiency or computation acceleration. From another perspective, after pruning, the sparse weights can further go through another network training process in which an optimal set of weight coefficients can be learned that improves the efficiency of further model compression.

The present disclosure proposes a weight unification loss LU(Θ) that is optimized jointly with the original target loss:

L(Θ) = LD(Θ) + λR·LR(Θ) + λU·LU(Θ),

where λU ≥ 0 is a hyperparameter balancing the contributions of the original training target and the weight unification. Through joint optimization, an optimal set of weight coefficients can be obtained that greatly helps the efficiency of further compression. The weight unification loss takes into account how the convolution operation is performed as the underlying GEMM matrix multiplication process, so that the optimized weight coefficients can greatly speed up computation. Notably, the weight unification loss can be treated as an additional regularization term of the general target loss, with (when λR > 0) or without (when λR = 0) the general regularization. Moreover, the method can be flexibly applied to any regularization loss LR(Θ).
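A minimal sketch of this joint objective, written in PyTorch-style Python for illustration only, is given below. The callable weight_unification_loss is a hypothetical placeholder for the LU(Θ) term defined in the following paragraphs, the L1 penalty is only one possible choice of LR(Θ), and the loss weights are illustrative.

    import torch

    def joint_loss(model, criterion, x, y, weight_unification_loss,
                   lambda_r=0.0, lambda_u=0.1):
        """L(Theta) = LD(Theta) + lambda_R * LR(Theta) + lambda_U * LU(Theta).

        lambda_r = 0 corresponds to training without the general regularization term;
        weight_unification_loss is a user-supplied callable returning LU(Theta).
        """
        data_loss = criterion(model(x), y)                          # empirical data loss LD
        reg_loss = sum(w.abs().sum() for w in model.parameters())   # one possible LR (L1), assumed
        unif_loss = weight_unification_loss(model)                  # LU, described below
        return data_loss + lambda_r * reg_loss + lambda_u * unif_loss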

In at least one embodiment, the weight unification loss LU(Θ) further includes a compression rate loss LC(Θ), a unification distortion loss LI(Θ), and a computation speed loss LS(Θ):

LU(Θ) = LI(Θ) + λC·LC(Θ) + λS·LS(Θ),

Detailed descriptions of these loss terms are given in later sections. For both learning effectiveness and learning efficiency, an iterative optimization process is further proposed. The part of the weight coefficients that satisfies the desired structure can be fixed, and the non-fixed part of the weight coefficients can be updated by back-propagating the training loss. By iteratively performing these two steps, more and more weights can be fixed gradually, and the joint loss can be gradually and effectively optimized.

Moreover, in at least one embodiment, each layer is compressed individually, and the weight unification loss can further be written as:

LU(Θ) = LU(W1) + … + LU(WN),

where LU(Wj) is the unification loss defined on the j-th layer, N is the total number of layers over which the quantization loss is measured, and Wj denotes the weight coefficients of the j-th layer. Since LU(Wj) is computed independently for each layer, the subscript j is omitted in the remainder of this disclosure without loss of generality.

For each network layer, its weight coefficients W form a general 5-dimensional (5D) tensor of size (ci, k1, k2, k3, co). The input of the layer is a 4-dimensional (4D) tensor A of size (hi, wi, di, ci), and the output of the layer is a 4D tensor B of size (ho, wo, do, co). The sizes ci, k1, k2, k3, co, hi, wi, di, ho, wo, do are integers greater than or equal to 1. When any of the sizes ci, k1, k2, k3, co, hi, wi, di, ho, wo, do takes the value 1, the corresponding tensor reduces to a lower dimension. Each item in each tensor is a floating-point number. Let M denote a 5D binary mask of the same size as W, in which each item is a binary value 0/1 indicating whether the corresponding weight coefficient is pruned or kept. M is introduced in association with W to cope with the case in which W comes from a pruned DNN model, where some connections between neurons in the network are removed from computation. When W comes from an original, unpruned pre-trained model, all items in M take the value 1. The output B is computed through the convolution operation ⊙ based on A, M, and W.
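For illustration, the following sketch computes the layer output from the masked weights using PyTorch; the channel-first input layout, the use of F.conv3d, and the axis permutation are assumptions made for this example and are not mandated by the disclosure.

    import torch
    import torch.nn.functional as F

    def masked_conv_layer(A, W, M):
        """Compute the layer output B from input A using the masked weights.

        A: input tensor of shape (batch, ci, hi, wi, di) (channel-first layout, assumed)
        W: weight tensor of shape (ci, k1, k2, k3, co) as in the text
        M: binary mask of the same shape as W (1 = keep, 0 = pruned)
        """
        W_masked = W * M                                   # pruned coefficients contribute nothing
        # F.conv3d expects weights of shape (co, ci, k1, k2, k3), so move the last axis first
        W_torch = W_masked.permute(4, 0, 1, 2, 3).contiguous()
        return F.conv3d(A, W_torch)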

The parameters hi, wi, and di (ho, wo, and do) are the height, width, and depth of the input tensor A (output tensor B). The parameter ci (co) is the number of input (output) channels. The parameters k1, k2, and k3 are the sizes of the convolution kernel along the height, width, and depth axes, respectively. That is, for each output channel v = 1, …, co, the operation can be seen as convolving a 4D weight tensor Wv of size (ci, k1, k2, k3) with the input A.

The order of the summation operation can be changed. The 5D weight tensor can be reshaped into a 3D tensor of size (ci, co, k), where k = k1·k2·k3. The order of the reshaped indices along the k axis is determined by a reshaping algorithm in the reshaping process, which is described in detail later.
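A simple NumPy sketch of this reshaping is shown below; the helper name and the way the reordered indices are supplied (as a permutation k_order) are assumptions for illustration.

    import numpy as np

    def reshape_5d_to_3d(W, k_order):
        """Reshape a (ci, k1, k2, k3, co) weight tensor into a (ci, co, k) tensor.

        k_order is a permutation of range(k1*k2*k3) giving the reordered indices
        I(W) along the k axis; how it is chosen is described later in the text.
        """
        ci, k1, k2, k3, co = W.shape
        k = k1 * k2 * k3
        W3d = W.reshape(ci, k, co)             # flatten the three kernel axes into one k axis
        W3d = np.transpose(W3d, (0, 2, 1))     # -> (ci, co, k)
        return W3d[:, :, list(k_order)]        # apply the reordered index order along k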

The desired structure of the weight coefficients can be designed by considering two aspects. First, the structure of the weight coefficients is aligned with how the underlying GEMM matrix multiplication process of the convolution operation is implemented, so that inference computation using the learned weight coefficients is accelerated. Second, the structure of the weight coefficients helps improve the quantization and entropy coding efficiency of further compression. In at least one embodiment, a block-wise structure of the weight coefficients in each layer can be used in the 3D reshaped weight tensor. Specifically, the 3D tensor can be partitioned into blocks of size (gi, go, gk), and all coefficients within a block can be unified. The unified weights in a block are set to follow a pre-defined unification rule, for example, setting all values to be the same, so that one value can be used to represent the whole block, yielding a highly efficient quantization process. There can be multiple rules for unifying the weights, each associated with a unification distortion loss that measures the error introduced by applying that rule. For example, instead of setting the weights to be the same, the weights can be set to have the same absolute value while keeping their original signs. Given this designed structure, during each iteration, the part of the weight coefficients to be fixed can be determined by considering the unification distortion loss, the estimated compression rate loss, and the estimated speed loss. A neural network training process is then performed to update the remaining non-fixed weight coefficients through the back-propagation mechanism.

The overall framework of the iterative retraining/fine-tuning process alternates between two steps iteratively so as to gradually optimize the joint loss. Given a pre-trained DNN model with weight coefficients W and mask M, which can be a pruned sparse model or an unpruned non-sparse model, in the first step the order of indices I(W) = [i0, …, ik] for reshaping the weight coefficients W (and the corresponding mask M), where k = k1·k2·k3 for the reshaped 3D tensor of the weights W, can be determined by the unification index order and method selection module 202. Specifically, the reshaped 3D tensor of the weights W can be partitioned into super-blocks of size (gi, go, gk). Let S denote a super-block. I(W) is determined for each super-block S individually, based on the weight unification loss LU of the weight coefficients within the super-block S. The size of the super-block is usually chosen according to the later compression method. For example, in at least one embodiment, a super-block of size (64, 64, 2) can be chosen to be consistent with the 3-Dimension Coding Tree Unit (CTU3D) used by the later compression process.

Each super-block S is further partitioned into blocks of size (di, do, dk). Weight unification happens within blocks. For each super-block S, a weight unifier is used to unify the weight coefficients within the blocks of S. Let b denote a block in S. There can be different ways to unify the weight coefficients in b. For example, the weight unifier can set all the weights in b to be the same, e.g., to the mean of all the weights in b. In this case, the LN norm of the weight coefficients in b (e.g., the L2 norm, which corresponds to the variance of the weights in b) reflects the unification distortion loss LI(b) of using the mean to represent the whole block. As another example, the weight unifier can set all the weights to have the same absolute value while keeping their original signs. In this case, the LN norm of the absolute values of the weights in b can be used to measure LI(b). In other words, given a weight unification method u, the weight unifier unifies the weights in b using the method u with an associated unification distortion loss LI(u, b). The unification distortion loss LI(u, S) of the whole super-block S can be determined by averaging LI(u, b) over all blocks in S, i.e., LI(u, S) = average_b(LI(u, b)).
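The following NumPy sketch illustrates the two example unification rules and the associated distortion measure; the function names are hypothetical, and the L2 norm of the deviation from the unified values is used here as one possible instance of the LN-norm-based distortion described above.

    import numpy as np

    def unify_block(block, method):
        """Unify one block of weight coefficients; return (unified_block, distortion).

        "mean" sets every coefficient to the block mean; "abs_mean" gives every
        coefficient the same absolute value while keeping its original sign.
        The L2 norm of the deviation is used here as the distortion LI(u, b).
        """
        if method == "mean":
            unified = np.full_like(block, block.mean())
        elif method == "abs_mean":
            unified = np.sign(block) * np.abs(block).mean()
        else:
            raise ValueError("unknown unification method: %s" % method)
        distortion = float(np.linalg.norm(block - unified))     # LI(u, b)
        return unified, distortion

    def superblock_distortion(blocks, method):
        """LI(u, S): average the per-block distortion over all blocks in a super-block."""
        return float(np.mean([unify_block(b, method)[1] for b in blocks]))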

Similarly, the compression rate loss LC(u, S) reflects the compression efficiency of unifying the weights in the super-block S using method u. For example, when all the weights are set to be the same, only one number is used to represent the whole block, and the compression rate is rcompression = gi·go·gk. LC(u, S) can be defined as 1/rcompression.

The speed loss LS(u, S) reflects the estimated computation speed of using the unified weight coefficients in S obtained with method u; LS(u, S) is a function of the number of multiplication operations in the computation using the unified weight coefficients.

By now, for each possible way of reordering the indices to generate the 3D tensor of the weights W, and for each possible method u of unifying the weights by the weight unifier, the weight unification loss LU(u, S) can be computed based on LI(u, S), LC(u, S), and LS(u, S). The optimal weight unification method u* and the optimal reordered indices I*(W) can be selected as the combination that gives the smallest weight unification loss LU(u*, S). When k is small, the optimal I*(W) and u* can be searched exhaustively. For large k, other methods can be used to find a sub-optimal I*(W) and u*. The present disclosure does not place any restriction on the specific way of determining I*(W) and u*.
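For small k, the exhaustive search described above can be sketched as follows; this example reuses unify_block from the previous sketch, and the compression and speed loss terms are simple placeholders standing in for LC(u, S) and LS(u, S).

    import itertools
    import numpy as np

    def select_unification(W3d, methods, lambda_c=1.0, lambda_s=1.0, block_shape=(4, 4, 1)):
        """Brute-force search of the k-axis order and unification method for one super-block.

        W3d: a super-block of reshaped weights with shape (gi, go, gk); only a small
        gk is practical here. The compression loss is taken as 1 / (block volume) and
        the speed loss uses the same simple proxy; both are placeholders for LC and LS.
        Returns (best_order, best_method, best_loss).
        """
        gi, go, gk = W3d.shape
        di, do, dk = block_shape
        best = (None, None, np.inf)
        for order in itertools.permutations(range(gk)):
            reordered = W3d[:, :, list(order)]
            for method in methods:
                distortions = []
                for i in range(0, gi, di):
                    for o in range(0, go, do):
                        for k in range(0, gk, dk):
                            block = reordered[i:i + di, o:o + do, k:k + dk]
                            distortions.append(unify_block(block, method)[1])
                l_i = float(np.mean(distortions))              # LI(u, S)
                l_c = 1.0 / (di * do * dk)                     # placeholder LC(u, S)
                l_s = 1.0 / (di * do * dk)                     # placeholder LS(u, S)
                loss = l_i + lambda_c * l_c + lambda_s * l_s   # LU(u, S)
                if loss < best[2]:
                    best = (order, method, loss)
        return best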

Once the order of indices I*(W) and the weight unification method u* are determined for each super-block S, the goal turns to finding an updated optimal set of weight coefficients W* and a corresponding weight mask M* by iteratively minimizing the joint loss. Specifically, for the t-th iteration, the current weight coefficients W(t-1) and mask M(t-1) can be used. In addition, a weight unification mask Q(t-1) can be maintained throughout the training process. The weight unification mask Q(t-1) has the same shape as W(t-1) and records whether the corresponding weight coefficient has been unified. Then, unified weight coefficients WU(t-1) and a new unification mask Q(t-1) are computed by the weight unification module 204. In the weight unification module 204, the weight coefficients in S can be reordered according to the determined order of indices I*(W), and the super-blocks can be sorted in ascending order based on their unification losses LU(u*, S). Given a hyperparameter q, the top q super-blocks are selected to be unified. The weight unifier then unifies the blocks in each selected super-block S using the corresponding determined method u*, resulting in unified weights WU(t-1) and a weight mask MU(t-1). The corresponding entries in the unification mask Q(t-1) are marked as unified. In at least one embodiment, MU(t-1) differs from M(t-1) in that, for blocks containing both pruned and unpruned weight coefficients, the weight unifier sets the originally pruned weight coefficients to non-zero values again and changes the corresponding entries in MU(t-1). For other types of blocks, MU(t-1) naturally remains unchanged.
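A simplified sketch of one pass of this weight unification step is shown below; for brevity each selected super-block is unified as a whole using unify_block from the earlier sketch rather than block by block, the per-super-block losses and methods are assumed to be precomputed, and the pruning mask M is not handled.

    import numpy as np

    def unification_step(superblocks, losses, methods, q):
        """Unify the q super-blocks with the smallest unification loss LU(u*, S).

        superblocks: list of (gi, go, gk) weight arrays, already reordered by I*(W)
        losses:      LU(u*, S) for each super-block
        methods:     the chosen method u* for each super-block
        Returns the updated super-blocks and a unification mask Q (1 = unified/fixed).
        """
        selected = set(np.argsort(losses)[:q].tolist())    # ascending sort, keep the top q
        unified_blocks, q_masks = [], []
        for idx, (sb, u) in enumerate(zip(superblocks, methods)):
            if idx in selected:
                new_sb, _ = unify_block(sb, u)             # unify the whole super-block for brevity
                unified_blocks.append(new_sb)
                q_masks.append(np.ones_like(sb))           # these coefficients are now fixed
            else:
                unified_blocks.append(sb.copy())
                q_masks.append(np.zeros_like(sb))          # still trainable
        return unified_blocks, q_masks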

The weight coefficients marked as unified in Q(t-1) can be fixed, and the remaining non-fixed weight coefficients of W(t-1) can be updated through the neural network training process, resulting in updated W(t) and M(t).

The training dataset can be the same as the original dataset based on which the pre-trained weight coefficients W were obtained, or a different dataset with the same data distribution as the original dataset. In the second step, each input x is passed through the current network by the network forward computation module 206 using the current unified weight coefficients WU(t-1) and mask MU(t-1), generating an estimated output. Based on the ground-truth annotation y and the estimated output, the compute target loss module 208 computes the target training loss. Then the gradient G(WU(t-1)) of the target loss can be computed by the compute gradient module 210. The automatic gradient computation methods used by deep learning frameworks such as TensorFlow or PyTorch can be used to compute G(WU(t-1)). Based on the gradient G(WU(t-1)) and the unification mask Q(t-1), the non-fixed weight coefficients of WU(t-1) and the corresponding mask MU(t-1) can be updated through back-propagation by the back-propagation and weight update module 212. The retraining process can be an iterative process. Typically, multiple iterations are performed to update the non-fixed part of WU(t-1) and the corresponding M(t-1), for example, until the target loss converges. The system then enters the next iteration t, in which, given a new hyperparameter q(t), new unified weight coefficients WU(t), a mask MU(t), and a corresponding unification mask Q(t) are computed through the weight unification process based on WU(t-1), u*, and I*(W).
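One retraining step of this kind can be sketched in PyTorch as follows; the unification_masks dictionary keyed by parameter name is an assumed representation of Q(t-1), and the unified coefficients are kept fixed by zeroing their gradients before the optimizer step.

    import torch

    def retrain_step(model, criterion, optimizer, x, y, unification_masks):
        """One retraining step: forward pass, target loss, backward pass, masked update.

        unification_masks maps a parameter name to a tensor Q of the same shape
        (1 = unified/fixed, 0 = free), so only the non-fixed coefficients are updated.
        """
        optimizer.zero_grad()
        loss = criterion(model(x), y)           # target training loss on (x, y)
        loss.backward()                         # gradients G(WU(t-1)) via autograd
        with torch.no_grad():
            for name, param in model.named_parameters():
                q_mask = unification_masks.get(name)
                if q_mask is not None and param.grad is not None:
                    param.grad.mul_(1.0 - q_mask)   # freeze coefficients marked as unified
        optimizer.step()
        return loss.item()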

In at least one embodiment, the value of the hyperparameter q(t) increases as t increases during the iterations, so that more and more weight coefficients are unified and fixed throughout the iterative learning process.

Referring now to Figure 3, an operational flowchart illustrating the steps of a method 300 for compressing a neural network model is depicted. In some implementations, at least one process block of Figure 3 may be performed by the computer 102 (Figure 1) and the server computer 114 (Figure 1). In some implementations, at least one process block of Figure 3 may be performed by another device or a group of devices separate from or including the computer 102 and the server computer 114.

At 302, the method 300 includes reordering at least one index corresponding to a multi-dimensional tensor associated with a neural network.

At 304, the method 300 includes determining a set of weight coefficients associated with the at least one reordered index.

In some embodiments, determining the set of weight coefficients associated with the at least one reordered index includes quantizing the weight coefficients and selecting the weight coefficients that minimize a unification loss value, where the unification loss value is associated with the weight coefficients.

In some embodiments, the minimized unification loss value is back-propagated, and the neural network is trained based on the back-propagated minimized unification loss value.

In some embodiments, the minimized unification loss value is back-propagated, and at least one of the weight coefficients is fixed based on the back-propagated minimized unification loss value.

In some embodiments, a gradient and a unification mask associated with the set of weight coefficients are determined, and at least one non-fixed weight coefficient among the weight coefficients is updated based on the gradient and the unification mask.

In some embodiments, the set of weight coefficients is compressed by quantizing and entropy-coding the weight coefficients.

In some embodiments, the unified set of weight coefficients includes at least one weight coefficient having the same absolute value.

At 306, the method 300 includes compressing a model of the neural network based on the determined set of weight coefficients.

It may be appreciated that Figure 3 provides only an illustration of one implementation and does not imply any limitations with regard to how different embodiments may be implemented. Many modifications to the depicted environment may be made based on design and implementation requirements.

With the method for compressing a neural network model provided by the embodiments of the present disclosure, the efficiency of compressing the learned weight coefficients is improved, and computation using the optimized weight coefficients is accelerated accordingly, which enables significant compression of the neural network model and improves the computational efficiency of the neural network model.

Figure 4 is a block diagram 400 of internal and external components of the computers depicted in Figure 1 according to an exemplary embodiment. It should be understood that Figure 4 provides only an illustration of one implementation and does not imply any limitations with regard to the environments in which different embodiments may be implemented. Many modifications to the depicted environment may be made based on design and implementation requirements.

The computer 102 (Figure 1) and the server computer 114 (Figure 1) may include respective sets of internal components 800A, 800B and external components 900A, 900B illustrated in Figure 4. Each of the sets of internal components 800 includes at least one processor 820, at least one computer-readable RAM 822, and at least one computer-readable ROM 824 on at least one bus 826, as well as at least one operating system 828 and at least one computer-readable tangible storage device 830.

The processor 820 is implemented in hardware, firmware, or a combination of hardware and software. The processor 820 is a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an Accelerated Processing Unit (APU), a microprocessor, a microcontroller, a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), an Application-Specific Integrated Circuit (ASIC), or another type of processing component. In some implementations, the processor 820 includes at least one processor capable of being programmed to perform a function. The bus 826 includes components that permit communication among the internal components 800A, 800B.

The at least one operating system 828, the software program 108 (Figure 1), and the neural network model compression program 116 (Figure 1) on the server computer 114 (Figure 1) are stored on at least one of the respective computer-readable tangible storage devices 830 for execution by at least one of the respective processors 820 via at least one of the respective RAMs 822 (which typically include cache memory). In the embodiment illustrated in Figure 4, each of the computer-readable tangible storage devices 830 is a magnetic disk storage device of an internal hard drive. Alternatively, each of the computer-readable tangible storage devices 830 is a semiconductor storage device such as ROM 824, EPROM, flash memory, an optical disc, a magneto-optical disc, a solid-state disk, a Compact Disc (CD), a Digital Versatile Disc (DVD), a floppy disk, a cartridge, a magnetic tape, and/or another type of non-volatile computer-readable tangible storage device that can store a computer program and digital information.

Each set of internal components 800A, 800B also includes an R/W drive or interface 832 to read from and write to at least one portable computer-readable tangible storage device 936, such as a CD-ROM, DVD, memory stick, magnetic tape, magnetic disk, optical disc, or semiconductor storage device. A software program, such as the software program 108 (Figure 1) and the neural network model compression program 116 (Figure 1), can be stored on at least one of the respective portable computer-readable tangible storage devices 936, read via the respective R/W drive or interface 832, and loaded into the respective hard drive 830.

Each set of internal components 800A, 800B also includes a network adapter or interface 836, such as a TCP/IP adapter card, a wireless Wi-Fi interface card, or a 3G, 4G, or 5G wireless interface card or other wired or wireless communication link. The software program 108 (Figure 1) and the neural network model compression program 116 (Figure 1) on the server computer 114 (Figure 1) can be downloaded to the computer 102 (Figure 1) from an external computer via a network (for example, the Internet, a local area network, or another wide area network) and the respective network adapter or interface 836. From the network adapter or interface 836, the software program 108 and the neural network model compression program 116 on the server computer 114 are loaded into the respective hard drive 830. The network may comprise copper wires, optical fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.

Each of the sets of external components 900A, 900B can include a computer display monitor 920, a keyboard 930, and a computer mouse 934. The external components 900A, 900B can also include touch screens, virtual keyboards, touch pads, pointing devices, and other human interface devices. Each of the sets of internal components 800A, 800B also includes device drivers 840 to interface with the computer display monitor 920, the keyboard 930, and the computer mouse 934. The device drivers 840, the R/W drive or interface 832, and the network adapter or interface 836 comprise hardware and software (stored in the storage device 830 and/or ROM 824).

It is to be understood in advance that although this disclosure includes a detailed description of cloud computing, implementations of the teachings recited herein are not limited to a cloud computing environment. Rather, some embodiments are capable of being implemented in conjunction with any other type of computing environment now known or later developed.

Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with the provider of the service. This cloud model may include at least five characteristics, at least three service models, and at least four deployment models.

The characteristics are as follows:

On-demand self-service: a cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with the service provider.

Broad network access: capabilities are available over a network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).

Resource pooling: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to demand. There is a sense of location independence in that the consumer generally has no control over or knowledge of the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or data center).

Rapid elasticity: capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out, and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.

Measured service: cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and the consumer of the utilized service.

The service models are as follows:

Software as a Service (SaaS): the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based e-mail). The consumer does not manage or control the underlying cloud infrastructure, including the network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.

Platform as a Service (PaaS): the capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure, including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly the application hosting environment configurations.

Infrastructure as a Service (IaaS): the capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications, and possibly limited control of select networking components (e.g., host firewalls).

The deployment models are as follows:

Private cloud: the cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on-premises or off-premises.

Community cloud: the cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be managed by the organizations or a third party and may exist on-premises or off-premises.

Public cloud: the cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.

Hybrid cloud: the cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

A cloud computing environment is service oriented, with a focus on statelessness, low coupling, modularity, and semantic interoperability. At the heart of cloud computing is an infrastructure that includes a network of interconnected nodes.

Referring to Figure 5, an exemplary cloud computing environment 500 is depicted. As shown, the cloud computing environment 500 includes at least one cloud computing node 10 with which local computing devices used by cloud consumers, such as, for example, a personal digital assistant (Personal Digital Assistant, PDA) or cellular telephone 54A, a desktop computer 54B, a laptop computer 54C, and/or an automobile computer system 54N, may communicate. The cloud computing nodes 10 may communicate with one another. They may be grouped physically or virtually in at least one network (not shown), such as a private, community, public, or hybrid cloud as described above, or a combination thereof. This allows the cloud computing environment 500 to offer infrastructure as a service, platform as a service, and/or software as a service, for which a cloud consumer does not need to maintain resources on a local computing device. It should be understood that the types of computing devices 54A-N shown in Figure 5 are intended to be exemplary only, and that the cloud computing nodes 10 and the cloud computing environment 500 can communicate with any type of computerized device over any type of network and/or network-addressable connection (e.g., using a web browser).

Referring to Figure 6, a set of functional abstraction layers 600 provided by the cloud computing environment 500 (Figure 5) is shown. It should be understood in advance that the components, layers, and functions shown in Figure 6 are intended to be exemplary only, and embodiments are not limited thereto. As depicted, the following layers and corresponding functions are provided:

The hardware and software layer 60 includes hardware components and software components. Examples of the hardware components include: mainframes 61; servers 62 based on a Reduced Instruction Set Computer (RISC) architecture; servers 63; blade servers 64; storage devices 65; and networks and networking components 66. In some embodiments, the software components include network application server software 67 and database software 68.

The virtualization layer 70 provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers 71; virtual storage 72; virtual networks 73, including virtual private networks; virtual applications and operating systems 74; and virtual clients 75.

In one example, the management layer 80 may provide the functions described below. Resource provisioning 81 provides dynamic procurement of computing resources and other resources that are utilized to perform tasks within the cloud computing environment. Metering and pricing 82 provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for the consumption of these resources. In one example, these resources may include application software licenses. Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources. The user portal 83 provides access to the cloud computing environment for consumers and system administrators. Service level management 84 provides cloud computing resource allocation and management such that required service levels are met. Service Level Agreement (SLA) planning and fulfillment 85 provides pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.

The workloads layer 90 provides examples of functionality for which the cloud computing environment may be utilized. Examples of workloads and functions that may be provided from this layer include: mapping and navigation 91; software development and lifecycle management 92; virtual classroom education delivery 93; data analytics processing 94; transaction processing 95; and neural network model compression 96. Neural network model compression 96 may compress a neural network model.

Some embodiments may relate to a system, a method, and/or a computer-readable medium at any possible technical detail level of integration. The computer-readable medium may include a computer-readable non-volatile storage medium having computer-readable program instructions thereon for causing a processor to carry out operations.

The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

The computer-readable program instructions described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network (for example, the Internet, a local area network, a wide area network, and/or a wireless network). The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.

Computer-readable program code/instructions for carrying out operations may be assembler instructions, Instruction-Set-Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including object-oriented programming languages (such as Smalltalk, C++, and the like) and procedural programming languages (such as the "C" programming language or similar programming languages). The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, in order to perform aspects or operations of the present disclosure.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable storage medium having instructions stored therein comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in the flowchart and/or block diagram block or blocks.

The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device to cause a series of operational steps to be performed on the computer, the other programmable apparatus, or the other device to produce a computer-implemented process, such that the instructions that execute on the computer, the other programmable apparatus, or the other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer-readable media according to various embodiments. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions, which comprises at least one executable instruction for implementing the specified logical function(s). The method, computer system, and computer-readable medium may include additional blocks, fewer blocks, different blocks, or differently arranged blocks than those depicted in the figures. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed concurrently or substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or that carry out combinations of special-purpose hardware and computer instructions.

It will be apparent that the systems and/or methods described herein may be implemented in different forms of hardware, firmware, or a combination of hardware and software. The actual specialized control hardware or software code used to implement these systems and/or methods is not limited to these implementations. Thus, the operation and behavior of the systems and/or methods are described herein without reference to specific software code; it should be understood that software and hardware can be designed to implement the systems and/or methods based on the description herein.

No element, act, or instruction used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles "a" and "an" are intended to include at least one item and may be used interchangeably with "at least one." Furthermore, as used herein, the term "set" is intended to include at least one item (e.g., related items, unrelated items, a combination of related and unrelated items, etc.) and may be used interchangeably with "at least one." Where only one item is intended, the term "one" or similar language is used. Also, as used herein, the terms "has," "have," "having," and the like are intended to be open-ended terms. Further, the phrase "according to" is intended to mean "at least in part according to," unless explicitly stated otherwise.

The descriptions of the various aspects and embodiments have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Even though combinations of features are recited in the claims and/or disclosed in the specification, these combinations are not intended to limit the disclosure of possible implementations. In fact, many of these features may be combined in ways not specifically recited in the claims and/or disclosed in the specification. Although each dependent claim listed below may directly depend on only one claim, the disclosure of possible implementations includes each dependent claim in combination with every other claim in the claim set. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (9)

1. A method for compressing a neural network model used for video processing, wherein an input and an output of the neural network model are video or images, an original pre-trained neural network model is compressed through weight unification regularization, and feature descriptors are extracted using the compressed neural network model, the method comprising: reordering at least one index corresponding to a multi-dimensional tensor, the multi-dimensional tensor being the multi-dimensional tensor corresponding to weight coefficients of each network layer of the neural network, the at least one index being determined based on a weight unification loss of the weight coefficients; quantizing the weight coefficients and selecting the weight coefficients that minimize a unification loss value, so as to determine a set of weight coefficients associated with the at least one reordered index, wherein the unification loss value is associated with the weight coefficients; and compressing the model of the neural network according to the determined set of weight coefficients; wherein the method further comprises: back-propagating the minimized unification loss value, and fixing at least one of the weight coefficients according to the back-propagated minimized unification loss value, the unification loss value comprising a compression rate loss, a unification distortion loss, and a computation speed loss; and determining a gradient and a unification mask associated with the set of weight coefficients, and updating at least one non-fixed weight coefficient among the weight coefficients according to the gradient and the unification mask.

2. The method according to claim 1, further comprising back-propagating the minimized unification loss value and training the neural network according to the back-propagated minimized unification loss value.

3. The method according to any one of claims 1-2, further comprising compressing the set of weight coefficients by quantizing and entropy encoding/decoding the weight coefficients.

4. The method according to any one of claims 1-2, wherein the unified set of weight coefficients comprises at least one weight coefficient having the same absolute value.

5. A computer system for compressing a neural network model used for video processing, wherein an input and an output of the neural network model are video or images, an original pre-trained neural network model is compressed through weight unification regularization, and feature descriptors are extracted using the compressed neural network model, the computer system comprising: a reordering module configured to reorder at least one index corresponding to a multi-dimensional tensor, the multi-dimensional tensor being the multi-dimensional tensor corresponding to weight coefficients of each network layer of the neural network, the at least one index being determined based on a weight unification loss of the weight coefficients; a unification module configured to determine a set of weight coefficients associated with the at least one reordered index, the unification module comprising a quantization module configured to quantize the weight coefficients, and a selection module configured to select the weight coefficients that minimize a unification loss value, wherein the unification loss value is associated with the weight coefficients; and a compression module configured to compress the model of the neural network according to the determined set of weight coefficients; the computer system further comprising an update module configured to back-propagate the minimized unification loss value and to fix at least one of the weight coefficients according to the back-propagated minimized unification loss value, the unification loss value comprising a compression rate loss, a unification distortion loss, and a computation speed loss; the update module being further configured to determine a gradient and a unification mask associated with the set of weight coefficients, and to update at least one non-fixed weight coefficient among the weight coefficients according to the gradient and the unification mask.

6. The computer system according to claim 5, further comprising a training module configured to back-propagate the minimized unification loss value and to train the neural network according to the back-propagated minimized unification loss value.

7. The computer system according to claim 5, further comprising a compression module configured to compress the set of weight coefficients by quantizing and entropy encoding/decoding the weight coefficients.

8. A non-volatile computer-readable medium having stored thereon a computer program for compressing a neural network model, the computer program being configured to cause at least one computer processor to perform the method according to any one of claims 1-4.

9. A computing device, comprising a processor and a memory, the memory storing a computer program that, when executed by the processor, causes the processor to perform the method according to any one of claims 1-4.
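The training loop recited in claims 1 and 5 — unify groups of weight coefficients, back-propagate the minimized unification loss, fix some weights, and update the remaining weights through a unification mask — can be illustrated with a short sketch. The following Python/PyTorch fragment is only a minimal illustration under assumed details: the block size, the way the unification mask is built, the stand-in task loss, and the loss weighting are hypothetical choices made for the example and are not taken from the claims, and the sketch keeps only a distortion term rather than the full unification loss (which in the claims also accounts for compression rate and computation speed).

```python
# Minimal sketch of weight unification regularization with a masked update.
# Assumptions (not from the patent): block size 4, a random unification mask,
# a squared-norm stand-in for the task loss, and a 0.1 loss weighting.
import torch

def unification_distortion(weight: torch.Tensor, block: int = 4) -> torch.Tensor:
    """Distortion incurred if each block of `block` consecutive weights (along the
    reordered index) were forced to share one absolute value (cf. claim 4)."""
    w = weight.reshape(-1, block)                # group weights into blocks
    target = w.abs().mean(dim=1, keepdim=True)   # shared magnitude per block
    return ((w.abs() - target) ** 2).mean()      # gap to the unified value

torch.manual_seed(0)
weight = torch.randn(8, 16, requires_grad=True)  # toy layer weight tensor

# Hypothetical unification mask: 1 = weight may still be updated, 0 = weight is
# treated as fixed (e.g., its block already minimizes the unification loss).
mask = (torch.rand_like(weight) > 0.25).float()

optimizer = torch.optim.SGD([weight], lr=1e-2)
for step in range(100):
    optimizer.zero_grad()
    task_loss = (weight ** 2).mean()             # stand-in for the original task loss
    loss = task_loss + 0.1 * unification_distortion(weight)
    loss.backward()                              # back-propagate the unification loss
    weight.grad *= mask                          # only non-fixed weights are updated
    optimizer.step()
```

After training with such a regularizer, blocks of weights sit close to a shared absolute value, which is what makes the subsequent quantization and entropy coding of the weight set more effective.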