HK40087723B - Method and apparatus for video decoding, electronic device and storage medium
Description
Incorporation by Reference
This application claims priority to U.S. Patent Application No. 17/826,806, filed May 27, 2022, entitled "CONTENT-ADAPTIVE ONLINE TRAINING METHOD AND APPARATUS FOR DEBLOCKING IN BLOCK-WISE IMAGE COMPRESSION", which in turn claims priority to U.S. Provisional Application No. 63/211,408, filed June 16, 2021, entitled "Content Adaptive Online Training Deblocking for Block-Wise Image Compression". The disclosures of the prior applications are incorporated herein by reference in their entirety.
Technical Field
This application relates to the field of video encoding and decoding technology, and in particular to a video decoding method and apparatus, an electronic device, and a computer-readable storage medium.
Background
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent the work is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Image and/or video encoding and decoding can be performed using inter-picture prediction with motion compensation. Uncompressed digital images and/or video can include a series of pictures, each picture having a spatial dimension of, for example, 1920×1080 luminance samples and associated chrominance samples. The series of pictures can have a fixed or variable picture rate (informally also known as frame rate) of, for example, 60 pictures per second or 60 Hz. Uncompressed images and/or video have specific bit rate requirements. For example, 1080p60 4:2:0 video at 8 bits per sample (1920×1080 luminance sample resolution at a 60 Hz frame rate) requires close to 1.5 Gbit/s of bandwidth. An hour of such video requires more than 600 gigabytes of storage space.
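The bandwidth and storage figures above can be checked with straightforward arithmetic. The sketch below assumes 4:2:0 subsampling, in which the two chroma planes together contribute half as many samples again as the luma plane:

```python
# Bandwidth and storage for uncompressed 1080p60 4:2:0 video, 8 bits/sample.
# In 4:2:0, each chroma plane is quarter-resolution, so luma plus two chroma
# planes totals 1.5x the luma sample count.
luma = 1920 * 1080
samples_per_picture = luma * 3 / 2              # luma + 2 * (luma / 4)
bits_per_second = samples_per_picture * 8 * 60  # 8 bits/sample, 60 pictures/s
print(f"{bits_per_second / 1e9:.2f} Gbit/s")    # ~1.49 Gbit/s
bytes_per_hour = bits_per_second * 3600 / 8
print(f"{bytes_per_hour / 1e9:.0f} GB per hour")  # ~672 GB, i.e. over 600 GB
```

The result matches the "close to 1.5 Gbit/s" and "more than 600 gigabytes" figures stated above.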
One purpose of image and/or video encoding and decoding can be to reduce redundancy in the input image and/or video signal through compression. Compression can help reduce the aforementioned bandwidth and/or storage requirements, in some cases by two orders of magnitude or more. Although the description herein uses video encoding/decoding as an illustrative example, the same techniques can be applied to image encoding/decoding in a similar manner without departing from the spirit of this disclosure. Both lossless and lossy compression, as well as combinations thereof, can be employed. Lossless compression refers to techniques by which an exact copy of the original signal can be reconstructed from the compressed original signal. When lossy compression is used, the reconstructed signal may differ from the original signal, but the distortion between the original and reconstructed signals is small enough that the reconstructed signal is usable for the intended application. In the case of video, lossy compression is widely employed. The amount of distortion tolerated depends on the application; for example, users of certain consumer streaming applications may tolerate higher distortion than users of television distribution applications. The achievable compression ratio can reflect that higher allowable/tolerable distortion can yield higher compression ratios.
Video encoders and decoders can utilize techniques from several broad categories, including, for example, motion compensation, transform, quantization, and entropy coding.
Video codec technologies can include techniques known as intra coding. In intra coding, sample values are represented without reference to samples or other data from previously reconstructed reference pictures. In some video codecs, a picture is spatially subdivided into blocks of samples. When all blocks of samples are coded in intra mode, the picture can be an intra picture. Intra pictures and their derivatives (for example, independent decoder refresh pictures) can be used to reset the decoder state and can therefore be used as the first picture in a coded video bitstream and a video session, or as a still image. The samples of an intra block can be subjected to a transform, and the transform coefficients can be quantized before entropy coding. Intra prediction can be a technique that minimizes sample values in the pre-transform domain. In some cases, the smaller the DC value after the transform and the smaller the AC coefficients, the fewer bits are required to represent the block after entropy coding at a given quantization step size.
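As a toy illustration (not the transform or quantizer of any particular codec), the relationship between coefficient magnitude, quantization step size, and symbol cost can be sketched as follows: smaller DC/AC magnitudes quantize to more zero levels, which an entropy coder can represent very cheaply.

```python
# Toy uniform quantizer: divide each transform coefficient by the step size
# and round. All values and the step size are illustrative only.
def quantize(coeffs, step):
    return [round(c / step) for c in coeffs]

small_block = quantize([13.0, 3.0, -2.0, 1.0], step=8)      # [2, 0, 0, 0]
large_block = quantize([120.0, 45.0, -30.0, 18.0], step=8)  # [15, 6, -4, 2]
# The block with smaller coefficients quantizes to mostly zeros, leaving
# fewer non-zero symbols for the entropy coder to spend bits on.
print(small_block, large_block)
```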
Traditional intra coding, such as that known from, for example, MPEG-2 generation coding technologies, does not use intra prediction. However, some newer video compression technologies include techniques that attempt prediction based on surrounding sample data and/or metadata obtained, for example, during the encoding and/or decoding of blocks of data that are spatially neighboring and precede in decoding order. Such techniques are hereinafter referred to as "intra prediction" techniques. Note that, in at least some cases, intra prediction uses reference data only from the current picture under reconstruction, and not from reference pictures.
There can be many different forms of intra prediction. When more than one such technique can be used in a given video coding technology, the technique in use can be coded in an intra prediction mode. In certain cases, a mode can have sub-modes and/or parameters, and those can be coded individually or included in the mode codeword. Which codeword to use for a given mode, sub-mode, and/or parameter combination can have an impact on the coding efficiency gain through intra prediction, and so can the entropy coding technology used to translate the codewords into a bitstream.
Certain modes of intra prediction were introduced with H.264, refined in H.265, and further refined in newer coding technologies such as the joint exploration model (JEM), versatile video coding (VVC), and benchmark set (BMS). A predictor block can be formed using neighboring sample values belonging to already available samples. Sample values of neighboring samples are copied into the predictor block according to a direction. A reference to the direction in use can be coded in the bitstream or may itself be predicted.
In the prior art, an improvement to a single module (for example, an encoder) in a hybrid video codec cannot be passed on to another module (for example, a decoder) and therefore may not yield an overall performance gain.
Summary
An embodiment of this application provides a video decoding method, including: reconstructing blocks of an image to be reconstructed from an encoded video bitstream; decoding first deblocking information in the encoded video bitstream, the first deblocking information including a first deblocking parameter of a deep neural network (DNN) in a video decoder, the first deblocking parameter of the DNN being an update parameter that has previously been determined through a content-adaptive training process; determining the DNN in the video decoder for a first boundary region that includes a subset of samples in the reconstructed blocks based on the first deblocking parameter included in the first deblocking information; and deblocking the first boundary region that includes the subset of samples in the reconstructed blocks based on the determined DNN corresponding to the first deblocking parameter.
In an embodiment, the reconstructed blocks include first neighboring reconstructed blocks that have a first shared boundary and include the first boundary region of samples on both sides of the first shared boundary. The first neighboring reconstructed blocks further include a non-boundary region outside the first boundary region. The first boundary region in the first neighboring reconstructed blocks is replaced with the deblocked first boundary region.
In an embodiment, the first deblocking parameter is a bias term or a weight coefficient in the DNN.
In an embodiment, the DNN is configured with initial parameters. Determining the DNN in the video decoder includes updating one of the initial parameters based on the first deblocking parameter.
In an embodiment, the first deblocking information indicates a difference between the first deblocking parameter and the one of the initial parameters. The video decoding method further includes determining the first deblocking parameter as a sum of the difference and the one of the initial parameters.
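A minimal sketch of the two embodiments above follows. Function and variable names are illustrative, not from this application: the bitstream carries the difference between the trained deblocking parameter and the matching pretrained initial parameter, and the decoder recovers the parameter as their sum before patching the DNN.

```python
# Hypothetical sketch; values are illustrative. The decoder recovers the
# first deblocking parameter as (initial parameter + signaled difference),
# then substitutes it into the pretrained DNN's parameter set.
def reconstruct_parameter(initial_params, index, signaled_diff):
    return initial_params[index] + signaled_diff

def apply_update(dnn_params, index, new_value):
    patched = list(dnn_params)  # leave the pretrained set intact
    patched[index] = new_value
    return patched

initial = [0.1, -0.25, 0.4]                  # pretrained bias/weight values
p = reconstruct_parameter(initial, 1, 0.5)   # signaled difference = 0.5
print(p)                                     # 0.25
print(apply_update(initial, 1, p))           # [0.1, 0.25, 0.4]
```

Signaling only the difference keeps the payload small when the online-trained parameter stays close to its pretrained value.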
In an embodiment, the reconstructed blocks include second neighboring reconstructed blocks that have a second shared boundary and include a second boundary region of samples on both sides of the second shared boundary. The video decoding method further includes: decoding second deblocking information in the encoded video bitstream corresponding to the second boundary region, the second deblocking information indicating a second deblocking parameter that has previously been determined through the content-adaptive training process, where the second boundary region may be different from the first boundary region; updating the DNN based on the first deblocking parameter and the second deblocking parameter, the updated DNN corresponding to the second boundary region and being configured with the first deblocking parameter and the second deblocking parameter; and deblocking the second boundary region based on the updated DNN corresponding to the second boundary region.
In an embodiment, the reconstructed blocks include second neighboring reconstructed blocks that have a second shared boundary and include a second boundary region having samples on both sides of the second shared boundary. The video decoding method further includes deblocking the second boundary region based on the determined DNN corresponding to the first boundary region.
In an embodiment, a number of layers of the DNN depends on a size of the first boundary region.
In an embodiment, the first boundary region further includes samples on both sides of a third shared boundary between third neighboring reconstructed blocks included in the reconstructed blocks. The first neighboring reconstructed blocks are different from the third neighboring reconstructed blocks.
An embodiment of this application further provides a video decoding apparatus including processing circuitry, the apparatus including: a reconstruction module configured to reconstruct blocks of an image to be reconstructed from an encoded video bitstream; a first decoding module configured to decode first deblocking information in the encoded video bitstream, the first deblocking information including a first deblocking parameter of a deep neural network (DNN) in a video decoder, the first deblocking parameter of the DNN being an update parameter that has previously been determined through a content-adaptive training process; a first determining module configured to determine the DNN in the video decoder for a first boundary region that includes a subset of samples in the reconstructed blocks based on the first deblocking parameter included in the first deblocking information; and a first deblocking module configured to deblock the first boundary region that includes the subset of samples in the reconstructed blocks based on the determined DNN corresponding to the first deblocking parameter.
An embodiment of this application further provides an electronic device including a memory and a processor. The memory stores computer-readable instructions that, when executed by the processor, cause the electronic device to perform the video decoding method described above.
An embodiment of this application further provides a non-transitory computer-readable storage medium storing a program that is executable by at least one processor to perform the video decoding method described above.
As can be seen, embodiments of this application provide a video encoding/decoding method that applies content-adaptive online training in neural image compression (NIC). In an NN-based video coding framework, different modules can be jointly optimized from input to output to improve a final objective by performing a learning or training process, resulting in an end-to-end optimized NIC.
Brief Description of the Drawings
Further features, properties, and various advantages of the disclosed subject matter will become more apparent from the following detailed description and the accompanying drawings, in which:
Figure 1A is a schematic illustration of an exemplary subset of intra prediction modes according to an embodiment of this application.
Figure 1B is an illustration of exemplary intra prediction directions according to an embodiment of this application.
Figure 2 shows a current block and surrounding samples according to an embodiment of this application.
Figure 3 is a schematic illustration of a simplified block diagram of a communication system according to an embodiment of this application.
Figure 4 is a schematic illustration of a simplified block diagram of a communication system according to another embodiment of this application.
Figure 5 is a schematic illustration of a simplified block diagram of a decoder according to an embodiment of this application.
Figure 6 is a schematic illustration of a simplified block diagram of an encoder according to an embodiment of this application.
Figure 7 shows a block diagram of an encoder according to another embodiment of this application.
Figure 8 shows a block diagram of a decoder according to another embodiment of this application.
Figure 9A shows an example of block-wise image encoding according to an embodiment of this application.
Figure 9B shows an exemplary NIC framework according to an embodiment of this application.
Figure 10 shows an exemplary convolutional neural network (CNN) of a main encoder network according to an embodiment of this application.
Figure 11 shows an exemplary CNN of a main decoder network according to an embodiment of this application.
Figure 12 shows an exemplary CNN of a hyper encoder according to an embodiment of this application.
Figure 13 shows an exemplary CNN of a hyper decoder according to an embodiment of this application.
Figure 14 shows an exemplary CNN of a context model network according to an embodiment of this application.
Figure 15 shows an exemplary CNN of an entropy parameter network according to an embodiment of this application.
Figure 16A shows an exemplary video encoder according to an embodiment of this application.
Figure 16B shows an exemplary video decoder according to an embodiment of this application.
Figure 17 shows an exemplary video encoder according to another embodiment of this application.
Figure 18 shows an exemplary video decoder according to another embodiment of this application.
Figures 19A to 19C show exemplary deblocking processes according to embodiments of this application.
Figure 20 shows an exemplary deblocking process according to another embodiment of this application.
Figure 21 shows an exemplary deblocking process based on multiple deblocking models according to an embodiment of this application.
Figures 22A and 22B show exemplary deblocking processes according to other embodiments of this application.
Figure 23 shows a flowchart outlining an encoding process according to an embodiment of this application.
Figure 24 shows a flowchart outlining a decoding process according to an embodiment of this application.
Figure 25 is a schematic illustration of a computer system according to an embodiment of this application.
Detailed Description
Referring to Figure 1A, depicted in the lower right is a subset of nine predictor directions known from the 33 possible predictor directions of H.265 (corresponding to the 33 angular modes of the 35 intra modes). The point (101) where the arrows converge represents the sample being predicted. The arrows represent the directions from which the sample is being predicted. For example, arrow (102) indicates that sample (101) is predicted from a sample or samples to the upper right, at a 45 degree angle from the horizontal. Similarly, arrow (103) indicates that sample (101) is predicted from a sample or samples to the lower left of sample (101), at a 22.5 degree angle from the horizontal.
Still referring to Figure 1A, depicted in the upper left is a square block (104) of 4×4 samples (indicated by a bold dashed line). The square block (104) includes 16 samples, each labeled with "S", its position in the Y dimension (e.g., row index), and its position in the X dimension (e.g., column index). For example, sample S21 is the second sample in the Y dimension (from the top) and the first sample in the X dimension (from the left). Similarly, sample S44 is the fourth sample in block (104) in both the Y and X dimensions. As the block is 4×4 samples in size, S44 is at the bottom right. Further shown are reference samples that follow a similar numbering scheme. Reference samples are labeled with R, their Y position (e.g., row index), and their X position (column index) relative to block (104). In both H.264 and H.265, prediction samples neighbor the block under reconstruction; therefore, negative values need not be used.
Intra picture prediction can work by copying reference sample values from neighboring samples as appropriate along the signaled prediction direction. For example, assume the encoded video bitstream includes signaling that, for this block, indicates a prediction direction consistent with arrow (102), that is, samples are predicted from a prediction sample or samples to the upper right, at a 45 degree angle from the horizontal. In that case, samples S41, S32, S23, and S14 are predicted from the same reference sample R05. Sample S44 is then predicted from reference sample R08.
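The copy operation described above can be sketched for the 45 degree direction of arrow (102). This is an illustrative reading of Figure 1A, not normative codec code; `top_refs[k]` stands in for reference sample R0k.

```python
# Directional intra prediction sketch for a 4x4 block along the 45 degree
# up-right direction: pred[y][x] (0-based) copies top_refs[x + y + 2],
# i.e. R0[x + y] in the 1-based labeling of Figure 1A. Hence S41, S32,
# S23, and S14 all copy R05, and S44 copies R08.
def predict_45_deg(top_refs, n=4):
    return [[top_refs[x + y + 2] for x in range(n)] for y in range(n)]

top = {k: f"R0{k}" for k in range(2, 9)}  # reference samples R02 .. R08
pred = predict_45_deg(top)
print(pred[3][0], pred[2][1], pred[1][2], pred[0][3])  # R05 R05 R05 R05
print(pred[3][3])                                      # R08
```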
In certain cases, the values of multiple reference samples can be combined, for example through interpolation, in order to calculate a reference sample; especially when the directions are not evenly divisible by 45 degrees.
As video coding technology has developed, the number of possible directions has increased. In H.264 (year 2003), nine different directions could be represented. That increased to 33 in H.265 (year 2013), and JEM/VVC/BMS, at the time of disclosure, can support up to 65 directions. Experiments have been conducted to identify the most likely directions, and certain techniques in entropy coding are used to represent those likely directions in a small number of bits, accepting a certain penalty for less likely directions. Furthermore, the directions themselves can sometimes be predicted from neighboring directions used in neighboring, already decoded blocks.
Figure 1B shows a schematic diagram (110) that depicts 65 intra prediction directions according to JEM to illustrate the increasing number of prediction directions over time.
The mapping of intra prediction direction bits that represent the direction in the encoded video bitstream can differ from video coding technology to video coding technology, and can range, for example, from simple direct mappings of the prediction direction to intra prediction modes, to codewords, to complex adaptive schemes involving most probable modes, and similar techniques. In all cases, however, there can be certain directions that are statistically less likely to occur in video content than certain other directions. As the goal of video compression is the reduction of redundancy, in a well-working video coding technology those less likely directions will be represented by a larger number of bits than more likely directions.
Motion compensation can be a lossy compression technique and can relate to techniques in which a block of sample data from a previously reconstructed picture or part thereof (the reference picture), after being spatially shifted in a direction indicated by a motion vector (MV henceforth), is used for the prediction of a newly reconstructed picture or picture part. In some cases, the reference picture can be the same as the picture currently under reconstruction. An MV can have two dimensions, X and Y, or three dimensions, the third being an indication of the reference picture in use (indirectly, the third dimension can be a time dimension).
In some video compression techniques, an MV applicable to a certain area of sample data can be predicted from other MVs, for example from an MV that relates to another area of sample data spatially adjacent to the area under reconstruction and that precedes that MV in decoding order. Doing so can substantially reduce the amount of data required for coding the MV, thereby removing redundancy and increasing compression. MV prediction can work effectively, for example, because when coding an input video signal derived from a camera (known as natural video), there is a statistical likelihood that areas larger than the area to which a single MV is applicable move in a similar direction and, therefore, can in some cases be predicted using a similar motion vector derived from the MVs of neighboring areas. That results in the MV found for a given area being similar or identical to the MV predicted from the surrounding MVs and, in turn, after entropy coding, being representable in a smaller number of bits than would be used if the MV were coded directly. In some cases, MV prediction can be an example of lossless compression of a signal (namely, the MVs) derived from the original signal (namely, the sample stream). In other cases, MV prediction itself can be lossy, for example because of rounding errors when calculating a predictor from several surrounding MVs.
Various MV prediction mechanisms are described in H.265/HEVC (ITU-T Recommendation H.265, "High Efficiency Video Coding", December 2016). Out of the many MV prediction mechanisms that H.265 offers, described here is a technique henceforth referred to as "spatial merge".
Referring to Figure 2, the current block (201) includes samples, found by the encoder during the motion search process, that can be predicted from a previous block of the same size that has been spatially shifted. Instead of coding that MV directly, the MV can be derived from metadata associated with one or more reference pictures, for example from the most recent (in decoding order) reference picture, using the MV associated with any one of five surrounding samples denoted A0, A1 and B0, B1, B2 ((202) through (206), respectively). In H.265, MV prediction can use predictors from the same reference picture that the neighboring blocks are using.
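A simplified sketch of this candidate reuse follows. The scan order and availability rules here are illustrative, not the normative H.265 merge-candidate derivation.

```python
# Scan the spatial neighbors A0, A1, B0, B1, B2 of the current block (201)
# and reuse the first available MV that points into the desired reference
# picture; otherwise fall back to coding an MV explicitly.
def spatial_merge_mv(neighbors, ref_pic):
    """neighbors maps a position name to (mv, ref_pic), or None if the
    neighbor is unavailable (e.g., intra coded or outside the picture)."""
    for name in ("A0", "A1", "B0", "B1", "B2"):
        cand = neighbors.get(name)
        if cand is not None and cand[1] == ref_pic:
            return cand[0]
    return None  # no usable candidate: code the MV directly

mvs = {"A0": None, "A1": ((3, -1), "ref0"), "B0": ((8, 2), "ref1")}
print(spatial_merge_mv(mvs, "ref0"))  # (3, -1)
print(spatial_merge_mv(mvs, "ref2"))  # None
```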
Figure 3 shows a simplified block diagram of a communication system (300) according to an embodiment of the present disclosure. The communication system (300) includes a plurality of terminal devices that can communicate with one another via, for example, a network (350). For example, the communication system (300) includes a first pair of terminal devices (310) and (320) interconnected via the network (350). In the example of Figure 3, the first pair of terminal devices (310) and (320) performs unidirectional transmission of data. For example, the terminal device (310) may encode video data (e.g., a stream of video pictures captured by the terminal device (310)) for transmission via the network (350) to the other terminal device (320). The encoded video data can be transmitted in the form of one or more encoded video bitstreams. The terminal device (320) may receive the encoded video data from the network (350), decode the encoded video data to recover the video pictures, and display the video pictures according to the recovered video data. Unidirectional data transmission can be common in media serving applications and the like.
In another example, the communication system (300) includes a second pair of terminal devices (330) and (340) that performs bidirectional transmission of encoded video data, which may occur, for example, during a video conference. For bidirectional transmission of data, in an example, each of the terminal devices (330) and (340) may encode video data (e.g., a stream of video pictures captured by the terminal device) for transmission via the network (350) to the other of the terminal devices (330) and (340). Each of the terminal devices (330) and (340) may also receive the encoded video data transmitted by the other of the terminal devices (330) and (340), may decode the encoded video data to recover the video pictures, and may display the video pictures at an accessible display device according to the recovered video data.
在图3的示例中,终端装置(310)、(320)、(330)和(340)可以被示出为服务器、个人计算机和智能电话,但是本公开内容的原理可以不被这样限制。本公开内容的实施方式适用于膝上型计算机、平板电脑、媒体播放器和/或专用视频会议设备。网络(350)表示在终端装置(310)、(320)、(330)和(340)之间传送编码视频数据的任何数量的网络,包括例如有线(连线的)和/或无线通信网络。通信网络(350)可以在电路交换信道和/或分组交换信道中交换数据。代表性的网络包括电信网络、局域网、广域网和/或因特网。出于本讨论的目的,除非下面在本文中说明,否则网络(350)的架构和拓扑对于本公开内容的操作而言可能是无关紧要的。In the example of Figure 3, the terminal devices (310), (320), (330), and (340) may be shown as servers, personal computers, and smartphones, but the principles of this disclosure are not limited thereto. Embodiments of this disclosure are applicable to laptop computers, tablet computers, media players, and/or dedicated video conferencing equipment. Network (350) refers to any number of networks that transmit encoded video data between the terminal devices (310), (320), (330), and (340), including, for example, wired (connected) and/or wireless communication networks. Communication networks (350) may exchange data in circuit-switched channels and/or packet-switched channels. Representative networks include telecommunications networks, local area networks (LANs), wide area networks (WANs), and/or the Internet. For the purposes of this discussion, the architecture and topology of the network (350) may be irrelevant to the operation of this disclosure unless explained herein.
作为所公开的主题的应用的示例,图4示出了视频编码器和视频解码器在流式传输环境中的放置。所公开的主题可以同样地适用于其他支持视频的应用,包括例如视频会议、数字TV、在包括CD、DVD、存储棒等的数字介质上存储压缩视频等等。As an example of the application of the disclosed subject matter, Figure 4 illustrates the placement of a video encoder and a video decoder in a streaming environment. The disclosed subject matter can also be applied to other video-enabled applications, including, for example, video conferencing, digital TV, storing compressed video on digital media including CDs, DVDs, memory sticks, etc.
流式传输系统可以包括捕获子系统(413),该捕获子系统(413)可以包括创建例如未压缩的视频图片流(402)的视频源(401),例如数字摄像机。在示例中,视频图片流(402)包括由数字摄像机拍摄的样本。视频图片流(402)被描绘为粗线以强调当与编码视频数据(404)(或编码视频比特流)比较时高的数据量,该视频图片流(402)可以由耦接至视频源(401)的包括视频编码器(403)的电子装置(420)进行处理。视频编码器(403)可以包括硬件、软件或其组合,以实现或实施如下更详细地描述的所公开的主题的各方面。编码视频数据(404)(或编码视频比特流)被描绘为细线以强调当与视频图片流(402)比较时较低的数据量,编码视频数据(404)(或编码视频比特流(404))可以存储在流式传输服务器(405)上以供将来使用。一个或更多个流式传输客户端子系统例如图4中的客户端子系统(406)和(408)可以访问流式传输服务器(405)以检索编码视频数据(404)的副本(407)和(409)。客户端子系统(406)可以包括例如电子装置(430)中的视频解码器(410)。视频解码器(410)对传入的编码视频数据的副本(407)进行解码,并且创建可以在显示器(412)(例如,显示屏)或另一呈现装置(未描绘)上呈现的传出的视频图片流(411)。在一些流式传输系统中,可以根据某些视频编码/压缩标准对编码视频数据(404)、(407)和(409)(例如,视频比特流)进行编码。这些标准的示例包括ITU-T H.265建议书。在示例中,开发中的视频编码标准被非正式地称为通用视频编码(Versatile Video Coding,VVC)。所公开的主题可以用于VVC的背景下。The streaming system may include a capture subsystem (413) that may include a video source (401), such as a digital camera, that creates, for example, an uncompressed video picture stream (402). In the example, the video picture stream (402) includes samples captured by the digital camera. The video picture stream (402) is depicted as a thick line to emphasize the high amount of data when compared to encoded video data (404) (or encoded video bitstream), which may be processed by an electronic device (420) including a video encoder (403) coupled to the video source (401). The video encoder (403) may include hardware, software, or a combination thereof to implement or carry out aspects of the disclosed subject matter as described in more detail below. Encoded video data (404) (or encoded video bitstream) is depicted as a thin line to emphasize its lower data volume when compared to the video picture stream (402). The encoded video data (404) (or encoded video bitstream (404)) may be stored on a streaming server (405) for future use. One or more streaming client subsystems, such as client subsystems (406) and (408) in Figure 4, may access the streaming server (405) to retrieve copies (407) and (409) of the encoded video data (404). 
Client subsystem (406) may include, for example, a video decoder (410) in an electronic device (430). The video decoder (410) decodes the incoming copy (407) of the encoded video data and creates an outgoing video picture stream (411) that can be rendered on a display (412) (e.g., a display screen) or another rendering device (not depicted). In some streaming systems, the encoded video data (404), (407), and (409) (e.g., video bitstreams) can be encoded according to certain video coding/compression standards. Examples of those standards include ITU-T Recommendation H.265. In an example, a video coding standard under development is informally known as Versatile Video Coding (VVC). The disclosed subject matter may be used in the context of VVC.
注意,电子装置(420)和(430)可以包括其他部件(未示出)。例如,电子装置(420)可以包括视频解码器(未示出),并且电子装置(430)也可以包括视频编码器(未示出)。Note that electronic devices (420) and (430) may include other components (not shown). For example, electronic device (420) may include a video decoder (not shown), and electronic device (430) may also include a video encoder (not shown).
图5示出了根据本公开内容的实施方式的视频解码器(510)的框图。视频解码器(510)可以被包括在电子装置(530)中。电子装置(530)可以包括接收器(531)(例如,接收电路系统)。视频解码器(510)可以代替图4的示例中的视频解码器(410)使用。Figure 5 shows a block diagram of a video decoder (510) according to an embodiment of the present disclosure. The video decoder (510) may be included in an electronic device (530). The electronic device (530) may include a receiver (531) (e.g., receiving circuitry). The video decoder (510) may be used in place of the video decoder (410) in the example of Figure 4.
接收器(531)可以接收要由视频解码器(510)解码的一个或更多个编码视频序列;在同一实施方式或另一实施方式中,一次接收一个编码视频序列,其中,每个编码视频序列的解码独立于其他编码视频序列。可以从信道(501)接收编码视频序列,该信道(501)可以是到存储编码视频数据的存储装置的硬件/软件链路。接收器(531)可以接收编码视频数据以及其他数据,例如编码音频数据和/或辅助数据流,它们可以被转发至其各自的使用实体(未描绘)。接收器(531)可以将编码视频序列与其他数据分开。为了防止网络抖动,可以将缓冲存储器(515)耦接在接收器(531)与熵解码器/解析器(520)(此后称为“解析器(520)”)之间。在某些应用中,缓冲存储器(515)是视频解码器(510)的一部分。在其他应用中,缓冲存储器(515)可以在视频解码器(510)外部(未描绘)。在又一些其他应用中,在视频解码器(510)外部可以存在缓冲存储器(未描绘)以例如防止网络抖动,并且另外在视频解码器(510)内部可以存在另一缓冲存储器(515)以例如处理播出时序。当接收器(531)从具有足够带宽和可控性的存储/转发装置或从等时同步网络接收数据时,可能不需要缓冲存储器(515),或者缓冲存储器(515)可以是小的。为了在诸如因特网的尽力型(best effort)分组网络上使用,可能需要缓冲存储器(515),该缓冲存储器(515)可以是相对大的并且可以有利地具有自适应大小,并且可以至少部分地在视频解码器(510)外部的操作系统或类似元件(未描绘)中实现。The receiver (531) can receive one or more encoded video sequences to be decoded by the video decoder (510); in the same or another embodiment, one encoded video sequence is received at a time, wherein the decoding of each encoded video sequence is independent of the other encoded video sequences. The encoded video sequences can be received from a channel (501), which can be a hardware/software link to a storage device storing the encoded video data. The receiver (531) can receive encoded video data as well as other data, such as encoded audio data and/or auxiliary data streams, which can be forwarded to their respective user entities (not depicted). The receiver (531) can separate the encoded video sequences from other data. To prevent network jitter, a buffer memory (515) can be coupled between the receiver (531) and the entropy decoder/parser (520) (hereinafter referred to as "parser (520)"). In some applications, the buffer memory (515) is part of the video decoder (510). In other applications, the buffer memory (515) can be external to the video decoder (510) (not depicted). 
In still other applications, there can be a buffer memory (not depicted) outside of the video decoder (510), for example to combat network jitter, and in addition another buffer memory (515) inside the video decoder (510), for example to handle playout timing. When the receiver (531) is receiving data from a store/forward device of sufficient bandwidth and controllability, or from an isochronous network, the buffer memory (515) may not be needed, or can be small. For use on best-effort packet networks such as the Internet, the buffer memory (515) may be required, can be comparatively large, can advantageously be of adaptive size, and may at least partially be implemented in an operating system or similar elements (not depicted) outside of the video decoder (510).
视频解码器(510)可以包括解析器(520)以根据编码视频序列重构符号(521)。这些符号的类别包括:用于管理视频解码器(510)的操作的信息,以及可能地包括用于控制呈现装置诸如呈现装置(512)(例如,显示屏)的信息,该呈现装置(512)不是电子装置(530)的组成部分而是可以耦接至电子装置(530),如图5中所示。呈现装置的控制信息可以呈补充增强信息(Supplemental Enhancement Information,SEI消息)或视频可用性信息(VideoUsability Information,VUI)参数集片段(未描绘)的形式。解析器(520)可以对接收到的编码视频序列进行解析/熵解码。编码视频序列的编码可以符合视频编码技术或标准,并且可以遵循各种原理,包括可变长度编码、霍夫曼编码、具有或不具有上下文敏感性的算术编码等。解析器(520)可以基于与组相对应的至少一个参数,从编码视频序列中提取针对视频解码器中的像素子组中的至少一个子组的子组参数集。子组可以包括:图片组(Group ofPictures,GOP)、图片、图块、切片、宏块、编码单元(Coding Unit,CU)、块、变换单元(Transform Unit,TU)、预测单元(Prediction Unit,PU)等。解析器(520)还可以从编码视频序列中提取诸如变换系数、量化器参数值、运动矢量等的信息。The video decoder (510) may include a parser (520) to reconstruct symbols (521) from the encoded video sequence. These symbols may include information for managing the operation of the video decoder (510), and possibly information for controlling a presentation device such as a presentation device (512) (e.g., a display screen), which is not part of the electronic device (530) but may be coupled to it, as shown in FIG. 5. The control information for the presentation device may be in the form of Supplemental Enhancement Information (SEI) messages or Video Usability Information (VUI) parameter set fragments (not depicted). The parser (520) may perform parsing/entropy decoding on the received encoded video sequence. The encoding of the encoded video sequence may conform to video coding techniques or standards and may follow various principles, including variable-length coding, Huffman coding, arithmetic coding with or without context sensitivity, etc. The parser (520) can extract a subgroup parameter set from the encoded video sequence for at least one subgroup of pixel subgroups in the video decoder, based on at least one parameter corresponding to the group. Subgroups may include: Group of Pictures (GOP), pictures, tiles, slices, macroblocks, coding units (CU), blocks, transform units (TU), prediction units (PU), etc. 
The parser (520) can also extract information such as transform coefficients, quantizer parameter values, motion vectors, etc., from the encoded video sequence.
解析器(520)可以对从缓冲存储器(515)接收的视频序列执行熵解码/解析操作,以创建符号(521)。The parser (520) can perform entropy decoding/parsing operations on the video sequence received from the buffer memory (515) to create symbols (521).
符号(521)的重构可以根据编码视频图片或其部分的类型(例如:帧间图片和帧内图片、帧间块和帧内块)以及其他因素而涉及多个不同的单元。涉及哪些单元以及涉及方式可以通过由解析器(520)从编码视频序列解析的子组控制信息控制。为了清楚起见,未描绘解析器(520)与下面的多个单元之间的这样的子组控制信息流。The reconstruction of the symbol (521) may involve multiple different units depending on the type of the encoded video picture or its parts (e.g., inter-frame picture and intra-frame picture, inter-frame block and intra-frame block) and other factors. Which units are involved and how they are involved can be controlled by subgroup control information parsed from the encoded video sequence by the parser (520). For clarity, such subgroup control information flow between the parser (520) and the following multiple units is not depicted.
除已经提到的功能块以外,视频解码器(510)可以在概念上被细分为如下所述的多个功能单元。在商业约束下操作的实际实现中,这些单元中的许多单元彼此紧密交互并且可以至少部分地彼此集成。然而,出于描述所公开的主题的目的,在概念上细分为下面的功能单元是适当的。In addition to the functional blocks already mentioned, the video decoder (510) can be conceptually subdivided into several functional units as described below. In a practical implementation operating under commercial constraints, many of these units interact closely with each other and can be at least partially integrated with each other. However, for the purposes of describing the disclosed subject matter, it is appropriate to conceptually subdivide it into the functional units described below.
第一单元是缩放器/逆变换单元(551)。缩放器/逆变换单元(551)从解析器(520)接收经量化的变换系数以及控制信息(包括要使用哪种变换、块大小、量化因子、量化缩放矩阵等)作为(一个或多个)符号(521)。缩放器/逆变换单元(551)可以输出包括样本值的块,所述块可以被输入到聚合器(555)中。The first unit is the scaler/inverse transform unit (551). The scaler/inverse transform unit (551) receives quantized transform coefficients and control information (including which transform to use, block size, quantization factor, quantization scaling matrix, etc.) as one or more symbols (521) from the parser (520). The scaler/inverse transform unit (551) can output a block containing sample values, which can be input into the aggregator (555).
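The dequantize-then-inverse-transform flow of the scaler/inverse transform unit can be sketched in a few lines. This is an illustrative toy, not a standard-defined transform: the 4×4 Hadamard matrix and the flat quantization step are assumptions chosen so that the round trip is exact.

```python
# Illustrative sketch of a scaler/inverse transform unit: quantized
# coefficient levels are rescaled by a quantization step, then passed
# through a 4x4 inverse transform. A Hadamard matrix is used here purely
# for simplicity; real codecs use transforms defined by the standard.

H = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, -1, 1],
     [1, -1, 1, -1]]  # symmetric, and H*H = 4*I

def matmul(a, b):
    # Plain square-matrix multiply.
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dequantize(levels, qstep):
    # Rescale quantized coefficient levels back to transform coefficients.
    return [[v * qstep for v in row] for row in levels]

def inverse_transform(coeffs):
    # X = H * C * H / 16 inverts C = H * X * H for this 4x4 Hadamard,
    # since H*H = 4*I. The division is exact for integer inputs.
    out = matmul(matmul(H, coeffs), H)
    return [[v // 16 for v in row] for row in out]
```

In this toy setting, forward-transforming a residual block, quantizing with step 1, and then dequantizing and inverse-transforming recovers the block exactly.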
在一些情况下,缩放器/逆变换单元(551)的输出样本可以属于帧内编码块;即:不使用来自先前重构的图片的预测性信息但可以使用来自当前图片的先前重构部分的预测性信息的块。这样的预测性信息可以由帧内图片预测单元(552)提供。在一些情况下,帧内图片预测单元(552)使用从当前图片缓冲器(558)获取的周围已重构信息生成大小和形状与重构中的块相同的块。例如,当前图片缓冲器(558)缓冲部分重构的当前图片和/或完全重构的当前图片。在一些情况下,聚合器(555)基于每个样本将帧内预测单元(552)已经生成的预测信息添加至如由缩放器/逆变换单元(551)提供的输出样本信息。In some cases, the output samples of the scaler/inverse transform unit (551) may belong to intra-coded blocks; that is, blocks that do not use predictive information from previously reconstructed images but can use predictive information from previously reconstructed portions of the current image. Such predictive information may be provided by the intra-picture prediction unit (552). In some cases, the intra-picture prediction unit (552) generates blocks of the same size and shape as the blocks in the reconstruction using surrounding reconstructed information obtained from the current picture buffer (558). For example, the current picture buffer (558) buffers partially reconstructed current images and/or fully reconstructed current images. In some cases, the aggregator (555) adds the predictive information already generated by the intra-picture prediction unit (552) to the output sample information provided by the scaler/inverse transform unit (551) based on each sample.
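The intra prediction described above, which builds a block purely from surrounding already-reconstructed samples of the current picture, can be illustrated with a minimal DC-mode sketch (the averaging rule and block size here are assumptions for illustration, not a standard-defined mode):

```python
def dc_intra_predict(top, left):
    # DC intra prediction: fill an NxN block with the rounded average of
    # the already-reconstructed neighbor samples above (top) and to the
    # left (left) of the block being reconstructed.
    n = len(top)
    dc = (sum(top) + sum(left) + n) // (2 * n)  # "+ n" rounds to nearest
    return [[dc] * n for _ in range(n)]
```

A real decoder would select among many intra modes (angular, planar, DC, ...) as signaled in the bitstream; this sketch shows only the simplest one.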
在其他情况下,缩放器/逆变换单元(551)的输出样本可以属于帧间编码并且可能经运动补偿的块。在这样的情况下,运动补偿预测单元(553)可以访问参考图片存储器(557)以获取用于预测的样本。在根据属于块的符号(521)对所获取的样本进行运动补偿之后,这些样本可以由聚合器(555)添加至缩放器/逆变换单元(551)的输出(在这种情况下被称作残差样本或残差信号),从而生成输出样本信息。可以通过运动矢量来控制运动补偿预测单元(553)从其获取预测样本的参考图片存储器(557)内的地址,所述运动矢量可以以符号(521)的形式被运动补偿预测单元(553)获得,所述符号(521)可以具有例如X分量、Y分量和参考图片分量。运动补偿还可以包括当使用子样本精确运动矢量时对从参考图片存储器(557)中获取的样本值的插值、运动矢量预测机制等。In other cases, the output samples of the scaler/inverse transform unit (551) may belong to inter-frame coded and possibly motion-compensated blocks. In such cases, the motion compensation prediction unit (553) can access the reference image memory (557) to obtain samples for prediction. After motion compensation of the obtained samples according to the symbols (521) belonging to the block, these samples can be added by the aggregator (555) to the output of the scaler/inverse transform unit (551) (referred to in this case as residual samples or residual signals) to generate output sample information. The address in the reference image memory (557) from which the motion compensation prediction unit (553) obtains the predicted samples can be controlled by motion vectors, which can be obtained by the motion compensation prediction unit (553) in the form of symbols (521), which may have, for example, X components, Y components, and reference image components. Motion compensation may also include interpolation of sample values obtained from the reference image memory (557) when using subsample precise motion vectors, motion vector prediction mechanisms, etc.
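A minimal sketch of the motion-compensated prediction path described above, assuming an integer-precision motion vector (sub-sample interpolation is omitted) and hypothetical helper names:

```python
def motion_compensate(ref, x, y, mvx, mvy, n):
    # Fetch an n x n prediction block from the reference picture at the
    # position of the current block (x, y) displaced by the motion
    # vector (mvx, mvy). Integer precision only; a real decoder would
    # interpolate for sub-sample-accurate motion vectors.
    return [[ref[y + mvy + i][x + mvx + j] for j in range(n)]
            for i in range(n)]

def reconstruct(pred, residual):
    # Aggregator step: add the residual samples to the prediction,
    # sample by sample, to produce the output sample information.
    n = len(pred)
    return [[pred[i][j] + residual[i][j] for j in range(n)] for i in range(n)]
```

This mirrors the text: the motion vector addresses the reference picture memory, and the fetched samples are added to the residual output of the scaler/inverse transform unit.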
聚合器(555)的输出样本可以在环路滤波器单元(556)中经受各种环路滤波技术。视频压缩技术可以包括环路内滤波器技术,所述环路内滤波器技术受控于编码视频序列(也被称为编码视频比特流)中包括的参数,所述参数可以作为来自解析器(520)的符号(521)被环路滤波器单元(556)获得,但是环路内滤波器技术也可以对在对编码图片或编码视频序列的(按解码顺序的)先前部分进行解码期间获得的元信息进行响应,以及对先前重构并且经环路滤波的样本值进行响应。The output samples of the aggregator (555) can undergo various loop filtering techniques in the loop filter unit (556). The video compression technique may include an in-loop filtering technique controlled by parameters included in the encoded video sequence (also known as the encoded video bitstream), which may be obtained by the loop filter unit (556) as symbols (521) from the parser (520). However, the in-loop filtering technique may also respond to metadata obtained during the decoding of previous portions of the encoded picture or encoded video sequence (in the decoding order), as well as to previously reconstructed and loop-filtered sample values.
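Among the loop filtering techniques mentioned above is deblocking, which attenuates artificial discontinuities at block boundaries. A heavily simplified sketch, where the two-tap smoothing rule and the threshold test are illustrative assumptions rather than any standard's filter:

```python
def deblock_vertical_edge(row, edge, threshold):
    # Smooth the two samples straddling a vertical block edge when the
    # step across the edge is small. A large step is likely a real edge
    # in the content and is left untouched, which is why deblocking
    # filters are controlled by thresholds/parameters in the bitstream.
    p, q = row[edge - 1], row[edge]
    if abs(p - q) < threshold:
        avg = (p + q + 1) // 2
        row[edge - 1] = (p + avg + 1) // 2
        row[edge] = (q + avg + 1) // 2
    return row
```

Real deblocking filters (e.g., in H.265) examine longer runs of samples on both sides of the boundary and derive their strength from coding parameters; the sketch keeps only the threshold-then-smooth idea.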
环路滤波器单元(556)的输出可以是样本流,该样本流可以被输出至呈现装置(512)以及被存储在参考图片存储器(557)中以供在将来的帧间图片预测中使用。The output of the loop filter unit (556) can be a sample stream, which can be output to the presentation device (512) and stored in the reference image memory (557) for use in future inter-frame image prediction.
一旦被完全重构,某些编码图片就可以被用作参考图片以供在将来预测中使用。例如,一旦与当前图片对应的编码图片被完全重构,并且该编码图片(通过例如解析器(520))被识别为参考图片,则当前图片缓冲器(558)可以变为参考图片存储器(557)的一部分,并且可以在开始重构随后的编码图片之前重新分配新的当前图片缓冲器。Once fully reconstructed, certain encoded images can be used as reference images for future predictions. For example, once the encoded image corresponding to the current image has been fully reconstructed and that encoded image (by, for example, the parser (520)) is identified as a reference image, the current image buffer (558) can become part of the reference image memory (557), and a new current image buffer can be reallocated before reconstructing subsequent encoded images begins.
视频解码器(510)可以根据诸如ITU-T H.265建议书的标准中的预定的视频压缩技术执行解码操作。在编码视频序列遵循视频压缩技术或标准的语法以及视频压缩技术或标准中记录的配置文件两者的意义上,编码视频序列可以符合由所使用的视频压缩技术或标准指定的语法。具体地,配置文件可以从视频压缩技术或标准中可用的所有工具中选择某些工具作为仅在该配置文件下可供使用的工具。对于合规性,还需要编码视频序列的复杂度处于由视频压缩技术或标准的级别限定的界限内。在一些情况下,级别限制最大图片大小、最大帧速率、最大重构样本率(以例如每秒百万样本为单位进行测量)、最大参考图片大小等。在一些情况下,由级别设置的限制可以通过假想参考解码器(HypotheticalReference Decoder,HRD)规范以及在编码视频序列中用信令通知的HRD缓冲器管理的元数据来进一步限定。The video decoder (510) can perform decoding operations according to a predetermined video compression technology, such as in a standard like ITU-T Recommendation H.265. The coded video sequence can conform to the syntax specified by the video compression technology or standard being used, in the sense that the coded video sequence adheres to both the syntax of the video compression technology or standard and the profiles documented in the video compression technology or standard. Specifically, a profile can select certain tools, from among all the tools available in the video compression technology or standard, as the only tools available for use under that profile. Also necessary for compliance is that the complexity of the coded video sequence be within bounds defined by the level of the video compression technology or standard. In some cases, levels restrict the maximum picture size, the maximum frame rate, the maximum reconstruction sample rate (measured in, for example, megasamples per second), the maximum reference picture size, and so on. In some cases, the limits set by levels can be further restricted through Hypothetical Reference Decoder (HRD) specifications and metadata for HRD buffer management signaled in the coded video sequence.
在实施方式中,接收器(531)可以连同编码视频一起接收附加(冗余)数据。附加数据可以被包括为(一个或多个)编码视频序列的一部分。附加数据可以由视频解码器(510)使用以对数据进行适当解码以及/或者更准确地重构原始视频数据。附加数据可以呈例如时间、空间或信噪比(signal noise ratio,SNR)增强层、冗余切片、冗余图片、前向纠错码等形式。In this implementation, the receiver (531) may receive additional (redundant) data along with the encoded video. The additional data may be included as part of one or more encoded video sequences. The additional data may be used by the video decoder (510) to properly decode the data and/or more accurately reconstruct the original video data. The additional data may take the form of, for example, temporal, spatial, or signal-to-noise ratio (SNR) enhancement layers, redundant slices, redundant images, forward error correction codes, etc.
图6示出了根据本公开内容的实施方式的视频编码器(603)的框图。视频编码器(603)被包括在电子装置(620)中。电子装置(620)包括传输器(640)(例如,传输电路系统)。视频编码器(603)可以代替图4的示例中的视频编码器(403)使用。Figure 6 shows a block diagram of a video encoder (603) according to an embodiment of the present disclosure. The video encoder (603) is included in an electronic device (620). The electronic device (620) includes a transmitter (640) (e.g., a transmission circuit system). The video encoder (603) can be used in place of the video encoder (403) in the example of Figure 4.
视频编码器(603)可以从视频源(601)(并非图6的示例中的电子装置(620)的一部分)接收视频样本,视频源(601)可以捕获要由视频编码器(603)进行编码的(一个或多个)视频图像。在另一示例中,视频源(601)是电子装置(620)的一部分。The video encoder (603) can receive video samples from a video source (601) (not part of the electronic device (620) in the example of Figure 6), which can capture one or more video images to be encoded by the video encoder (603). In another example, the video source (601) is part of the electronic device (620).
视频源(601)可以提供要由视频编码器(603)进行编码的呈数字视频样本流形式的源视频序列,该数字视频样本流可以具有任何合适的比特深度(例如:8比特、10比特、12比特……)、任何色彩空间(例如,BT.601Y CrCB、RGB……)和任何合适的采样结构(例如YCrCb 4:2:0、Y CrCb4:4:4)。在媒体服务系统中,视频源(601)可以是存储先前准备的视频的存储装置。在视频会议系统中,视频源(601)可以是捕获本地图像信息作为视频序列的摄像装置。视频数据可以被提供为当按顺序观看时被赋予运动的多个单独的图片。图片本身可以被组织为空间像素阵列,其中,取决于所用的采样结构、色彩空间等,每个像素可以包括一个或更多个样本。本领域技术人员可以容易理解像素与样本之间的关系。以下描述集中于样本。The video source (601) can provide a source video sequence, in the form of a digital video sample stream, to be encoded by the video encoder (603). The digital video sample stream can have any suitable bit depth (e.g., 8-bit, 10-bit, 12-bit…), any color space (e.g., BT.601 Y CrCb, RGB…), and any suitable sampling structure (e.g., Y CrCb 4:2:0, Y CrCb 4:4:4). In a media service system, the video source (601) can be a storage device storing previously prepared video. In a video conferencing system, the video source (601) can be a camera device capturing local image information as a video sequence. The video data can be provided as multiple individual pictures that impart motion when viewed in sequence. The pictures themselves can be organized as spatial pixel arrays, where each pixel can include one or more samples, depending on the sampling structure, color space, etc., in use. Those skilled in the art will readily understand the relationship between pixels and samples. The following description focuses on samples.
根据实施方式,视频编码器(603)可以实时地或者在应用所需的任何其他时间约束下将源视频序列的图片编码并压缩成编码视频序列(643)。施行适当的编码速度是控制器(650)的一个功能。在一些实施方式中,控制器(650)控制如下所述的其他功能单元并且在功能上耦接至所述其他功能单元。为简洁起见未描绘耦接。由控制器(650)设置的参数可以包括速率控制相关参数(图片跳过、量化器、率失真优化技术的λ值……)、图片大小、图片组(GOP)布局、最大运动矢量搜索范围等。控制器(650)可以被配置成具有其他合适的功能,这些功能属于针对特定系统设计优化的视频编码器(603)。According to the implementation, the video encoder (603) can encode and compress images of the source video sequence into an encoded video sequence (643) in real time or under any other time constraints required by the application. Implementing an appropriate encoding rate is a function of the controller (650). In some implementations, the controller (650) controls and is functionally coupled to other functional units as described below. Coupling is not depicted for simplicity. Parameters set by the controller (650) may include rate control-related parameters (image skipping, quantizer, λ value of rate-distortion optimization techniques, etc.), image size, group of pictures (GOP) layout, maximum motion vector search range, etc. The controller (650) can be configured to have other suitable functions belonging to the video encoder (603) optimized for a specific system design.
在一些实施方式中,视频编码器(603)被配置成在编码环路中进行操作。作为极度简化的描述,在示例中,编码环路可以包括源编码器(630)(例如,负责基于要编码的输入图片和(一个或多个)参考图片创建符号,例如符号流)和嵌入在视频编码器(603)中的(本地)解码器(633)。解码器(633)以与(远程)解码器创建样本数据的方式类似的方式重构符号以创建样本数据(因为在所公开的主题中所考虑的视频压缩技术中,符号与编码视频比特流之间的任何压缩是无损的)。重构的样本流(样本数据)被输入到参考图片存储器(634)。由于符号流的解码产生与解码器位置(本地或远程)无关的比特精确结果,因此参考图片存储器(634)中的内容在本地编码器与远程编码器之间也是比特精确的。换句话说,编码器的预测部分将与解码器在解码期间使用预测时所“看到”的样本值完全相同的样本值“视为”参考图片样本。参考图片同步性的这种基本原理(以及在例如由于信道误差而无法维持同步性的情况下产生的偏移)也用于一些相关技术。In some implementations, the video encoder (603) is configured to operate in an encoding loop. As an oversimplified description, in an example, the encoding loop can include a source encoder (630) (e.g., responsible for creating symbols, such as a symbol stream, based on the input picture to be encoded and one or more reference pictures) and a (local) decoder (633) embedded in the video encoder (603). The decoder (633) reconstructs the symbols to create sample data in a manner similar to how a (remote) decoder would create sample data (since, in the video compression technologies considered in the disclosed subject matter, any compression between symbols and the coded video bitstream is lossless). The reconstructed sample stream (sample data) is input to the reference picture memory (634). Since the decoding of a symbol stream produces bit-exact results independent of the decoder location (local or remote), the content in the reference picture memory (634) is also bit-exact between the local encoder and the remote encoder. In other words, the prediction part of the encoder "sees" as reference picture samples exactly the same sample values as the decoder would "see" when using prediction during decoding. This fundamental principle of reference picture synchronicity (and the resulting drift, if synchronicity cannot be maintained, for example because of channel errors) is also used in some related art.
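The synchronicity principle above, that the encoder's embedded decoder reconstructs references exactly as the remote decoder will, can be shown with a toy lossy scheme in which scalar quantization of a prediction residual stands in for the whole lossy pipeline (all names here are illustrative):

```python
def quantize(v, qstep):
    # Lossy step: map the residual onto a coarse integer level.
    return round(v / qstep)

def dequantize(level, qstep):
    # Rescale the transmitted level back to an approximate residual.
    return level * qstep

def encode_sample(source, reference, qstep):
    # Encoder side: code the prediction residual lossily...
    level = quantize(source - reference, qstep)
    # ...then reconstruct exactly as a decoder would, so that the local
    # reference stays bit-exact with the remote decoder's reference.
    local_recon = reference + dequantize(level, qstep)
    return level, local_recon

def decode_sample(level, reference, qstep):
    # Decoder side: the same reconstruction rule as in the encoder.
    return reference + dequantize(level, qstep)
```

Even though the reconstruction differs from the source (the coding is lossy), the encoder's local reconstruction and the decoder's reconstruction are identical, which is the point of embedding a decoder in the encoding loop.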
“本地”解码器(633)的操作可以与“远程”解码器例如已经在上面结合图5详细描述的视频解码器(510)的操作相同。然而,另外简要地参照图5,由于符号可用并且由熵编码器(645)将符号编码成编码视频序列以及由解析器(520)对符号进行解码可以是无损的,因此可以不在本地解码器(633)中完全实现视频解码器(510)的包括缓冲存储器(515)和解析器(520)的熵解码部分。The operation of the "local" decoder (633) can be the same as that of a "remote" decoder, such as the video decoder (510), which has already been described in detail above in conjunction with Figure 5. Briefly referring also to Figure 5, however, because symbols are available and the encoding of symbols into a coded video sequence by the entropy coder (645) and the decoding of symbols by the parser (520) can be lossless, the entropy decoding parts of the video decoder (510), including the buffer memory (515) and the parser (520), may not be fully implemented in the local decoder (633).
在实施方式中,除了存在于解码器中的解析/熵解码之外的任何解码器技术以相同或基本上相同的功能形式存在于对应的编码器中。因此,所公开的主题集中于解码器操作。可以简化编码器技术的描述,因为编码器技术与全面地描述的解码器技术相反。在某些方面,在下文提供更详细的描述。In an embodiment, any decoder technology, other than the parsing/entropy decoding that is present in a decoder, is present, in an identical or a substantially identical functional form, in a corresponding encoder. Accordingly, the disclosed subject matter focuses on decoder operation. The description of encoder technologies can be abbreviated, because they are the inverse of the comprehensively described decoder technologies. Only in certain areas, a more detailed description is provided below.
在一些示例中,在操作期间,源编码器(630)可以执行运动补偿预测编码,所述运动补偿预测编码参考来自视频序列的被指定为“参考图片”的一个或更多个先前编码图片来对输入图片进行预测性编码。以此方式,编码引擎(632)对输入图片的像素块与可以被选作该输入图片的(一个或多个)预测参考的(一个或多个)参考图片的像素块之间的差异进行编码。In some examples, during operation, the source encoder (630) may perform motion-compensated predictive coding, which predictively codes the input image with reference to one or more previously encoded images designated as "reference images" from the video sequence. In this way, the encoding engine (632) encodes the differences between pixel blocks of the input image and pixel blocks of one or more reference images (which may be selected as prediction references for the input image(s)).
本地视频解码器(633)可以基于由源编码器(630)创建的符号,对可以被指定为参考图片的图片的编码视频数据进行解码。编码引擎(632)的操作可以有利地为有损处理。当编码视频数据可以在视频解码器(图6中未示出)处被解码时,重构的视频序列通常可以是源视频序列的带有一些误差的副本。本地视频解码器(633)复制可以由视频解码器对参考图片执行的解码处理,并且可以使重构的参考图片存储在参考图片存储器(634)中。以此方式,视频编码器(603)可以在本地存储重构的参考图片的副本,所述副本与将由远端视频解码器获得的重构参考图片具有共同内容(不存在传输误差)。The local video decoder (633) can decode the encoded video data of a picture that can be designated as a reference picture based on symbols created by the source encoder (630). The operation of the encoding engine (632) can advantageously be lossy. When the encoded video data can be decoded at the video decoder (not shown in FIG. 6), the reconstructed video sequence can typically be a copy of the source video sequence with some errors. The local video decoder (633) replicates the decoding processing that can be performed on the reference picture by the video decoder, and can store the reconstructed reference picture in the reference picture memory (634). In this way, the video encoder (603) can locally store a copy of the reconstructed reference picture that shares common content with the reconstructed reference picture that will be obtained by the remote video decoder (no transmission errors).
预测器(635)可以针对编码引擎(632)执行预测搜索。也就是说,对于要被编码的新图片,预测器(635)可以在参考图片存储器(634)中搜索可以用作针对新图片的合适预测参考的样本数据(作为候选参考像素块)或特定元数据,例如参考图片运动矢量、块形状等。预测器(635)可以在逐样本块-像素块的基础上操作,以找到合适的预测参考。在一些情况下,如由通过预测器(635)获得的搜索结果所确定的,输入图片可以具有从参考图片存储器(634)中存储的多个参考图片取得的预测参考。The predictor (635) can perform prediction searches for the coding engine (632). That is, for a new picture to be coded, the predictor (635) can search the reference picture memory (634) for sample data (as candidate reference pixel blocks) or certain metadata, such as reference picture motion vectors, block shapes, and so on, that can serve as an appropriate prediction reference for the new picture. The predictor (635) can operate on a sample-block-by-pixel-block basis to find appropriate prediction references. In some cases, as determined by search results obtained by the predictor (635), an input picture can have prediction references drawn from multiple reference pictures stored in the reference picture memory (634).
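The prediction search described above can be sketched as an exhaustive block-matching search that minimizes the sum of absolute differences (SAD) over integer displacements. The cost metric and the full-search strategy are illustrative assumptions; real encoders typically use faster, non-exhaustive search patterns.

```python
def sad(a, b):
    # Sum of absolute differences between two equally sized blocks.
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def motion_search(ref, block, x, y, search_range):
    # Exhaustive block-matching search: try every integer displacement
    # within +/- search_range around (x, y) and keep the candidate
    # reference block that minimizes SAD against the input block.
    n = len(block)
    best = None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            ry, rx = y + mvy, x + mvx
            if 0 <= ry <= len(ref) - n and 0 <= rx <= len(ref[0]) - n:
                cand = [row[rx:rx + n] for row in ref[ry:ry + n]]
                cost = sad(block, cand)
                if best is None or cost < best[0]:
                    best = (cost, mvx, mvy)
    return best  # (cost, mvx, mvy)
```

A zero cost means an exact match was found in the reference picture, i.e., the residual to be coded would vanish for that block.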
控制器(650)可以管理源编码器(630)的编码操作,包括例如设置用于对视频数据进行编码的参数和子组参数。The controller (650) can manage the encoding operations of the source encoder (630), including, for example, setting parameters and subgroup parameters for encoding video data.
所有以上提及的功能单元的输出可以在熵编码器(645)中经受熵编码。熵编码器(645)通过根据诸如霍夫曼编码、可变长度编码、算术编码等的技术对由各种功能单元生成的符号进行无损压缩来将这些符号转换为编码视频序列。The outputs of all the functional units mentioned above can be entropy encoded in the entropy encoder (645). The entropy encoder (645) converts these symbols into a coded video sequence by performing lossless compression on the symbols generated by the various functional units according to techniques such as Huffman coding, variable-length coding, arithmetic coding, etc.
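As one concrete instance of the lossless techniques named above, a Huffman code can be built from symbol frequencies with a small heap-based routine. This is a sketch assuming at least two distinct symbols; practical entropy coders (e.g., context-adaptive arithmetic coders) are considerably more elaborate.

```python
import heapq

def huffman_code(freqs):
    # Build a prefix (Huffman) code from a {symbol: frequency} mapping.
    # Repeatedly merge the two least frequent groups, prepending a bit
    # to every symbol in each merged group. The integer tiebreaker keeps
    # heap ordering deterministic without comparing lists.
    heap = [(f, i, [s]) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    codes = {s: "" for s in freqs}
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for s in syms1:
            codes[s] = "0" + codes[s]
        for s in syms2:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (f1 + f2, tiebreak, syms1 + syms2))
        tiebreak += 1
    return codes
```

More frequent symbols receive shorter codewords, which is what makes the compression lossless yet effective for skewed symbol statistics.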
传输器(640)可以缓冲由熵编码器(645)创建的(一个或多个)编码视频序列,从而为经由通信信道(660)进行传输做准备,该通信信道(660)可以是到存储编码视频数据的存储装置的硬件/软件链路。传输器(640)可以将来自视频编码器(603)的编码视频数据与要传输的其他数据合并,所述其他数据例如是编码音频数据和/或辅助数据流(未示出来源)。The transmitter (640) can buffer one or more encoded video sequences created by the entropy encoder (645) in preparation for transmission via a communication channel (660), which may be a hardware/software link to a storage device storing the encoded video data. The transmitter (640) can combine the encoded video data from the video encoder (603) with other data to be transmitted, such as encoded audio data and/or auxiliary data streams (source not shown).
控制器(650)可以管理视频编码器(603)的操作。在编码期间,控制器(650)可以向每个编码图片分配某一编码图片类型,这可能影响可以应用于相应的图片的编码技术。例如,通常可以向图片分配以下图片类型之一:The controller (650) can manage the operation of the video encoder (603). During encoding, the controller (650) can assign a specific encoding image type to each encoded image, which may affect the encoding techniques that can be applied to the corresponding image. For example, one of the following image types can typically be assigned to an image:
帧内图片(I图片),其可以是可以在不将序列中的任何其他图片用作预测源的情况下进行编码和解码的图片。一些视频编解码器容许不同类型的帧内图片,包括例如独立解码器刷新(Independent Decoder Refresh,“IDR”)图片。本领域技术人员了解I图片的那些变型及其相应的应用和特征。An intra picture (I picture) may be one that can be coded and decoded without using any other picture in the sequence as a source of prediction. Some video codecs allow for different types of intra pictures, including, for example, Independent Decoder Refresh ("IDR") pictures. Those skilled in the art are aware of those variants of I pictures and their respective applications and features.
预测性图片(P图片),其可以是可以使用利用至多一个运动矢量和参考索引来预测每个块的样本值的帧间预测或帧内预测进行编码和解码的图片。Predictive images (P-images) can be images that are encoded and decoded using inter-frame or intra-frame prediction that uses at most one motion vector and a reference index to predict the sample values of each block.
双向预测性图片(B图片),其可以是可以使用利用至多两个运动矢量和参考索引来预测每个块的样本值的帧间预测或帧内预测进行编码和解码的图片。类似地,多个预测性图片可以使用多于两个参考图片和相关联元数据以用于单个块的重构。A bi-directionally predictive picture (B picture) may be one that can be coded and decoded using intra prediction or inter prediction that uses at most two motion vectors and reference indices to predict the sample values of each block. Similarly, multiple-predictive pictures can use more than two reference pictures and associated metadata for the reconstruction of a single block.
源图片通常可以在空间上细分成多个样本块(例如,分别为4×4、8×8、4×8或16×16样本的块),并且逐块地被编码。可以参考其他(编码的)块对这些块进行预测性编码,所述其他块是通过应用于块的相应图片的编码分配而确定的。例如,可以对I图片的块进行非预测性编码,或者可以参考同一图片的编码块对I图片的块进行预测性编码(空间预测或帧内预测)。可以参考一个先前编码的参考图片经由空间预测或经由时间预测对P图片的像素块进行预测性编码。可以参考一个或两个先前编码的参考图片经由空间预测或经由时间预测对B图片的块进行预测性编码。Source pictures commonly can be subdivided spatially into a plurality of sample blocks (for example, blocks of 4×4, 8×8, 4×8, or 16×16 samples each) and coded on a block-by-block basis. Blocks can be coded predictively with reference to other (already coded) blocks, as determined by the coding assignment applied to the blocks' respective pictures. For example, blocks of I pictures can be coded non-predictively, or they can be coded predictively with reference to already coded blocks of the same picture (spatial prediction or intra prediction). Pixel blocks of P pictures can be coded predictively, via spatial prediction or via temporal prediction with reference to one previously coded reference picture. Blocks of B pictures can be coded predictively, via spatial prediction or via temporal prediction with reference to one or two previously coded reference pictures.
视频编码器(603)可以根据诸如ITU-T H.265建议书的预定的视频编码技术或标准执行编码操作。在其操作中,视频编码器(603)可以执行各种压缩操作,包括利用输入视频序列中的时间和空间冗余的预测性编码操作。因此,编码视频数据可以符合由所使用的视频编码技术或标准指定的语法。The video encoder (603) can perform encoding operations according to a predetermined video coding technique or standard, such as ITU-T H.265 Recommendation. In its operation, the video encoder (603) can perform various compression operations, including predictive coding operations utilizing temporal and spatial redundancy in the input video sequence. Therefore, the encoded video data can conform to the syntax specified by the video coding technique or standard used.
在实施方式中,传输器(640)可以连同编码视频一起传输附加数据。源编码器(630)可以包括这样的数据作为编码视频序列的一部分。附加数据可以包括时间/空间/SNR增强层、其他形式的冗余数据例如冗余图片和切片、SEI消息、VUI参数集片段等。In an embodiment, the transmitter (640) may transmit additional data together with the encoded video. The source encoder (630) may include such data as part of the coded video sequence. The additional data may include temporal/spatial/SNR enhancement layers, other forms of redundant data such as redundant pictures and slices, SEI messages, VUI parameter set fragments, and the like.
视频可以按时间序列被捕获为多个源图片(视频图片)。帧内图片预测(通常被简化为帧内预测)利用给定图片中的空间相关性,而帧间图片预测利用图片之间的(时间或其他)相关性。在示例中,正在被编码/解码的特定图片(其被称为当前图片)被分割成块。在当前图片中的块类似于视频中先前编码且又被缓冲的参考图片中的参考块的情况下,可以通过被称作运动矢量的矢量对当前图片中的块进行编码。运动矢量指向参考图片中的参考块,并且在使用多个参考图片的情况下,运动矢量可以具有标识参考图片的第三维度。Video can be captured as multiple source pictures (video pictures) in a temporal sequence. Intra-picture prediction (often abbreviated to intra prediction) makes use of spatial correlation within a given picture, while inter-picture prediction makes use of (temporal or other) correlation between pictures. In an example, a specific picture under encoding/decoding, which is referred to as the current picture, is partitioned into blocks. When a block in the current picture is similar to a reference block in a previously coded and still-buffered reference picture in the video, the block in the current picture can be coded by a vector that is referred to as a motion vector. The motion vector points to the reference block in the reference picture, and, in the case of multiple reference pictures being in use, can have a third dimension that identifies the reference picture.
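As an illustrative sketch only (not the patent's implementation), the motion-compensated prediction described above amounts to copying the reference block that the motion vector points to in a previously decoded picture; all names, sizes, and the toy reference picture below are assumptions:

```python
import numpy as np

def predict_block(reference, top, left, mv, block_size):
    """Fetch the reference block displaced from (top, left) by motion vector mv."""
    mv_x, mv_y = mv
    r, c = top + mv_y, left + mv_x
    return reference[r:r + block_size, c:c + block_size]

# Toy 8x8 "reference picture" with distinct sample values.
reference = np.arange(64, dtype=np.int32).reshape(8, 8)

# Predict the 4x4 block at (4, 4) in the current picture from the block
# the motion vector (-2, -2) points to in the reference picture.
pred = predict_block(reference, top=4, left=4, mv=(-2, -2), block_size=4)
print(pred.shape)  # (4, 4)
```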
在一些实施方式中,可以将双向预测技术用于帧间图片预测。根据双向预测技术,使用两个参考图片,例如按解码次序均在视频中的当前图片之前(但按显示次序可能分别在过去和将来)的第一参考图片和第二参考图片。可以通过指向第一参考图片中的第一参考块的第一运动矢量和指向第二参考图片中的第二参考块的第二运动矢量对当前图片中的块进行编码。可以通过第一参考块和第二参考块的组合来预测所述块。In some implementations, bidirectional prediction techniques can be used for inter-frame image prediction. According to bidirectional prediction, two reference images are used, such as a first reference image and a second reference image that both precede the current image in the video in decoding order (but may be in the past and future in display order, respectively). A block in the current image can be encoded using a first motion vector pointing to a first reference block in the first reference image and a second motion vector pointing to a second reference block in the second reference image. The block can be predicted using a combination of the first and second reference blocks.
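The combination of the two reference blocks can be sketched, for illustration only, as a simple rounded average of the two predictions — one common choice, though the disclosure does not mandate this particular combination:

```python
import numpy as np

def bi_predict(ref_block0, ref_block1):
    """Combine two reference blocks by a rounded average (illustrative)."""
    s = ref_block0.astype(np.int32) + ref_block1.astype(np.int32)
    return ((s + 1) >> 1).astype(np.uint8)  # add 1 before shifting to round

b0 = np.full((4, 4), 100, dtype=np.uint8)  # block from first reference picture
b1 = np.full((4, 4), 103, dtype=np.uint8)  # block from second reference picture
print(bi_predict(b0, b1)[0, 0])  # 102 (rounded average)
```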
此外,可以在帧间图片预测中使用合并模式技术以提高编码效率。In addition, merging mode techniques can be used in inter-frame image prediction to improve coding efficiency.
根据本公开内容的一些实施方式,以块为单位执行诸如帧间图片预测和帧内图片预测的预测。例如,根据HEVC标准,将视频图片序列中的图片分割成编码树单元(codingtree unit,CTU)以便压缩,图片中的CTU具有相同大小,例如64×64像素、32×32像素或16×16像素。一般来说,CTU包括三个编码树块(coding tree block,CTB),即一个亮度CTB和两个色度CTB。每个CTU可以被递归地以四叉树分成一个或多个编码单元(coding unit,CU)。例如,可以将64×64像素的CTU分成一个64×64像素的CU,或4个32×32像素的CU,或16个16×16像素的CU。在示例中,对每个CU进行分析以确定针对该CU的预测类型,例如帧间预测类型或帧内预测类型。取决于时间和/或空间可预测性,将CU分成一个或更多个预测单元(prediction unit,PU)。通常,每个PU包括亮度预测块(prediction block,PB)和两个色度PB。在实施方式中,以预测块为单位来执行编解码(编码/解码)中的预测操作。使用亮度预测块作为预测块的示例,预测块包括诸如8×8像素、16×16像素、8×16像素、16×8像素等的像素的值(例如,亮度值)的矩阵。According to some embodiments of this disclosure, predictions such as inter-frame picture prediction and intra-frame picture prediction are performed on a block-by-block basis. For example, according to the HEVC standard, images in a video picture sequence are segmented into coding tree units (CTUs) for compression, with CTUs in the images having the same size, such as 64×64 pixels, 32×32 pixels, or 16×16 pixels. Generally, a CTU comprises three coding tree blocks (CTBs): one luma CTB and two chroma CTBs. Each CTU can be recursively divided into one or more coding units (CUs) using a quadtree. For example, a 64×64 pixel CTU can be divided into one 64×64 pixel CU, or four 32×32 pixel CUs, or sixteen 16×16 pixel CUs. In the example, each CU is analyzed to determine the prediction type for that CU, such as inter-frame prediction or intra-frame prediction. Depending on temporal and/or spatial predictability, a CU is divided into one or more prediction units (PUs). Typically, each PU includes a luma prediction block (PB) and two chroma PBs. In implementations, prediction operations in encoding/decoding are performed on a block-by-block basis. Using a luma prediction block as an example, a prediction block includes a matrix of pixel values (e.g., luma values) such as 8×8 pixels, 16×16 pixels, 8×16 pixels, 16×8 pixels, etc.
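The recursive quadtree split of a CTU into CUs described above can be sketched as follows. The split criterion used here (split until a target CU size is reached) is a stand-in for illustration; a real encoder would decide each split via rate-distortion analysis:

```python
def quadtree_split(x, y, size, min_cu):
    """Recursively split a square block at (x, y) into quadrants down to min_cu."""
    if size <= min_cu:
        return [(x, y, size)]  # leaf CU: top-left corner and side length
    cus, half = [], size // 2
    for dy in (0, half):
        for dx in (0, half):
            cus.extend(quadtree_split(x + dx, y + dy, half, min_cu))
    return cus

cus = quadtree_split(0, 0, 64, 32)        # one 64x64 CTU -> four 32x32 CUs
print(len(cus))                           # 4
print(len(quadtree_split(0, 0, 64, 16)))  # 16
```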
图7示出了根据本公开内容的另一实施方式的视频编码器(703)的图。视频编码器(703)被配置成接收视频图片序列中的当前视频图片内的样本值的处理块(例如,预测块),并且将处理块编码到作为编码视频序列的一部分的编码图片中。在示例中,视频编码器(703)代替图4的示例中的视频编码器(403)使用。Figure 7 illustrates a diagram of a video encoder (703) according to another embodiment of the present disclosure. The video encoder (703) is configured to receive processing blocks (e.g., prediction blocks) of sample values within a current video picture in a video picture sequence, and to encode the processing blocks into encoded pictures that are part of an encoded video sequence. In this example, the video encoder (703) is used instead of the video encoder (403) in the example of Figure 4.
在HEVC示例中,视频编码器(703)接收处理块例如8×8样本的预测块等的样本值的矩阵。视频编码器(703)使用例如率失真优化来确定是使用帧内模式、帧间模式还是双向预测模式对处理块进行最佳编码。在将以帧内模式对处理块进行编码的情况下,视频编码器(703)可以使用帧内预测技术将处理块编码到编码图片中;而在将以帧间模式或双向预测模式对处理块进行编码的情况下,视频编码器(703)可以分别使用帧间预测或双向预测技术将处理块编码到编码图片中。在某些视频编码技术中,合并模式可以是帧间图片预测子模式,其中,在不借助于一个或更多个运动矢量预测器外部的编码运动矢量分量的情况下从所述预测器得出运动矢量。在某些其他视频编码技术中,可以存在适用于对象块的运动矢量分量。在示例中,视频编码器(703)包括其他部件,例如,确定处理块的模式的模式决策模块(未示出)。In an HEVC example, the video encoder (703) receives a matrix of sample values for a processing block, such as a prediction block of 8×8 samples. The video encoder (703) uses, for example, rate-distortion optimization to determine whether the processing block is best coded using intra mode, inter mode, or bi-prediction mode. When the processing block is to be coded in intra mode, the video encoder (703) may use an intra prediction technique to encode the processing block into the coded picture; and when the processing block is to be coded in inter mode or bi-prediction mode, the video encoder (703) may use an inter prediction or bi-prediction technique, respectively, to encode the processing block into the coded picture. In certain video coding technologies, merge mode can be an inter-picture prediction sub-mode in which the motion vector is derived from one or more motion vector predictors without the benefit of a coded motion vector component outside the predictors. In certain other video coding technologies, a motion vector component applicable to the subject block may be present. In an example, the video encoder (703) includes other components, such as a mode decision module (not shown) that determines the mode of the processing blocks.
在图7的示例中,视频编码器(703)包括如图7所示的耦接在一起的帧间编码器(730)、帧内编码器(722)、残差计算器(723)、开关(726)、残差编码器(724)、总体控制器(721)以及熵编码器(725)。In the example of Figure 7, the video encoder (703) includes an inter-frame encoder (730), an intra-frame encoder (722), a residual calculator (723), a switch (726), a residual encoder (724), a general controller (721), and an entropy encoder (725) coupled together as shown in Figure 7.
帧间编码器(730)被配置成接收当前块(例如,处理块)的样本,将所述块与参考图片中的一个或更多个参考块(例如,先前图片和随后图片中的块)进行比较,生成帧间预测信息(例如,根据帧间编码技术的冗余信息、运动矢量、合并模式信息的描述),以及使用任何合适的技术基于帧间预测信息计算帧间预测结果(例如,预测块)。在一些示例中,参考图片是基于编码视频信息被解码的解码参考图片。The inter encoder (730) is configured to receive samples of the current block (e.g., a processing block), compare the block with one or more reference blocks in reference pictures (e.g., blocks in previous pictures and later pictures), generate inter prediction information (e.g., a description of redundant information according to an inter coding technique, motion vectors, merge mode information), and calculate inter prediction results (e.g., a predicted block) based on the inter prediction information using any suitable technique. In some examples, the reference pictures are decoded reference pictures that are decoded based on the encoded video information.
帧内编码器(722)被配置成:接收当前块(例如,处理块)的样本;在一些情况下将该块与同一图片中已经编码的块进行比较;在变换之后生成量化系数;以及在一些情况下还生成帧内预测信息(例如,根据一个或更多个帧内编码技术生成帧内预测方向信息)。在示例中,帧内编码器(722)还基于帧内预测信息和同一图片中的参考块计算帧内预测结果(例如,预测块)。The intra encoder (722) is configured to: receive samples of the current block (e.g., the processing block); in some cases compare the block with already encoded blocks in the same image; generate quantization coefficients after transformation; and in some cases also generate intra prediction information (e.g., intra prediction direction information based on one or more intra coding techniques). In the example, the intra encoder (722) also calculates an intra prediction result (e.g., a prediction block) based on the intra prediction information and a reference block in the same image.
总体控制器(721)被配置成确定总体控制数据并且基于总体控制数据来控制视频编码器(703)的其他部件。在示例中,总体控制器(721)确定块的模式,并且基于该模式将控制信号提供至开关(726)。例如,当所述模式是帧内模式时,总体控制器(721)控制开关(726)以选择帧内模式结果以供残差计算器(723)使用,并且控制熵编码器(725)以选择帧内预测信息并且将所述帧内预测信息包括在比特流中;以及当所述模式是帧间模式时,总体控制器(721)控制开关(726)以选择帧间预测结果以供残差计算器(723)使用,并且控制熵编码器(725)以选择帧间预测信息并且将所述帧间预测信息包括在比特流中。The overall controller (721) is configured to determine overall control data and control other components of the video encoder (703) based on the overall control data. In an example, the overall controller (721) determines the mode of the block and provides control signals to the switch (726) based on the mode. For example, when the mode is intra-frame mode, the overall controller (721) controls the switch (726) to select intra-frame mode results for use by the residual calculator (723) and controls the entropy encoder (725) to select intra-frame prediction information and include the intra-frame prediction information in the bitstream; and when the mode is inter-frame mode, the overall controller (721) controls the switch (726) to select inter-frame prediction results for use by the residual calculator (723) and controls the entropy encoder (725) to select inter-frame prediction information and include the inter-frame prediction information in the bitstream.
残差计算器(723)被配置成计算所接收的块与选自帧内编码器(722)或帧间编码器(730)的预测结果之间的差(残差数据)。残差编码器(724)被配置成基于残差数据进行操作,以对残差数据进行编码来生成变换系数。在示例中,残差编码器(724)被配置成将残差数据从空间域转换到频域,并且生成变换系数。然后,变换系数经受量化处理以获得量化的变换系数。在各种实施方式中,视频编码器(703)还包括残差解码器(728)。残差解码器(728)被配置成执行逆变换,并且生成解码残差数据。解码残差数据可以由帧内编码器(722)和帧间编码器(730)适当地使用。例如,帧间编码器(730)可以基于解码残差数据和帧间预测信息来生成解码块,以及帧内编码器(722)可以基于解码残差数据和帧内预测信息来生成解码块。在一些示例中,适当处理解码块以生成解码图片,并且这些解码图片可以缓冲在存储器电路(未示出)中并且用作参考图片。A residual calculator (723) is configured to calculate the difference (residual data) between the received block and the prediction result selected from an intra encoder (722) or an inter encoder (730). A residual encoder (724) is configured to operate based on the residual data to encode the residual data to generate transform coefficients. In an example, the residual encoder (724) is configured to transform the residual data from the spatial domain to the frequency domain and generate transform coefficients. The transform coefficients are then quantized to obtain quantized transform coefficients. In various embodiments, the video encoder (703) also includes a residual decoder (728). The residual decoder (728) is configured to perform an inverse transform and generate decoded residual data. The decoded residual data can be appropriately used by the intra encoder (722) and the inter encoder (730). For example, the inter encoder (730) can generate a decoded block based on the decoded residual data and inter-frame prediction information, and the intra encoder (722) can generate a decoded block based on the decoded residual data and intra-frame prediction information. In some examples, the decoding blocks are properly processed to generate decoded images, and these decoded images can be buffered in memory circuitry (not shown) and used as reference images.
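The residual path described above can be illustrated with a minimal sketch in which a simple scalar quantizer stands in for the transform and quantization stages (an assumption made for brevity; the actual pipeline transforms the residual to the frequency domain first):

```python
import numpy as np

def encode_residual(block, prediction, qstep):
    """Residual = received block minus prediction, then quantized to levels."""
    residual = block.astype(np.int32) - prediction.astype(np.int32)
    return np.round(residual / qstep).astype(np.int32)  # quantized "coefficients"

def decode_residual(levels, qstep):
    """Inverse quantization, as performed by the residual decoder."""
    return levels * qstep

block = np.array([[120, 121], [119, 122]], dtype=np.uint8)
pred  = np.array([[118, 118], [118, 118]], dtype=np.uint8)
levels = encode_residual(block, pred, qstep=2)
rec_res = decode_residual(levels, qstep=2)
print(rec_res.tolist())  # an approximation of the original residual
```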
熵编码器(725)被配置成对比特流进行格式化以包括编码块。熵编码器(725)被配置成包括根据合适的标准例如HEVC标准的各种信息。在示例中,熵编码器(725)被配置成在比特流中包括总体控制数据、选择的预测信息(例如,帧内预测信息或帧间预测信息)、残差信息和其他合适的信息。注意,根据所公开的主题,当在帧间模式或双向预测模式的合并子模式下对块进行编码时,不存在残差信息。An entropy encoder (725) is configured to format the bitstream to include coded blocks. The entropy encoder (725) is configured to include various information according to a suitable standard, such as the HEVC standard. In this example, the entropy encoder (725) is configured to include overall control data, selected prediction information (e.g., intra-frame prediction information or inter-frame prediction information), residual information, and other suitable information in the bitstream. Note that, according to the disclosed subject matter, residual information is not present when blocks are encoded in a merged sub-mode of inter-frame mode or bidirectional prediction mode.
图8示出了根据本公开内容的另一实施方式的视频解码器(810)的图。视频解码器(810)被配置成接收作为编码视频序列的一部分的编码图片,并且对编码图片进行解码以生成重构的图片。在示例中,视频解码器(810)代替图4示例中的视频解码器(410)使用。Figure 8 illustrates a diagram of a video decoder (810) according to another embodiment of the present disclosure. The video decoder (810) is configured to receive an encoded image as part of an encoded video sequence and decode the encoded image to generate a reconstructed image. In this example, the video decoder (810) is used instead of the video decoder (410) in the example of Figure 4.
在图8的示例中,视频解码器(810)包括如图8所示的耦接在一起的熵解码器(871)、帧间解码器(880)、残差解码器(873)、重构模块(874)以及帧内解码器(872)。In the example of Figure 8, the video decoder (810) includes an entropy decoder (871), an inter-frame decoder (880), a residual decoder (873), a reconstruction module (874), and an intra-frame decoder (872) coupled together as shown in Figure 8.
熵解码器(871)可以被配置成根据编码图片来重构某些符号,这些符号表示构成编码图片的语法元素。这样的符号可以包括例如对块进行编码的模式(例如,帧内模式、帧间模式、双向预测模式、后两者的合并子模式或另一子模式)、可以标识分别供帧内解码器(872)或帧间解码器(880)使用以进行预测的某些样本或元数据的预测信息(例如,帧内预测信息或帧间预测信息)、例如量化的变换系数的形式的残差信息等。在示例中,当预测模式是帧间模式或双向预测模式时,将帧间预测信息提供给帧间解码器(880);以及当预测类型是帧内预测类型时,将帧内预测信息提供给帧内解码器(872)。残差信息可以经受逆量化并且被提供给残差解码器(873)。The entropy decoder (871) can be configured to reconstruct, from the coded picture, certain symbols that represent the syntax elements of which the coded picture is made up. Such symbols can include, for example, the mode in which a block is coded (e.g., intra mode, inter mode, bi-prediction mode, a merge sub-mode of the latter two, or another sub-mode), prediction information (e.g., intra prediction information or inter prediction information) that can identify certain samples or metadata that are used for prediction by the intra decoder (872) or the inter decoder (880), respectively, residual information in the form of, for example, quantized transform coefficients, and so on. In an example, when the prediction mode is inter mode or bi-prediction mode, the inter prediction information is provided to the inter decoder (880); and when the prediction type is an intra prediction type, the intra prediction information is provided to the intra decoder (872). The residual information can be subject to inverse quantization and provided to the residual decoder (873).
帧间解码器(880)被配置成接收帧间预测信息,并且基于帧间预测信息生成帧间预测结果。The inter-frame decoder (880) is configured to receive inter-frame prediction information and generate inter-frame prediction results based on the inter-frame prediction information.
帧内解码器(872)被配置成接收帧内预测信息,并且基于帧内预测信息生成预测结果。The intra-frame decoder (872) is configured to receive intra-frame prediction information and generate prediction results based on the intra-frame prediction information.
残差解码器(873)被配置成执行逆量化以提取去量化的变换系数,并且处理所述去量化的变换系数,以将残差从频域转换到空间域。残差解码器(873)还可能需要某些控制信息(以包括量化器参数(Quantizer Parameter,QP)),并且所述信息可以由熵解码器(871)提供(未描绘出数据路径,因为这可能仅是少量控制信息)。The residual decoder (873) is configured to perform inverse quantization to extract the dequantized transform coefficients and process the dequantized transform coefficients to transform the residual from the frequency domain to the spatial domain. The residual decoder (873) may also require some control information (to include quantizer parameters (QP)), and this information may be provided by the entropy decoder (871) (the data path is not depicted because this may only be a small amount of control information).
重构模块(874)被配置成在空间域中将由残差解码器(873)输出的残差与预测结果(根据情况由帧间预测模块或帧内预测模块输出)进行组合以形成重构的块,所述重构的块可以是重构的图片的一部分,所述重构的图片又可以是重构的视频的一部分。注意,可以执行其他合适的操作例如去块操作等来改善视觉质量。The reconstruction module (874) is configured to combine the residual output by the residual decoder (873) with the prediction result (output by the inter-frame prediction module or intra-frame prediction module, depending on the situation) in the spatial domain to form a reconstructed block, which can be a part of a reconstructed image, which in turn can be a part of a reconstructed video. Note that other suitable operations, such as deblocking, can be performed to improve visual quality.
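A minimal sketch of this reconstruction step follows: the spatial-domain residual is added to the prediction, and the result is clipped to the valid sample range. The clipping and the 8-bit sample depth are assumptions for illustration:

```python
import numpy as np

def reconstruct(prediction, residual, bit_depth=8):
    """Combine prediction and residual in the spatial domain, then clip."""
    rec = prediction.astype(np.int32) + residual.astype(np.int32)
    return np.clip(rec, 0, (1 << bit_depth) - 1).astype(np.uint8)

pred = np.array([[250, 10]], dtype=np.uint8)
res  = np.array([[10, -20]], dtype=np.int32)
print(reconstruct(pred, res).tolist())  # [[255, 0]] after clipping
```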
注意,可以使用任何合适的技术来实现视频编码器(403)、(603)和(703)以及视频解码器(410)、(510)和(810)。在实施方式中,可以使用一个或更多个集成电路来实现视频编码器(403)、(603)和(703)以及视频解码器(410)、(510)和(810)。在另一实施方式中,可以使用执行软件指令的一个或更多个处理器来实现视频编码器(403)、(603)和(703)以及视频解码器(410)、(510)和(810)。Note that the video encoders (403), (603), and (703) and the video decoders (410), (510), and (810) can be implemented using any suitable technique. In an embodiment, the video encoders (403), (603), and (703) and the video decoders (410), (510), and (810) can be implemented using one or more integrated circuits. In another embodiment, the video encoders (403), (603), and (703) and the video decoders (410), (510), and (810) can be implemented using one or more processors that execute software instructions.
本公开内容描述了与神经图像压缩技术和/或神经视频压缩技术相关的视频编码技术,诸如基于人工智能(AI)的神经图像压缩(NIC)。本公开内容的方面包括NIC中的内容自适应在线训练,例如具有用于基于神经网络的端到端(E2E)优化的图像编码框架的后过滤的内容自适应在线训练NIC方法。神经网络(NN)可以包括人工神经网络(ANN),例如深度神经网络(DNN)、卷积神经网络(CNN)等。This disclosure describes video coding techniques related to neural image compression and/or neural video compression, such as artificial intelligence (AI)-based neural image compression (NIC). Aspects of this disclosure include content-adaptive online training in NICs, such as a post-filtered content-adaptive online training NIC method with an end-to-end (E2E) optimized image coding framework based on neural networks. Neural networks (NNs) can include artificial neural networks (ANNs), such as deep neural networks (DNNs), convolutional neural networks (CNNs), etc.
在实施方式中,相关的混合视频编解码器难以作为整体进行优化。例如,混合视频编解码器中单个模块(例如,编码器)的改进可能不会导致整体性能的编码增益。在基于NN的视频编码框架中,不同的模块可以从输入到输出进行联合优化,以通过执行学习过程或训练过程(例如,机器学习过程)改进最终目标(例如,率失真性能,例如公开内容中描述的率失真损失L),从而产生端到端优化的NIC。In an embodiment, a related hybrid video codec is difficult to optimize as a whole. For example, improving a single module (e.g., the encoder) in a hybrid video codec may not result in a coding gain in the overall performance. In an NN-based video coding framework, different modules can be jointly optimized from input to output by performing a learning process or training process (e.g., a machine learning process) to improve a final objective (e.g., rate-distortion performance, such as the rate-distortion loss L described in this disclosure), resulting in an end-to-end optimized NIC.
示例性NIC框架或系统可以描述如下。NIC框架可以使用输入块x作为神经网络编码器(例如,基于诸如DNN的神经网络的编码器)的输入来计算可以紧凑以例如用于存储和传输目的的压缩表示(例如,紧凑表示)。神经网络解码器(例如,基于诸如DNN的神经网络的解码器)可以使用压缩表示作为输入来重构输出块(也称为重构块)。在各种实施方式中,输入块x和重构块在空间域中,并且压缩表示在与空间域不同的域中。在一些示例中,压缩表示被量化和熵编码。An exemplary NIC framework or system can be described as follows. The NIC framework can use an input block x as the input of a neural network encoder (e.g., an encoder based on a neural network such as a DNN) to compute a compressed representation (e.g., a compact representation) that can be compact, for example, for storage and transmission purposes. A neural network decoder (e.g., a decoder based on a neural network such as a DNN) can use the compressed representation as input to reconstruct an output block (also referred to as a reconstructed block). In various embodiments, the input block x and the reconstructed block are in the spatial domain, and the compressed representation is in a domain different from the spatial domain. In some examples, the compressed representation is quantized and entropy coded.
在一些示例中,NIC框架可以使用变分自动编码器(VAE)结构。在VAE结构中,神经网络编码器可以直接将整个输入块x作为神经网络编码器的输入。整个输入块x可以通过一组神经网络层,这些层充当黑盒来计算压缩表示。压缩表示是神经网络编码器的输出。神经网络解码器可以将整个压缩表示作为输入。压缩表示可以通过另一组神经网络层,这些层作为另一个黑盒来计算重构块。可以优化率失真(R-D)损失,以实现重构块的失真损失与具有权衡超参数λ的紧凑表示的比特消耗R之间的折衷。In some examples, the NIC framework can use a variational autoencoder (VAE) structure. In the VAE structure, the neural network encoder can directly take the entire input block x as its input. The entire input block x can pass through a set of neural network layers that act as a black box to compute the compressed representation. The compressed representation is the output of the neural network encoder. The neural network decoder can take the entire compressed representation as input. The compressed representation can pass through another set of neural network layers that act as another black box to compute the reconstructed block. A rate-distortion (R-D) loss can be optimized to achieve a trade-off between the distortion loss of the reconstructed block and the bit consumption R of the compact representation, with a trade-off hyperparameter λ.
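The R-D trade-off above is commonly written as L = D + λ·R. A toy sketch follows, with mean-squared error standing in for the distortion term; the exact loss form used in the disclosure may differ:

```python
import numpy as np

def rd_loss(x, x_rec, rate_bits, lam):
    """Rate-distortion loss L = D + lambda * R (MSE used as D, illustratively)."""
    distortion = float(np.mean((x.astype(np.float64) - x_rec.astype(np.float64)) ** 2))
    return distortion + lam * rate_bits

x = np.array([[10.0, 20.0]])       # input block
x_rec = np.array([[12.0, 18.0]])   # reconstructed block
print(rd_loss(x, x_rec, rate_bits=100.0, lam=0.01))  # 4.0 + 1.0 = 5.0
```

A larger λ puts more weight on the rate term, pushing the trained model toward smaller compressed representations at the cost of more distortion.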
神经网络(例如,ANN)可以从示例中学习执行任务,而无需针对特定任务进行编程。ANN可以配置有连接的节点或人工神经元。节点之间的连接可以将信号从第一节点传输到第二节点(例如,接收节点),并且可以通过权重来修改信号,该权重可以由连接的权重系数指示。接收节点可以对来自将信号传输到接收节点的节点的信号(即,接收节点的输入信号)进行处理,然后通过对输入信号应用函数来生成输出信号。该函数可以是线性函数。在示例中,输出信号是输入信号的加权和。在示例中,输出信号还被可以由偏置项指示的偏置修改,因此输出信号是偏置和输入信号的加权和的总和。该函数可以包括非线性运算,例如,对输入信号的加权和或偏置和加权和的和。输出信号可以被发送到连接到接收节点的节点(下游节点)。ANN可以通过参数(例如,连接的权重和/或偏置)来表示或配置。权重和/或偏置可以通过使用可以迭代调整权重和/或偏置的示例训练ANN来获得。配置有确定的权重和/或确定的偏置的经训练的ANN可用于执行任务。A neural network (e.g., an ANN) can learn to perform tasks from examples, without being programmed for a specific task. An ANN can be configured with connected nodes or artificial neurons. A connection between nodes can transmit a signal from a first node to a second node (e.g., a receiving node), and the signal can be modified by a weight, which can be indicated by a weight coefficient for the connection. The receiving node can process the signals from the nodes that transmit signals to it (i.e., the input signals of the receiving node) and then generate an output signal by applying a function to the input signals. The function can be a linear function. In an example, the output signal is a weighted sum of the input signals. In an example, the output signal is further modified by a bias, which can be indicated by a bias term, and thus the output signal is the sum of the bias and the weighted sum of the input signals. The function can include a nonlinear operation, for example, applied to the weighted sum of the input signals or to the sum of the bias and the weighted sum. The output signal can be sent to the nodes connected to the receiving node (downstream nodes). The ANN can be represented or configured by parameters (e.g., the weights and/or the biases of the connections). The weights and/or the biases can be obtained by training the ANN with examples in which the weights and/or the biases can be iteratively adjusted. The trained ANN, configured with the determined weights and/or the determined biases, can be used to perform tasks.
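The node computation described above can be sketched as follows, with ReLU chosen arbitrarily as the example nonlinearity (the text leaves the function open):

```python
import numpy as np

def node_output(inputs, weights, bias):
    """Output signal = nonlinearity(bias + weighted sum of input signals)."""
    s = float(np.dot(inputs, weights)) + bias  # weighted sum plus bias
    return max(0.0, s)                         # example nonlinearity (ReLU)

# Two input signals, each scaled by its connection weight, plus a bias.
print(node_output(np.array([1.0, 2.0]), np.array([0.5, -0.25]), bias=0.5))  # 0.5
```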
ANN中的节点可以以任何合适的架构进行组织。在各种实施方式中,ANN中的节点被组织成层,包括将输入信号接收到ANN的输入层和输出来自ANN的输出信号的输出层。在实施方式中,ANN还包括在输入层和输出层之间的层,例如隐藏层。不同层可以对不同层的相应输入执行不同种类的变换。信号可以从输入层传输到输出层。Nodes in an ANN can be organized in any suitable architecture. In various implementations, nodes in an ANN are organized into layers, including an input layer that receives input signals into the ANN and an output layer that outputs signals from the ANN. In some implementations, the ANN also includes layers between the input and output layers, such as hidden layers. Different layers can perform different kinds of transformations on their respective inputs. Signals can be transmitted from the input layer to the output layer.
在输入层和输出层之间具有多个层的ANN可以被称为DNN。在实施方式中,DNN是其中数据从输入层流向输出层而没有环回的前馈网络。在示例中,DNN是其中一层中的每个节点都连接到下一层中的所有节点的全连接网络。在实施方式中,DNN是其中数据可以沿任何方向流动的循环神经网络(RNN)。在实施方式中,DNN是CNN。An ANN with multiple layers between its input and output layers can be called a DNN. In implementations, a DNN is a feedforward network where data flows from the input layer to the output layer without loopback. In an example, a DNN is a fully connected network where each node in one layer is connected to all nodes in the next layer. In implementations, a DNN is a recurrent neural network (RNN) where data can flow in any direction. In implementations, a DNN is a CNN.
CNN可以包括输入层、输出层以及输入层和输出层之间的隐藏层。隐藏层可以包括执行卷积例如二维(2D)卷积的卷积层(例如,在编码器中使用)。在实施方式中,在卷积层中执行的2D卷积在卷积核(也称为滤波器或通道,例如5×5矩阵)与到卷积层的输入信号(例如,2D矩阵,例如2D块,256×256矩阵)之间。在各种示例中,卷积核的维度(例如,5×5)小于输入信号的维度(例如,256×256)。因此,输入信号(例如,256×256矩阵)中被卷积核覆盖的部分(例如,5×5区域)小于输入信号的区域(例如,256×256区域),因此可以被称为下一层相应节点中的接受域。A CNN can include an input layer, an output layer, and hidden layers between the input layer and the output layer. A hidden layer can include a convolution layer (e.g., used in an encoder) that performs convolutions, such as two-dimensional (2D) convolutions. In an embodiment, a 2D convolution performed in a convolution layer is between a convolution kernel (also referred to as a filter or a channel, e.g., a 5×5 matrix) and an input signal to the convolution layer (e.g., a 2D matrix such as a 2D block, a 256×256 matrix). In various examples, the dimension of the convolution kernel (e.g., 5×5) is smaller than the dimension of the input signal (e.g., 256×256). Thus, the portion of the input signal (e.g., the 256×256 matrix) that is covered by the convolution kernel (e.g., a 5×5 area) is smaller than the area of the input signal (e.g., the 256×256 area), and can therefore be referred to as the receptive field of a corresponding node in the next layer.
在卷积期间,计算卷积核与输入信号中对应的接受域的点积。因此,卷积核的每个元素都是应用于接受域中的相应样本的权重,因此卷积核包括权重。例如,由5×5矩阵表示的卷积核有25个权重。在一些示例中,对卷积层的输出信号施加偏置,并且输出信号基于点积和偏置的和。During convolution, the dot product between the convolution kernel and the corresponding receptive field in the input signal is calculated. Therefore, each element of the convolution kernel is a weight applied to the corresponding sample in the receptive field, and thus the convolution kernel comprises weights. For example, a convolution kernel represented by a 5×5 matrix has 25 weights. In some examples, a bias is applied to the output signal of the convolutional layer, and the output signal is based on the sum of the dot product and the bias.
卷积核可以沿输入信号(例如,2D矩阵)移动称为步幅的大小,因此卷积操作生成特征图或激活图(例如,另一个2D矩阵),其反过来又有助于CNN中下一层的输入。例如,输入信号是具有256×256个样本的2D块,步幅是2个样本(例如,步幅为2)。对于步幅2,卷积核沿X方向(例如,水平方向)和/或Y方向(例如,垂直方向)移动2个样本。A convolutional kernel can move along the input signal (e.g., a 2D matrix) by a size called stride, so the convolution operation generates feature maps or activation maps (e.g., another 2D matrix), which in turn contribute to the input of the next layer in a CNN. For example, the input signal is a 2D block with 256×256 samples, and the stride is 2 samples (e.g., stride 2). For a stride of 2, the convolutional kernel moves 2 samples along the X direction (e.g., horizontal) and/or the Y direction (e.g., vertical).
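A naive sketch of this strided 2D convolution (valid padding, single channel, purely illustrative). With a 5×5 kernel, stride 2, and no padding, a 256×256 input yields a ((256-5)//2+1)² = 126×126 feature map:

```python
import numpy as np

def conv2d(signal, kernel, stride):
    """Valid-padding 2D convolution of a single-channel signal with one kernel."""
    kh, kw = kernel.shape
    oh = (signal.shape[0] - kh) // stride + 1
    ow = (signal.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Dot product of the kernel with the receptive field it covers.
            field = signal[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(field * kernel)
    return out

signal = np.ones((256, 256))      # toy input signal
kernel = np.ones((5, 5))          # 25 weights
fmap = conv2d(signal, kernel, stride=2)
print(fmap.shape)                 # (126, 126)
print(fmap[0, 0])                 # 25.0
```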
可以在同一卷积层中将多个卷积核应用于输入信号以分别生成多个特征图,其中每个特征图可以表示输入信号的特定特征。一般来说,一个卷积层具有N个通道(即N个卷积核),每个卷积核具有M×M个样本,步幅S可以被指定为卷积:MxM cN sS。例如,一个卷积层具有192个通道,每个卷积核具有5×5个样本,步幅2被指定为Conv:5x5 c192 s2。隐藏层可以包括执行反卷积(例如2D反卷积)的反卷积层(例如,在解码器中使用)。反卷积是卷积的逆。一个反卷积层具有192个通道,每个反卷积核具有5×5个样本,步幅2被指定为去卷积:5x5 c192 s2。Multiple convolution kernels can be applied to the input signal in the same convolution layer to generate multiple feature maps, respectively, where each feature map can represent a specific feature of the input signal. In general, a convolution layer that has N channels (i.e., N convolution kernels), where each convolution kernel has M×M samples and the stride is S, can be specified as Conv: MxM cN sS. For example, a convolution layer that has 192 channels, where each convolution kernel has 5×5 samples and the stride is 2, is specified as Conv: 5x5 c192 s2. The hidden layers can include deconvolution layers (e.g., used in a decoder) that perform deconvolutions (e.g., 2D deconvolutions). A deconvolution is the inverse of a convolution. A deconvolution layer that has 192 channels, where each deconvolution kernel has 5×5 samples and the stride is 2, is specified as DeConv: 5x5 c192 s2.
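For illustration, the layer notation above can be parsed mechanically. The regular expression below is an assumption about the notation (operator, KxK kernel, cN channels, sS stride), not part of any standard:

```python
import re

def parse_layer_spec(spec):
    """Parse a spec like 'Conv:5x5 c192 s2' into its components."""
    m = re.match(r"(\w+):(\d+)x(\d+) c(\d+) s(\d+)", spec)
    op = m.group(1)
    kh, kw, channels, stride = (int(g) for g in m.groups()[1:])
    return {"op": op, "kernel": (kh, kw), "channels": channels, "stride": stride}

print(parse_layer_spec("Conv:5x5 c192 s2"))
# {'op': 'Conv', 'kernel': (5, 5), 'channels': 192, 'stride': 2}
```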
在各种实施方式中,CNN具有以下益处。CNN中的许多可学习参数(即要训练的参数)可以明显小于DNN(例如,前馈DNN)中的许多可学习参数。在CNN中,相对大量的节点可以共享相同的滤波器(例如,相同的权重)和相同的偏置(如果使用偏置),因此可以减少内存占用,因为单个偏置和权重的单个矢量可以在共享相同滤波器的所有接受域之间使用。例如,对于具有100×100个样本的输入信号,具有5×5个样本的卷积核的卷积层具有25个可学习参数(例如,权重)。如果使用偏置,则一个通道使用26个可学习参数(例如,25个权重和一个偏置)。如果卷积层有N个通道,则总的可学习参数为26×N个。另一方面,对于DNN中的全连接层,对于下一层中的每个节点使用100×100(即10000)权重。如果下一层有L个节点,那么总的可学习参数为10000×L个。In various implementations, CNNs offer the following advantages. The number of learnable parameters (i.e., parameters to be trained) in a CNN can be significantly smaller than the number of learnable parameters in a DNN (e.g., a feedforward DNN). In a CNN, a relatively large number of nodes can share the same filters (e.g., the same weights) and the same biases (if biases are used), thus reducing memory footprint, as a single vector of a single bias and weights can be used across all receptive domains sharing the same filters. For example, for an input signal with 100×100 samples, a convolutional layer with a kernel of 5×5 samples has 25 learnable parameters (e.g., weights). If biases are used, one channel uses 26 learnable parameters (e.g., 25 weights and one bias). If the convolutional layer has N channels, the total number of learnable parameters is 26×N. On the other hand, for a fully connected layer in a DNN, 100×100 (i.e., 10000) weights are used for each node in the next layer. If the next layer has L nodes, the total number of learnable parameters is 10000×L.
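The parameter counts worked out above can be checked as simple arithmetic: a convolutional layer with N channels of 5×5 kernels plus one bias each uses 26×N learnable parameters, while a fully connected layer mapping a 100×100 input to L nodes uses 10000×L weights:

```python
def conv_params(kernel_side, channels, with_bias=True):
    """Learnable parameters of a conv layer: weights per kernel, plus a bias each."""
    per_channel = kernel_side * kernel_side + (1 if with_bias else 0)
    return per_channel * channels

def fc_params(input_size, nodes):
    """Weights of a fully connected layer (weights only, matching the text)."""
    return input_size * nodes

print(conv_params(5, 192))        # 4992  (26 * 192)
print(fc_params(100 * 100, 192))  # 1920000
```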
CNN还可以包括一个或更多个其他层,例如池化层、可以将一层中的每个节点连接到另一层中的每个节点的全连接层、归一化层等。CNN中的层可以以任何合适的顺序和任何合适的架构(例如,前馈架构、循环架构)布置。在示例中,卷积层之后是其他层,例如池化层、全连接层、归一化层等。CNNs can also include one or more other layers, such as pooling layers, fully connected layers that connect each node in one layer to each node in another layer, normalization layers, etc. The layers in a CNN can be arranged in any suitable order and with any suitable architecture (e.g., feedforward architecture, recurrent architecture). In the example, convolutional layers are followed by other layers, such as pooling layers, fully connected layers, normalization layers, etc.
池化层可用于通过将来自一层中的多个节点的输出组合到下一层中的单个节点中来减少数据的维度。下面描述具有特征图作为输入的池化层的池化操作。该描述可以适当地适用于其他输入信号。可以将特征图划分为子区域(例如矩形子区域),各个子区域中的特征可以例如通过取平均池中的平均值或最大池中的最大值来独立地下采样(或池化)到单个值。Pooling layers can be used to reduce the dimensionality of data by combining the outputs from multiple nodes in one layer into a single node in the next layer. The pooling operation of a pooling layer with a feature map as input is described below. This description can be appropriately applied to other input signals. The feature map can be divided into sub-regions (e.g., rectangular sub-regions), and the features in each sub-region can be independently downsampled (or pooled) to a single value, for example, by taking the average value in the average pooling or the maximum value in the max pooling.
池化层可以执行池化,例如局部池化、全局池化、最大池化、平均池化等。池化是非线性下采样的一种形式。局部池化在特征图中组合了少量节点(例如,局部节点集群,例如2×2个节点)。全局池化可以组合例如特征图的所有节点。Pooling layers can perform pooling operations, such as local pooling, global pooling, max pooling, and average pooling. Pooling is a form of non-linear downsampling. Local pooling combines a small number of nodes in the feature map (e.g., a local cluster of nodes, such as 2×2 nodes). Global pooling can combine, for example, all nodes in the feature map.
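Local 2×2 pooling as described above can be sketched as follows, where each sub-region of the feature map is downsampled to a single value by its maximum (max pooling) or its mean (average pooling):

```python
import numpy as np

def pool2x2(fmap, mode="max"):
    """Downsample each non-overlapping 2x2 sub-region to a single value."""
    h, w = fmap.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(h // 2):
        for j in range(w // 2):
            region = fmap[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            out[i, j] = region.max() if mode == "max" else region.mean()
    return out

fmap = np.array([[1.0, 2.0], [3.0, 4.0]])
print(pool2x2(fmap, "max")[0, 0])   # 4.0
print(pool2x2(fmap, "mean")[0, 0])  # 2.5
```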
池化层可以减小表示的大小,从而减少参数的数量、内存占用以及CNN中的计算量。在示例中,在CNN中的连续卷积层之间插入了池化层。在示例中,池化层之后是激活函数,例如整流线性单元(ReLU)层。在示例中,在CNN中的连续卷积层之间省略了池化层。Pooling layers can reduce the size of the representation, thereby reducing the number of parameters, memory footprint, and computational cost in a CNN. In the example, pooling layers are inserted between consecutive convolutional layers in the CNN. In the example, the pooling layer is followed by an activation function, such as a Rectified Linear Unit (ReLU) layer. In the example, pooling layers are omitted between consecutive convolutional layers in the CNN.
归一化层可以是ReLU、泄漏ReLU、广义除法归一化(GDN)、逆GDN(IGDN)等。ReLU可以应用非饱和激活函数以通过将负值设置为零来从输入信号(例如特征图)中去除负值。对于负值,泄漏ReLU可以具有小的斜率(例如0.01)而不是平坦的斜率(例如0)。因此,如果值x大于0,则来自泄漏ReLU的输出是x。否则,来自泄漏ReLU的输出是值x乘以小的斜率(例如,0.01)。在示例中,斜率在训练之前被确定,并且因此在训练期间不学习。The normalization layer can be a ReLU, a leaky ReLU, generalized divisive normalization (GDN), inverse GDN (IGDN), or the like. A ReLU can apply a non-saturating activation function to remove negative values from an input signal (e.g., a feature map) by setting the negative values to zero. For negative values, a leaky ReLU can have a small slope (e.g., 0.01) instead of a flat slope (e.g., 0). Accordingly, if a value x is larger than 0, an output from the leaky ReLU is x. Otherwise, the output from the leaky ReLU is the value x multiplied by the small slope (e.g., 0.01). In an example, the slope is determined before training, and thus is not learned during training.
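The ReLU and leaky ReLU behaviors described above can be sketched as follows (an illustrative sketch with a fixed, non-learned slope of 0.01, consistent with the example above):

```python
def relu(x):
    # Non-saturating activation: sets negative values to zero.
    return x if x > 0 else 0.0

def leaky_relu(x, slope=0.01):
    # For x > 0 the output is x; otherwise the output is x times the small slope.
    return x if x > 0 else slope * x
```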
在基于NN的图像压缩方法(诸如基于DNN或基于CNN的图像压缩方法)中,基于块或逐块的编码机制对于以基于DNN的视频编码标准诸如FVC压缩图像是有效的,而不是直接对整个图像进行编码。整个图像可以被划分成相同(或不同)大小的块,并且这些块可以被单独压缩。在实施方式中,图像可以被分割成大小相等或大小不相等的块。可以压缩分割的块而不是图像。图9A示出了根据本公开内容的实施方式的逐块图像编码的示例。图像(980)可以被划分成块,例如块(981)至块(996)。例如,可以根据扫描顺序来压缩块(981)至块(996)。在图9A所示的示例中,块(981)至块(989)已经被压缩,并且块(990)至块(996)将被压缩。In NN-based image compression methods (such as DNN-based or CNN-based image compression methods), block-based or per-block encoding mechanisms are effective for compressing images using DNN-based video coding standards such as FVC, rather than directly encoding the entire image. The entire image can be divided into blocks of the same (or different) size, and these blocks can be compressed individually. In implementations, the image can be divided into blocks of equal or unequal size. The divided blocks, rather than the image itself, can be compressed. Figure 9A illustrates an example of per-block image encoding according to an implementation of this disclosure. The image (980) can be divided into blocks, such as blocks (981) to (996). For example, blocks (981) to (996) can be compressed according to the scan order. In the example shown in Figure 9A, blocks (981) to (989) have already been compressed, and blocks (990) to (996) will be compressed.
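By way of illustration only, partitioning an image into blocks that are then compressed in scan order (as in Figure 9A) can be sketched as follows; the function name and the (top, left, height, width) tuple layout are assumptions of this sketch:

```python
def partition_blocks(height, width, block_h, block_w):
    # Returns (top, left, h, w) block descriptors in raster-scan order; blocks
    # at the right/bottom edges may be smaller when the image size is not a
    # multiple of the block size (i.e., blocks of unequal size).
    blocks = []
    for top in range(0, height, block_h):
        for left in range(0, width, block_w):
            blocks.append((top, left,
                           min(block_h, height - top),
                           min(block_w, width - left)))
    return blocks
```

Each descriptor can then be fed to the compressor one block at a time; treating the entire image as a single block corresponds to `partition_blocks(h, w, h, w)`.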
在实施方式中,图像被视为块,其中块是整个图像,并且图像在没有被分割成块的情况下被压缩。整个图像可以是E2E NIC框架的输入。In this implementation, the image is treated as a block, where a block is the entire image, and the image is compressed without being segmented into blocks. The entire image can be the input to an E2E NIC framework.
图9B示出了根据本公开内容的实施方式的示例性NIC框架(900)(例如,NIC系统)。NIC框架(900)可以基于神经网络,诸如DNN和/或CNN。NIC框架(900)可以用于压缩(例如,编码)块和解压缩(例如,解码或重构)被压缩的块(例如,被编码的块)。NIC框架(900)可以包括使用神经网络实现的两个子神经网络:第一子NN(951)和第二子NN(952)。Figure 9B illustrates an exemplary NIC framework (900) (e.g., a NIC system) according to an embodiment of this disclosure. The NIC framework (900) can be based on neural networks, such as DNNs and/or CNNs. The NIC framework (900) can be used to compress (e.g., encode) blocks and decompress (e.g., decode or reconstruct) compressed blocks (e.g., encoded blocks). The NIC framework (900) can include two sub-neural networks implemented with neural networks: a first sub-NN (951) and a second sub-NN (952).
第一子NN(951)可以类似于自动编码器,并且可以被训练以生成输入块x的压缩块,并且对该压缩块(即,编码块)进行解压缩以获得重构块x̂。第一子NN(951)可以包括多个部件(或模块),诸如主编码器神经网络(或主编码器网络)(911)、量化器(912)、熵编码器(913)、熵解码器(914)和主解码器神经网络(或主解码器网络)(915)。参照图9B,主编码器网络(911)可以从输入块x(例如,要被压缩或要被编码的块)生成潜在或潜在表示y。在示例中,使用CNN来实现主编码器网络(911)。潜在表示y与输入块x之间的关系可以使用式2来描述。The first sub-NN (951) can be similar to an autoencoder and can be trained to generate a compressed block of an input block x and to decompress the compressed block (i.e., the encoded block) to obtain a reconstructed block x̂. The first sub-NN (951) can include multiple components (or modules), such as a master encoder neural network (or master encoder network) (911), a quantizer (912), an entropy encoder (913), an entropy decoder (914), and a master decoder neural network (or master decoder network) (915). Referring to Figure 9B, the master encoder network (911) can generate a latent or a latent representation y from the input block x (e.g., a block to be compressed or encoded). In an example, the master encoder network (911) is implemented using a CNN. The relationship between the latent representation y and the input block x can be described using Equation 2.
y=f1(x;θ1) 式2　y = f1(x; θ1) Equation 2
其中,参数θ1表示如下参数:诸如主编码器网络(911)中的卷积核中使用的权重以及偏置(如果在主编码器网络(911)中使用偏置)。where the parameter θ1 represents parameters such as the weights used in the convolution kernels of the master encoder network (911) and the biases (if biases are used in the master encoder network (911)).
可以使用量化器(912)对潜在表示y进行量化以生成经量化的潜在ŷ。例如,可以由熵编码器(913)使用无损压缩对经量化的潜在ŷ进行压缩,以生成作为输入块x的压缩表示的压缩块(例如,编码块)(931)。熵编码器(913)可以使用诸如霍夫曼编码、算术编码等的熵编码技术。在示例中,熵编码器(913)使用算术编码并且是算术编码器。在示例中,编码块(931)在编码比特流中被传输。The latent representation y can be quantized using the quantizer (912) to generate a quantized latent ŷ. For example, the quantized latent ŷ can be compressed by the entropy encoder (913) using lossless compression to generate the compressed block (e.g., an encoded block) (931) that is a compressed representation of the input block x. The entropy encoder (913) can use an entropy coding technique such as Huffman coding, arithmetic coding, or the like. In an example, the entropy encoder (913) uses arithmetic coding and is an arithmetic encoder. In an example, the encoded block (931) is transmitted in a coded bitstream.
编码块(931)可以由熵解码器(914)解压缩(例如,熵解码)以生成输出。熵解码器(914)可以使用与熵编码器(913)中使用的熵编码技术对应的诸如霍夫曼编码、算术编码等的熵编码技术。在示例中,熵解码器(914)使用算术解码并且是算术解码器。在示例中,在熵编码器(913)中使用无损压缩,在熵解码器(914)中使用无损解压缩,并且可以忽略诸如由于编码块(931)的传输而产生的噪声,因此来自熵解码器(914)的输出是经量化的潜在ŷ。The encoded block (931) can be decompressed (e.g., entropy decoded) by the entropy decoder (914) to generate an output. The entropy decoder (914) can use an entropy decoding technique, such as Huffman coding, arithmetic coding, or the like, corresponding to the entropy coding technique used in the entropy encoder (913). In an example, the entropy decoder (914) uses arithmetic decoding and is an arithmetic decoder. In an example, lossless compression is used in the entropy encoder (913), lossless decompression is used in the entropy decoder (914), and noise, such as that caused by the transmission of the encoded block (931), can be ignored, so the output from the entropy decoder (914) is the quantized latent ŷ.
主解码器网络(915)可以对经量化的潜在ŷ进行解码以生成重构块x̂。在示例中,使用CNN来实现主解码器网络(915)。重构块x̂(即,主解码器网络(915)的输出)与经量化的潜在ŷ(即,主解码器网络(915)的输入)之间的关系可以使用式3来描述。The master decoder network (915) can decode the quantized latent ŷ to generate the reconstructed block x̂. In an example, the master decoder network (915) is implemented using a CNN. The relationship between the reconstructed block x̂ (i.e., the output of the master decoder network (915)) and the quantized latent ŷ (i.e., the input of the master decoder network (915)) can be described using Equation 3.
x̂=f2(ŷ;θ2) 式3　x̂ = f2(ŷ; θ2) Equation 3
其中,参数θ2表示如下参数:诸如主解码器网络(915)中的卷积核中使用的权重以及偏置(如果在主解码器网络(915)中使用偏置)。因此,第一子NN(951)可以对输入块x进行压缩(例如,编码)以获得编码块(931),并且对编码块(931)进行解压缩(例如,解码)以获得重构块x̂。由于由量化器(912)引入的量化损失,重构块x̂可能会与输入块x不同。where the parameter θ2 represents parameters such as the weights used in the convolution kernels of the master decoder network (915) and the biases (if biases are used in the master decoder network (915)). Thus, the first sub-NN (951) can compress (e.g., encode) the input block x to obtain the encoded block (931), and decompress (e.g., decode) the encoded block (931) to obtain the reconstructed block x̂. Due to the quantization loss introduced by the quantizer (912), the reconstructed block x̂ may differ from the input block x.
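The lossy round trip of the first sub-NN (951) can be illustrated with a toy scalar example, in which a fixed linear map stands in for the learned transforms f1 and f2 (an assumption of this sketch) and rounding stands in for the quantizer (912):

```python
SCALE = 4.0  # stand-in for the learned analysis/synthesis transforms (assumed)

def encode(x):
    y = x * SCALE      # stand-in for y = f1(x; theta1)
    return round(y)    # quantizer: produces the quantized latent

def decode(y_hat):
    return y_hat / SCALE  # stand-in for the reconstruction f2(y_hat; theta2)
```

For an input of 0.3, the round trip returns 0.25: the rounding step discards information, so the reconstruction generally differs from the input, mirroring the quantization loss described above.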
第二子NN(952)可以针对用于熵编码的经量化的潜在ŷ学习熵模型(例如,先验概率模型)。因此,熵模型可以是取决于输入块x的条件熵模型,例如高斯混合模型(GMM)、高斯尺度模型(GSM)。第二子NN(952)可以包括上下文模型NN(916)、熵参数NN(917)、超编码器(921)、量化器(922)、熵编码器(923)、熵解码器(924)和超解码器(925)。上下文模型NN(916)中使用的熵模型可以是关于潜在(例如,经量化的潜在ŷ)的自回归模型。在示例中,超编码器(921)、量化器(922)、熵编码器(923)、熵解码器(924)和超解码器(925)形成超神经网络(例如,超先验NN)。超神经网络可以表示对校正基于上下文的预测有用的信息。来自上下文模型NN(916)和超神经网络的数据可以通过熵参数NN(917)来组合。熵参数NN(917)可以生成如下参数,诸如用于熵模型诸如条件高斯熵模型(例如,GMM)的均值和尺度参数。The second sub-NN (952) can learn an entropy model (e.g., a prior probabilistic model) over the quantized latent ŷ used for entropy coding. Thus, the entropy model can be a conditional entropy model that depends on the input block x, such as a Gaussian mixture model (GMM) or a Gaussian scale model (GSM). The second sub-NN (952) can include a context model NN (916), an entropy parameter NN (917), a super encoder (921), a quantizer (922), an entropy encoder (923), an entropy decoder (924), and a super decoder (925). The entropy model used in the context model NN (916) can be an autoregressive model over the latent (e.g., the quantized latent ŷ). In an example, the super encoder (921), the quantizer (922), the entropy encoder (923), the entropy decoder (924), and the super decoder (925) form a super neural network (e.g., a super-prior NN). The super neural network can represent information useful for correcting context-based predictions. Data from the context model NN (916) and the super neural network can be combined by the entropy parameter NN (917). The entropy parameter NN (917) can generate parameters, such as mean and scale parameters, for the entropy model, such as a conditional Gaussian entropy model (e.g., a GMM).
参照图9B,在编码器侧,来自量化器(912)的经量化的潜在ŷ被馈送至上下文模型NN(916)中。在解码器侧,来自熵解码器(914)的经量化的潜在ŷ被馈送至上下文模型NN(916)中。上下文模型NN(916)可以使用诸如CNN的神经网络来实现。上下文模型NN(916)可以基于上下文ŷ<i生成输出ocm,i,该上下文ŷ<i是对上下文模型NN(916)可用的经量化的潜在。上下文ŷ<i可以包括编码器侧的先前经量化的潜在或解码器侧的先前被熵解码的经量化的潜在。上下文模型NN(916)的输出ocm,i与输入(例如,ŷ<i)之间的关系可以使用式4来描述。Referring to Figure 9B, on the encoder side, the quantized latent ŷ from the quantizer (912) is fed into the context model NN (916). On the decoder side, the quantized latent ŷ from the entropy decoder (914) is fed into the context model NN (916). The context model NN (916) can be implemented using a neural network, such as a CNN. The context model NN (916) can generate an output ocm,i based on a context ŷ<i, where the context ŷ<i is the quantized latents available to the context model NN (916). The context ŷ<i can include previously quantized latents on the encoder side or previously entropy-decoded quantized latents on the decoder side. The relationship between the output ocm,i of the context model NN (916) and the input (e.g., ŷ<i) can be described using Equation 4.
ocm,i=f3(ŷ<i;θ3) 式4　ocm,i = f3(ŷ<i; θ3) Equation 4
其中,参数θ3表示如下参数,诸如上下文模型NN(916)中的卷积核中使用的权重以及偏置(如果在上下文模型NN(916)中使用偏置)。Here, parameter θ3 represents parameters such as the weights used in the convolution kernels of the context model NN(916) and the bias (if a bias is used in the context model NN(916)).
来自上下文模型NN(916)的输出ocm,i和来自超解码器(925)的输出ohc被馈送至熵参数NN(917)以生成输出oep。熵参数NN(917)可以使用诸如CNN的神经网络来实现。熵参数NN(917)的输出oep与输入(例如,ocm,i和ohc)之间的关系可以使用式5来描述。The outputs o cm, i from the context model NN (916) and the output o hc from the super decoder (925) are fed into the entropy parameter NN (917) to generate the output o ep . The entropy parameter NN (917) can be implemented using a neural network such as CNN. The relationship between the output o ep of the entropy parameter NN (917) and the inputs (e.g., o cm, i and o hc ) can be described by Equation 5.
oep=f4(ocm,i,ohc;θ4) 式5　oep = f4(ocm,i, ohc; θ4) Equation 5
其中,参数θ4表示如下参数,诸如熵参数NN(917)中的卷积核中使用的权重以及偏置(如果在熵参数NN(917)中使用偏置)。熵参数NN(917)的输出oep可以用于确定(例如,调节)熵模型,并且因此,经调节的熵模型可以例如经由来自超解码器(925)的输出ohc而依赖于输入块x。在示例中,输出oep包括用于调节熵模型(例如,GMM)的参数,诸如均值和尺度参数。参照图9B,熵编码器(913)和熵解码器(914)可以分别在熵编码和熵解码中使用熵模型(例如,条件熵模型)。where the parameter θ4 represents parameters such as the weights used in the convolution kernels of the entropy parameter NN (917) and the biases (if biases are used in the entropy parameter NN (917)). The output oep of the entropy parameter NN (917) can be used to determine (e.g., condition) the entropy model, and thus the conditioned entropy model can depend on the input block x, for example, via the output ohc from the super decoder (925). In an example, the output oep includes parameters, such as mean and scale parameters, used to condition the entropy model (e.g., a GMM). Referring to Figure 9B, the entropy encoder (913) and the entropy decoder (914) can use the entropy model (e.g., the conditional entropy model) in entropy encoding and entropy decoding, respectively.
下面可以描述第二子NN(952)。潜在y可以被馈送至超编码器(921)中以生成超潜在z。在示例中,超编码器(921)使用诸如CNN的神经网络来实现。超潜在z与潜在y之间的关系可以使用式6来描述。The second sub-NN (952) can be described below. The latent y can be fed into the super encoder (921) to generate a hyper latent z. In an example, the super encoder (921) is implemented using a neural network, such as a CNN. The relationship between the hyper latent z and the latent y can be described using Equation 6.
z=f5(y;θ5) 式6　z = f5(y; θ5) Equation 6
其中,参数θ5表示如下参数,诸如超编码器(921)中的卷积核中使用的权重以及偏置(如果在超编码器(921)中使用偏置)。Here, parameter θ5 represents parameters such as the weights used in the convolution kernel of the super encoder (921) and the bias (if a bias is used in the super encoder (921)).
由量化器(922)对超潜在z进行量化以生成经量化的潜在ẑ。例如,可以由熵编码器(923)使用无损压缩对经量化的潜在ẑ进行压缩,以生成诸如来自超神经网络的编码比特(932)的边信息。熵编码器(923)可以使用诸如霍夫曼编码、算术编码等的熵编码技术。在示例中,熵编码器(923)使用算术编码并且是算术编码器。在示例中,诸如编码比特(932)的边信息可以在编码比特流中例如与编码块(931)一起传输。The hyper latent z is quantized by the quantizer (922) to generate a quantized latent ẑ. For example, the quantized latent ẑ can be compressed by the entropy encoder (923) using lossless compression to generate side information, such as the encoded bits (932), from the super neural network. The entropy encoder (923) can use an entropy coding technique such as Huffman coding, arithmetic coding, or the like. In an example, the entropy encoder (923) uses arithmetic coding and is an arithmetic encoder. In an example, the side information, such as the encoded bits (932), can be transmitted in the coded bitstream, for example, together with the encoded block (931).
诸如编码比特(932)的边信息可以由熵解码器(924)解压缩(例如,熵解码)以生成输出。熵解码器(924)可以使用诸如霍夫曼编码、算术编码等的熵编码技术。在示例中,熵解码器(924)使用算术解码并且是算术解码器。在示例中,在熵编码器(923)中使用无损压缩,在熵解码器(924)中使用无损解压缩,并且可以忽略诸如由于边信息的传输而导致的噪声,因此来自熵解码器(924)的输出可以是经量化的潜在ẑ。超解码器(925)可以对经量化的潜在ẑ进行解码以生成输出ohc。输出ohc与经量化的潜在ẑ之间的关系可以使用式7来描述。The side information, such as the encoded bits (932), can be decompressed (e.g., entropy decoded) by the entropy decoder (924) to generate an output. The entropy decoder (924) can use an entropy coding technique such as Huffman coding, arithmetic coding, or the like. In an example, lossless compression is used in the entropy encoder (923), lossless decompression is used in the entropy decoder (924), and noise, such as that caused by the transmission of the side information, can be ignored, so the output from the entropy decoder (924) can be the quantized latent ẑ. In an example, the entropy decoder (924) uses arithmetic decoding and is an arithmetic decoder. The super decoder (925) can decode the quantized latent ẑ to generate the output ohc. The relationship between the output ohc and the quantized latent ẑ can be described using Equation 7.
ohc=f6(ẑ;θ6) 式7　ohc = f6(ẑ; θ6) Equation 7
其中,参数θ6表示如下参数,诸如超解码器(925)中的卷积核中使用的权重以及偏置(如果在超解码器(925)中使用偏置)。Here, parameter θ6 represents parameters such as the weights used in the convolution kernels in the superdecoder (925) and the bias (if a bias is used in the superdecoder (925)).
如上所述,压缩比特或编码比特(932)可以作为边信息被添加到编码比特流,这使得熵解码器(914)能够使用条件熵模型。因此,熵模型可以是块相关的和空间自适应的,并且因此可以比固定熵模型更精确。As described above, compressed bits or encoded bits (932) can be added to the encoded bitstream as side information, which enables the entropy decoder (914) to use a conditional entropy model. Therefore, the entropy model can be block-dependent and spatially adaptive, and thus more accurate than the fixed entropy model.
NIC框架(900)可以被适当地调整,以例如省略图9B中所示的一个或更多个部件、修改图9B中所示的一个或更多个部件以及/或者包括图9B中未示出的一个或更多个部件。在示例中,使用固定熵模型的NIC框架包括第一子NN(951),而不包括第二子NN(952)。在示例中,NIC框架包括NIC框架(900)中除熵编码器(923)和熵解码器(924)之外的部件。The NIC framework (900) can be appropriately adapted to, for example, omit one or more components shown in FIG. 9B, modify one or more components shown in FIG. 9B, and/or include one or more components not shown in FIG. 9B. In the example, the NIC framework using the fixed entropy model includes a first sub-NN (951) but excludes a second sub-NN (952). In the example, the NIC framework includes all components of the NIC framework (900) except for the entropy encoder (923) and the entropy decoder (924).
在实施方式中,使用诸如CNN的神经网络来实现图9B所示的NIC框架(900)中的一个或更多个部件。NIC框架(例如,NIC框架(900))中的每个基于NN的部件(例如,主编码器网络(911)、主解码器网络(915)、上下文模型NN(916)、熵参数NN(917)、超编码器(921)或超解码器(925))可以包括任何合适的架构(例如,具有任何合适的层组合)、包括任何合适类型的参数(例如,权重、偏置、权重与偏置的组合等)并且包括任意合适数目的参数。In implementation, one or more components of the NIC framework (900) shown in Figure 9B are implemented using neural networks such as CNNs. Each NN-based component in the NIC framework (e.g., the NIC framework (900)) (e.g., the master encoder network (911), the master decoder network (915), the context model NN (916), the entropy parameter NN (917), the super encoder (921), or the super decoder (925)) may include any suitable architecture (e.g., with any suitable combination of layers), any suitable type of parameters (e.g., weights, biases, combinations of weights and biases, etc.), and any suitable number of parameters.
在实施方式中,使用相应的CNN来实现主编码器网络(911)、主解码器网络(915)、上下文模型NN(916)、熵参数NN(917)、超编码器(921)和超解码器(925)。In the implementation, the main encoder network (911), the main decoder network (915), the context model NN (916), the entropy parameter NN (917), the super encoder (921), and the super decoder (925) are implemented using the corresponding CNN.
图10示出了根据本公开内容的实施方式的主编码器网络(911)的示例性CNN。例如,主编码器网络(911)包括四组层,其中每组层包括跟随有GDN层的卷积层5×5c192 s2。图10中所示的一个或更多个层可以被修改和/或省略。可以将另外的层添加到主编码器网络(911)。Figure 10 illustrates an exemplary CNN of a master encoder network (911) according to an embodiment of the present disclosure. For example, the master encoder network (911) includes four sets of layers, each set of layers including a 5×5c192 s2 convolutional layer followed by a GDN layer. One or more layers shown in Figure 10 may be modified and/or omitted. Additional layers may be added to the master encoder network (911).
图11示出了根据本公开内容的实施方式的主解码器网络(915)的示例性CNN。例如,主解码器网络(915)包括三组层,其中每组层包括跟随有IGDN层的去卷积层5x5 c192s2。此外,三组层之后是跟随有IGDN层的去卷积层5x5 c3 s2。图11中所示的一个或更多个层可以被修改和/或省略。可以将另外的层添加到主解码器网络(915)。Figure 11 illustrates an exemplary CNN of a master decoder network (915) according to an embodiment of the present disclosure. For example, the master decoder network (915) includes three sets of layers, each set including a 5x5 c192s2 deconvolutional layer followed by an IGDN layer. Furthermore, the three sets of layers are followed by a 5x5 c3s2 deconvolutional layer also followed by an IGDN layer. One or more layers shown in Figure 11 may be modified and/or omitted. Additional layers may be added to the master decoder network (915).
图12示出了根据本公开内容的实施方式的超编码器(921)的示例性CNN。例如,超编码器(921)包括跟随有泄漏ReLU的卷积层3x3 c192 s1,跟随有泄漏ReLU的卷积层5x5c192 s2,以及卷积层5x5 c192 s2。图12中所示的一个或更多个层可以被修改和/或省略。可以将另外的层添加到超编码器(921)。Figure 12 illustrates an exemplary CNN of a super encoder (921) according to an embodiment of the present disclosure. For example, the super encoder (921) includes a 3x3 c192 s1 convolutional layer followed by leaky ReLU, a 5x5 c192 s2 convolutional layer followed by leaky ReLU, and a 5x5 c192 s2 convolutional layer. One or more layers shown in Figure 12 may be modified and/or omitted. Additional layers may be added to the super encoder (921).
图13示出了根据本公开内容的实施方式的超解码器(925)的示例性CNN。例如,超解码器(925)包括跟随有泄漏ReLU的去卷积层5x5 c192 s2、跟随有泄漏ReLU的去卷积层5x5 c288 s2以及去卷积层3x3 c384 s1。图13中所示的一个或更多个层可以被修改和/或省略。可以将另外的层添加到超解码器(925)。Figure 13 illustrates an exemplary CNN of a superdecoder (925) according to an embodiment of the present disclosure. For example, the superdecoder (925) includes a 5x5 c192 s2 deconvolutional layer followed by leaky ReLU, a 5x5 c288 s2 deconvolutional layer followed by leaky ReLU, and a 3x3 c384 s1 deconvolutional layer. One or more layers shown in Figure 13 may be modified and/or omitted. Additional layers may be added to the superdecoder (925).
图14示出了根据本公开内容的实施方式的上下文模型NN(916)的示例性CNN。例如,上下文模型NN(916)包括用于上下文预测的掩蔽卷积5x5 c384 s1,并且因此式4中的上下文包括有限的上下文(例如,5×5卷积核)。可以修改图14中的卷积层。可以将另外的层添加到上下文模型NN(916)。Figure 14 illustrates an exemplary CNN of a context model NN (916) according to an embodiment of the present disclosure. For example, the context model NN (916) includes a masked convolution 5x5 c384 s1 for context prediction, and thus the context in Equation 4 includes a finite context (e.g., a 5×5 convolutional kernel). The convolutional layers in Figure 14 can be modified. Additional layers can be added to the context model NN (916).
图15示出了根据本公开内容的实施方式的熵参数NN(917)的示例性CNN。例如,熵参数NN(917)包括跟随有泄漏ReLU的卷积层1x1 c640 s1、跟随有泄漏ReLU的卷积层1x1c512 s1以及卷积层1x1 c384 s1。可以修改和/或省略图15中所示的一个或更多个层。可以将另外的层添加到熵参数NN(917)。Figure 15 illustrates an exemplary CNN with an entropy parameter NN (917) according to an embodiment of the present disclosure. For example, the entropy parameter NN (917) includes a convolutional layer 1x1 c640 s1 followed by a leaky ReLU, a convolutional layer 1x1 c512 s1 followed by a leaky ReLU, and a convolutional layer 1x1 c384 s1. One or more layers shown in Figure 15 may be modified and/or omitted. Additional layers may be added to the entropy parameter NN (917).
如参照图10至图15所描述的,NIC框架(900)可以使用CNN来实现。NIC框架(900)可以被适当地调整,以使得使用任何适当类型的神经网络(例如,基于CNN或基于非CNN的神经网络)来实现NIC框架(900)中的一个或更多个部件(例如,(911)、(915)、(916)、(917)、(921)和/或(925))。NIC框架(900)的一个或更多个其他部件可以使用神经网络来实现。As described with reference to Figures 10 to 15, the NIC framework (900) can be implemented using a CNN. The NIC framework (900) can be appropriately adapted to enable the implementation of one or more components of the NIC framework (900) (e.g., (911), (915), (916), (917), (921), and/or (925)) using any suitable type of neural network (e.g., CNN-based or non-CNN-based neural networks). One or more other components of the NIC framework (900) can be implemented using neural networks.
包括神经网络(例如,CNN)的NIC框架(900)可以被训练以学习神经网络中使用的参数。例如,当使用CNN时,可以在训练过程中分别学习由θ1至θ6表示的参数,诸如主编码器网络(911)中的卷积核中使用的权重和偏置(如果在主编码器网络(911)中使用偏置)、主解码器网络(915)中的卷积核中使用的权重和偏置(如果在主解码器网络(915)中使用偏置)、超编码器(921)中的卷积核中使用的权重和偏置(如果在超编码器(921)中使用偏置)、超解码器(925)中的卷积核中使用的权重和偏置(如果在超解码器(925)中使用偏置)、上下文模型NN(916)中的卷积核中使用的权重和偏置(如果在上下文模型NN(916)中使用偏置)以及熵参数NN(917)中的卷积核中使用的权重和偏置(如果在熵参数NN(917)中使用偏置)。The NIC framework (900), which includes neural networks (e.g., CNNs), can be trained to learn the parameters used in the neural networks. For example, when using a CNN, parameters represented by θ1 to θ6 can be learned during training, such as the weights and biases used in the convolutional kernels of the master encoder network (911) (if biases are used in the master encoder network (911)), the weights and biases used in the convolutional kernels of the master decoder network (915) (if biases are used in the master decoder network (915)), the weights and biases used in the convolutional kernels of the super encoder (921) (if biases are used in the super encoder (921)), the weights and biases used in the convolutional kernels of the super decoder (925) (if biases are used in the super decoder (925)), the weights and biases used in the convolutional kernels of the context model NN (916) (if biases are used in the context model NN (916)), and the weights and biases used in the convolutional kernels of the entropy parameter NN (917) (if biases are used in the entropy parameter NN (917)).
在示例中,参照图10,主编码器网络(911)包括四个卷积层,其中每个卷积层具有5×5的卷积核和192个通道。因此,在主编码器网络(911)中的卷积核中使用的权重数量为19200(即,4×5×5×192)。主编码器网络(911)中使用的参数包括19200权重和可选的偏置。当在主编码器网络(911)中使用偏置和/或另外的NN时,可以包括另外的参数。In the example, referring to Figure 10, the master encoder network (911) includes four convolutional layers, each with a 5×5 kernel and 192 channels. Therefore, the number of weights used in the convolutional kernels of the master encoder network (911) is 19200 (i.e., 4×5×5×192). The parameters used in the master encoder network (911) include 19200 weights and optional biases. Additional parameters may be included when biases and/or additional neural networks are used in the master encoder network (911).
参照图9B,NIC框架(900)包括至少一个建立在神经网络上的部件或模块。至少一个部件可以包括主编码器网络(911)、主解码器网络(915)、超编码器(921)、超解码器(925)、上下文模型NN(916)和熵参数NN(917)中的一个或更多个。至少一个部件可以被单独训练。在示例中,训练过程用于分别学习每个部件的参数。至少一个部件可以作为组被联合训练。在示例中,训练过程用于联合学习至少一个部件的子集的参数。在示例中,训练过程用于学习所有至少一个部件的参数,并因此被称为E2E优化。Referring to Figure 9B, the NIC framework (900) includes at least one component or module built on a neural network. The at least one component may include one or more of a master encoder network (911), a master decoder network (915), a super encoder (921), a super decoder (925), a context model NN (916), and an entropy parameter NN (917). The at least one component can be trained individually. In the example, the training process is used to learn the parameters of each component separately. The at least one component can be jointly trained as a group. In the example, the training process is used to jointly learn the parameters of a subset of at least one component. In the example, the training process is used to learn the parameters of all at least one component, and is therefore referred to as E2E optimization.
在NIC框架(900)中的一个或更多个部件的训练过程中,可以对一个或更多个部件的权重(或权重系数)进行初始化。在示例中,基于预训练的相应神经网络模型(例如,DNN模型、CNN模型)对权重进行初始化。在示例中,通过将权重设置为随机数来对权重进行初始化。During the training of one or more components in the NIC framework (900), the weights (or weight coefficients) of one or more components can be initialized. In the example, the weights are initialized based on the pre-trained corresponding neural network model (e.g., DNN model, CNN model). In the example, the weights are initialized by setting them to random numbers.
例如,在对权重进行初始化之后,可以使用训练块的集合来训练一个或更多个部件。训练块的集合可以包括具有任何合适大小的任何合适的块。在一些示例中,训练块的集合包括来自空间域中的原始图像、自然图像、计算机生成的图像等的块。在一些示例中,训练块的集合包括来自在空间域中具有残差数据的残差块或残差图像的块。残差数据可以由残差计算器(例如,残差计算器(723))来计算。在一些示例中,原始图像和/或包括残差数据的残差图像可以直接用于训练NIC框架(例如,NIC框架(900))中的神经网络。因此,原始图像、残差图像、来自原始图像的块和/或来自残差图像的块可以用于训练NIC框架中的神经网络。For example, after initializing the weights, a set of training blocks can be used to train one or more components. The set of training blocks can include any suitable blocks of any appropriate size. In some examples, the set of training blocks includes blocks from the original image, natural image, computer-generated image, etc., in the spatial domain. In some examples, the set of training blocks includes blocks from residual blocks or residual images that have residual data in the spatial domain. The residual data can be computed by a residual calculator (e.g., residual calculator (723)). In some examples, the original image and/or the residual image including the residual data can be directly used to train a neural network in the NIC framework (e.g., NIC framework (900)). Therefore, the original image, the residual image, the blocks from the original image, and/or the blocks from the residual image can be used to train a neural network in the NIC framework.
为了简洁起见,使用训练块作为示例来描述以下训练过程。该描述可以适当地适于训练图像。训练块的集合中的训练块t可以通过图9B中的编码过程以生成压缩表示(例如,到编码比特流的编码信息)。编码信息可以通过图9B中描述的解码过程以计算和重构重构块。For brevity, the following training process is described using a training block as an example. The description can be suitably adapted to a training image. A training block t in the set of training blocks can be passed through the encoding process in Figure 9B to generate a compressed representation (e.g., encoded information to a coded bitstream). The encoded information can be passed through the decoding process described in Figure 9B to compute and reconstruct a reconstructed block.
对于NIC框架(900),使两个竞争目标例如重构质量和比特消耗平衡。质量损失函数(例如,失真或失真损失D)可以用于指示重构质量,诸如重构(例如,重构块)与原始块(例如,训练块t)之间的差异。速率(或速率损失)R可以用于指示压缩表示的比特消耗。在示例中,速率损失R还包括例如在确定上下文模型时使用的边信息。For the NIC framework (900), two competing targets, such as the reconstruction quality and the bit consumption, are balanced. A quality loss function (e.g., a distortion or distortion loss D) can be used to indicate the reconstruction quality, such as a difference between the reconstruction (e.g., the reconstructed block) and the original block (e.g., the training block t). A rate (or rate loss) R can be used to indicate the bit consumption of the compressed representation. In an example, the rate loss R further includes the side information, for example, used in determining the context model.
对于神经图像压缩,可以在E2E优化中使用量化的可微分近似。在各种示例中,在基于神经网络的图像压缩的训练过程中,使用噪声注入来模拟量化,因此,量化是通过噪声注入来模拟而不是由量化器(例如,量化器(912))来执行。因此,使用噪声注入进行的训练可以以变化方式逼近量化误差。每像素比特(BPP)估计器可以用于模拟熵编码器,因此,熵编码由BPP估计器模拟而不是由熵编码器(例如,(913))和熵解码器(例如,(914))执行。因此,训练过程中式1所示的损失函数L中的速率损失R例如可以基于噪声注入和BPP估计器来估计。通常,较高的速率R可以允许较低的失真D,而较低的速率R会导致较高的失真D。因此,式1中的权衡超参数λ可以用于优化联合R-D损失L,其中作为λD和R的总和的L可以被优化。训练过程可以用于调整NIC框架(900)中的一个或更多个部件(例如(911)(915))的参数,使得联合R-D损失L被最小化或优化。在一些示例中,权衡超参数λ可以根据下式用于优化联合率失真(R-D)损失:For neural image compression, a differentiable approximation of quantization can be used in E2E optimization. In various examples, noise injection is used to simulate quantization during the training process of neural network-based image compression. Thus, quantization is simulated by noise injection rather than performed by a quantizer (e.g., quantizer (912)). Therefore, training using noise injection can approximate the quantization error in a varied manner. A bit-per-pixel (BPP) estimator can be used to simulate an entropy encoder. Thus, entropy encoding is simulated by the BPP estimator rather than performed by an entropy encoder (e.g., (913)) and an entropy decoder (e.g., (914)). Therefore, the rate loss R in the loss function L shown in Equation 1 during training can be estimated, for example, based on noise injection and the BPP estimator. Generally, a higher rate R allows for a lower distortion D, while a lower rate R results in a higher distortion D. Therefore, the tradeoff hyperparameter λ in Equation 1 can be used to optimize the joint R-D loss L, where L, as the sum of λD and R, can be optimized. The training process can be used to tune the parameters of one or more components (e.g., (911)(915)) of the NIC framework (900) such that the joint R-D loss L is minimized or optimized. In some examples, the tradeoff hyperparameter λ can be used to optimize the joint rate-distortion (R-D) loss according to the following formula:
L = λD + R + βE
其中,E测量解码块残差与编码之前的原始块残差相比的失真,其充当残差编码/解码DNN和编码/解码DNN的正则化损失。β是平衡正则化损失的重要性的超参数。Here, E measures the distortion of the decoded block residual compared to the original block residual before encoding, and it serves as the regularization loss for both the residual encoding/decoding DNN and the encoding/decoding DNN. β is a hyperparameter that balances the importance of the regularization loss.
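The training-time behavior described above can be sketched as follows: quantization is simulated with additive uniform noise in [-0.5, 0.5] rather than hard rounding, and the joint R-D loss combines distortion and rate via the tradeoff hyperparameter λ (the distortion and rate values below are toy stand-ins, not the learned estimators of the disclosed framework):

```python
import random

def simulate_quantization(latent):
    # Training-time proxy for the quantizer: additive uniform noise,
    # which keeps the loss differentiable for E2E optimization.
    return [v + random.uniform(-0.5, 0.5) for v in latent]

def joint_rd_loss(distortion, rate, lam):
    # L = lambda * D + R, the joint R-D objective referenced above (Equation 1).
    return lam * distortion + rate
```

At inference time the noise injection is replaced by the actual quantizer (e.g., rounding), so the noisy latent here only approximates the quantization error seen in deployment.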
可以使用各种模型来确定失真损失D和速率损失R,并因此确定式1中的联合R-D损失L。在示例中,失真损失被表示为峰值信噪比(PSNR),该峰值信噪比是基于均方误差、多尺度结构相似性(MS-SSIM)质量指数、PSNR和MS-SSIM的加权组合等的度量。Various models can be used to determine the distortion loss D and the rate loss R, and thus the joint R-D loss L in Equation 1. In the example, the distortion loss is expressed as the peak signal-to-noise ratio (PSNR), which is a measure based on mean squared error, multi-scale structural similarity (MS-SSIM) quality index, a weighted combination of PSNR and MS-SSIM, etc.
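As a brief illustration of the PSNR metric mentioned above, computed here from the mean squared error with an assumed 8-bit peak value of 255:

```python
import math

def psnr(mse, max_val=255.0):
    # Peak signal-to-noise ratio in dB; higher values indicate less distortion.
    return 10.0 * math.log10(max_val * max_val / mse)
```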
在示例中,训练过程的目标是训练诸如要在编码器侧使用的视频编码器的编码神经网络(例如,编码DNN)以及诸如要在解码器侧使用的视频解码器的解码神经网络(例如,解码DNN)。在示例中,参照图9B,编码神经网络可以包括主编码器网络(911)、超编码器(921)、超解码器(925)、上下文模型NN(916)和熵参数NN(917)。解码神经网络可以包括主解码器网络(915)、超解码器(925)、上下文模型NN(916)和熵参数NN(917)。视频编码器和/或视频解码器可以包括基于NN和/或不基于NN的其他部件。In this example, the goal of the training process is to train an encoding neural network (e.g., an encoding DNN) for a video encoder to be used on the encoder side and a decoding neural network (e.g., a decoding DNN) for a video decoder to be used on the decoder side. Referring to FIG9B, the encoding neural network may include a master encoder network (911), a super encoder (921), a super decoder (925), a context model NN (916), and an entropy parameter NN (917). The decoding neural network may include a master decoder network (915), a super decoder (925), a context model NN (916), and an entropy parameter NN (917). The video encoder and/or video decoder may include other components based on and/or not based on NN.
可以以E2E方式训练NIC框架(例如,NIC框架(900))。在示例中,编码神经网络和解码神经网络在训练过程中基于反向传播梯度以E2E方式被联合更新。The NIC framework can be trained in an E2E manner (e.g., NIC framework (900)). In the example, the encoding neural network and the decoding neural network are jointly updated in an E2E manner based on backpropagation gradients during training.
在NIC框架(900)中的神经网络的参数被训练之后,NIC框架(900)中的一个或更多个部件可以用于对块进行编码和/或解码。在实施方式中,在编码器侧,视频编码器被配置成将输入块x编码成要在比特流中传输的编码块(931)。视频编码器可以包括NIC框架(900)中的多个部件。在实施方式中,在解码器侧,对应的视频解码器被配置成将比特流中的编码块(931)解码成重构块视频解码器可以包括NIC框架(900)中的多个部件。After the parameters of the neural network in the NIC framework (900) are trained, one or more components of the NIC framework (900) can be used to encode and/or decode blocks. In an embodiment, on the encoder side, the video encoder is configured to encode the input block x into a coded block (931) to be transmitted in the bitstream. The video encoder may include multiple components in the NIC framework (900). In an embodiment, on the decoder side, a corresponding video decoder is configured to decode the coded block (931) in the bitstream into a reconstructed block. The video decoder may include multiple components in the NIC framework (900).
在示例中,视频编码器包括NIC框架(900)中的所有部件。In the example, the video encoder includes all components in the NIC framework (900).
图16A示出了根据本公开内容的实施方式的示例性视频编码器(1600A)。例如,视频编码器(1600A)包括参照图9B描述的主编码器网络(911)、量化器(912)、熵编码器(913)和第二子NN(952)。图16B示出了根据本公开内容的实施方式的示例性视频解码器(1600B)。视频解码器(1600B)可以对应于视频编码器(1600A)。视频解码器(1600B)可以包括主解码器网络(915)、熵解码器(914)、上下文模型NN(916)、熵参数NN(917)、熵解码器(924)和超解码器(925)。参照图16A至图16B,在编码器侧,视频编码器(1600A)可以生成要在比特流中传输的编码块(931)和编码比特(932)。在解码器侧,视频解码器(1600B)可以接收编码块(931)和编码比特(932)并对编码块(931)和编码比特(932)进行解码。Figure 16A illustrates an exemplary video encoder (1600A) according to an embodiment of the present disclosure. For example, the video encoder (1600A) includes a master encoder network (911), a quantizer (912), an entropy encoder (913), and a second sub-NN (952) as described with reference to Figure 9B. Figure 16B illustrates an exemplary video decoder (1600B) according to an embodiment of the present disclosure. The video decoder (1600B) may correspond to the video encoder (1600A). The video decoder (1600B) may include a master decoder network (915), an entropy decoder (914), a context model NN (916), an entropy parameter NN (917), an entropy decoder (924), and a super decoder (925). Referring to Figures 16A and 16B, on the encoder side, the video encoder (1600A) may generate coded blocks (931) and coded bits (932) to be transmitted in the bitstream. On the decoder side, the video decoder (1600B) can receive the coded block (931) and the coded bits (932) and decode the coded block (931) and the coded bits (932).
图17至图18分别示出了根据本公开内容的实施方式的示例性视频编码器(1700)和相应的视频解码器(1800)。参照图17,编码器(1700)包括主编码器网络(911)、量化器(912)和熵编码器(913)。参照图9B描述主编码器网络(911)、量化器(912)和熵编码器(913)的示例。参照图18,视频解码器(1800)包括主解码器网络(915)和熵解码器(914)。参照图9B描述主解码器网络(915)和熵解码器(914)的示例。参照图17和图18,视频编码器(1700)可以生成要在比特流中包括的编码块(931)。视频解码器(1800)可以接收编码块(931)并对编码块(931)进行解码。Figures 17 and 18 respectively illustrate an exemplary video encoder (1700) and a corresponding video decoder (1800) according to embodiments of the present disclosure. Referring to Figure 17, the encoder (1700) includes a master encoder network (911), a quantizer (912), and an entropy encoder (913). Examples of the master encoder network (911), the quantizer (912), and the entropy encoder (913) are described with reference to Figure 9B. Referring to Figure 18, the video decoder (1800) includes a master decoder network (915) and an entropy decoder (914). Examples of the master decoder network (915) and the entropy decoder (914) are described with reference to Figure 9B. Referring to Figures 17 and 18, the video encoder (1700) can generate the coded block (931) to be included in the bitstream. The video decoder (1800) can receive the coded block (931) and decode the coded block (931).
视频编码技术可以包括滤波操作,对重构样本执行滤波操作使得可以减少伪影(例如,块效应)。在这样的滤波操作中可以使用去块滤波处理,其中,可以对相邻块之间的块边界(例如边界区域)进行滤波,使得可以实现样本值从一个块到另一个块的更平滑的过渡。Video coding techniques may include filtering operations that reduce artifacts (e.g., blockiness) by applying filtering to reconstructed samples. Deblocking filtering can be used in such filtering operations, where block boundaries (e.g., boundary regions) between adjacent blocks can be filtered to achieve a smoother transition of sample values from one block to another.
在一些相关示例(例如,HEVC)中,可以将去块处理或去块滤波处理应用于相邻于块边界的样本。可以按照与解码处理相同的顺序对每个CU执行去块滤波处理。例如,可以通过首先对图像的垂直边界进行水平滤波,然后对图像的水平边界进行垂直滤波,来执行去块滤波处理。可以针对亮度分量和色度分量二者将滤波应用于被确定为要进行滤波的8×8块边界。在示例中,不对4×4块边界进行处理以降低复杂性。In some related examples (e.g., HEVC), a deblocking process or deblocking filtering process can be applied to samples adjacent to block boundaries. The deblocking filtering process can be performed on each CU in the same order as the decoding process. For example, the deblocking filtering process can be performed by first horizontally filtering the vertical boundaries of the image, and then vertically filtering the horizontal boundaries of the image. For both the luma and chroma components, filtering can be applied to the 8×8 block boundaries that are determined to be filtered. In an example, 4×4 block boundaries are not processed, in order to reduce complexity.
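As a minimal illustration of the two-pass order just described (all vertical boundaries filtered first with horizontal filtering, then all horizontal boundaries with vertical filtering, on an 8×8 grid), the following Python sketch enumerates the boundaries in filtering order. The function name and the grid parameter are illustrative assumptions, not part of any codec API.

```python
# Hypothetical sketch: enumerate block boundaries in HEVC-style
# deblocking order. Pass 1 visits vertical boundaries (horizontal
# filtering); pass 2 visits horizontal boundaries (vertical filtering).
# The grid spacing of 8 mirrors the 8x8 boundaries mentioned above.
def deblock_passes(width: int, height: int, grid: int = 8):
    order = []
    for x in range(grid, width, grid):    # pass 1: vertical boundaries
        order.append(("vertical", x))
    for y in range(grid, height, grid):   # pass 2: horizontal boundaries
        order.append(("horizontal", y))
    return order
```

A real deblocking loop would apply a filter at each listed boundary; here only the visiting order is modeled.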
可以使用边界强度(boundary strength,BS)来指示去块滤波处理的程度或强度。在实施方式中,BS的值为2指示强滤波,BS的值为1指示弱滤波,并且BS的值为0指示没有去块滤波。A boundary strength (BS) can be used to indicate the degree or strength of the deblocking filtering process. In an embodiment, a BS value of 2 indicates strong filtering, a BS value of 1 indicates weak filtering, and a BS value of 0 indicates no deblocking filtering.
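The BS semantics above can be summarized as a small lookup, sketched here in Python; the function name and the returned labels are hypothetical and only mirror the 2/1/0 mapping stated in the text.

```python
# Illustrative mapping of boundary strength (BS) values to a deblocking
# decision, per the description above: 2 = strong filtering,
# 1 = weak filtering, 0 = no deblocking.
def deblock_mode(bs: int) -> str:
    if bs == 2:
        return "strong"   # strong filtering
    if bs == 1:
        return "weak"     # weak filtering
    if bs == 0:
        return "none"     # no deblocking
    raise ValueError("BS must be 0, 1, or 2")
```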
可以使用任何合适的方法,例如本公开内容中的实施方式,从编码视频比特流来重构图像的块。例如,使用例如包括神经网络(例如,CNN)的视频解码器(例如,(1600B)、(1800))来重构块。在一些示例中,使用不基于NN的视频解码器来重构块。可以使用一种或更多种去块方法(也称为去块、去块处理、去块滤波处理)来减少块之间的伪影,例如相邻重构块的边界区域中的伪影。去块可以减少例如由相邻重构块中的独立量化引起的相邻重构块之间的块伪影或块效应。可以利用去块模块对相邻重构块的边界区域执行去块。为了减少块之间的伪影,可以使用基于NN的去块模型。基于NN的去块模型可以是基于DNN的去块模型、基于CNN的去块模型等。可以使用包括DNN、CNN等的NN(例如,去块NN)来实现基于NN的去块模型。Blocks of an image can be reconstructed from an encoded video bitstream using any suitable method, such as the embodiments described in this disclosure. For example, blocks can be reconstructed using a video decoder (e.g., (1600B), (1800)) that includes a neural network (e.g., CNN). In some examples, a non-NN-based video decoder is used to reconstruct blocks. One or more deblocking methods (also called deblocking, deblocking processing, deblocking filtering) can be used to reduce artifacts between blocks, such as artifacts in the boundary regions of adjacent reconstructed blocks. Deblocking can reduce block artifacts or block effects between adjacent reconstructed blocks, for example, caused by independent quantization in adjacent reconstructed blocks. Deblocking can be performed on the boundary regions of adjacent reconstructed blocks using a deblocking module. To reduce artifacts between blocks, an NN-based deblocking model can be used. NN-based deblocking models can be DNN-based deblocking models, CNN-based deblocking models, etc. NN-based deblocking models can be implemented using NNs including DNNs, CNNs, etc. (e.g., deblocking NNs).
图19A至图19C示出了根据本公开内容的实施方式的示例性去块过程(1900)。参照图19A,图像(1901)可以包括多个块(1911)至(1914)。为简洁起见,图19A中示出了四个尺寸相等的块(1911)至(1914)。通常,可以将图像划分成任何合适数量的块,并且块的尺寸可以不同或相同,并且可以适当地调整描述。在一些示例中,可以通过去块对包括伪影的区域进行处理,该伪影例如由于将图像划分成块而引起。Figures 19A to 19C illustrate an exemplary deblocking process (1900) according to an embodiment of the present disclosure. Referring to Figure 19A, an image (1901) may include multiple blocks (1911) to (1914). For simplicity, four blocks (1911) to (1914) of equal size are shown in Figure 19A. Typically, an image can be divided into any suitable number of blocks, and the block sizes may be different or the same, and the description may be adapted accordingly. In some examples, deblocking can be used to process regions containing artifacts, such as those caused by dividing the image into blocks.
在示例中,块(1911)至(1914)是来自主解码器网络(915)的重构块。图像(1901)是重构图像。在一些示例中,重构图像(1901)包括残差数据。In the example, blocks (1911) through (1914) are reconstructed blocks from the master decoder network (915). Image (1901) is the reconstructed image. In some examples, the reconstructed image (1901) includes residual data.
重构块(1911)至(1914)的第一两个相邻重构块可以包括由第一共享边界(1941)分隔的块(1911)和(1913)。块(1911)和(1913)可以包括具有在第一共享边界(1941)的两侧上的样本的边界区域A。参照图19A至图19B,边界区域A可以包括分别位于块(1911)和(1913)中的子边界区域A1和A2。第一两个相邻重构块(1911)和(1913)可以包括具有在第一共享边界(1941)的两侧上的样本的边界区域A和在边界区域A外部的非边界区域(1921)和(1923)。A first pair of adjacent reconstructed blocks among the reconstructed blocks (1911) to (1914) may include the blocks (1911) and (1913) separated by a first shared boundary (1941). The blocks (1911) and (1913) may include a boundary region A having samples on both sides of the first shared boundary (1941). Referring to Figures 19A and 19B, the boundary region A may include sub-boundary regions A1 and A2 located in the blocks (1911) and (1913), respectively. The first pair of adjacent reconstructed blocks (1911) and (1913) may include the boundary region A having samples on both sides of the first shared boundary (1941) and the non-boundary regions (1921) and (1923) outside the boundary region A.
重构块(1911)至(1914)中的两个相邻重构块可以包括由第二共享边界(1942)分隔的块(1912)和(1914)。块(1912)和(1914)可以包括具有在第二共享边界(1942)的两侧上的样本的边界区域B。参照图19A至图19B,边界区域B可以包括分别位于块(1912)和(1914)中的子边界区域B1和B2。块(1912)和(1914)可以包括具有在共享边界(1942)的两侧上的样本的边界区域B和在边界区域B外部的非边界区域(1922)和(1924)。Two adjacent reconstructed blocks among the reconstructed blocks (1911) to (1914) may include the blocks (1912) and (1914) separated by a second shared boundary (1942). The blocks (1912) and (1914) may include a boundary region B having samples on both sides of the second shared boundary (1942). Referring to Figures 19A and 19B, the boundary region B may include sub-boundary regions B1 and B2 located in the blocks (1912) and (1914), respectively. The blocks (1912) and (1914) may include the boundary region B having samples on both sides of the shared boundary (1942) and the non-boundary regions (1922) and (1924) outside the boundary region B.
重构块(1911)至(1914)中的两个相邻重构块可以包括由共享边界(1943)分隔的块(1911)和(1912)。块(1911)和(1912)可以包括具有在共享边界(1943)的两侧上的样本的边界区域C。参照图19A至图19B,边界区域C可以包括分别位于块(1911)和(1912)中的子边界区域C1和C2。块(1911)和(1912)可以包括具有在共享边界(1943)的两侧上的样本的边界区域C和在边界区域C外部的非边界区域(1921)和(1922)。Two adjacent reconstructed blocks among the reconstructed blocks (1911) to (1914) may include the blocks (1911) and (1912) separated by a shared boundary (1943). The blocks (1911) and (1912) may include a boundary region C having samples on both sides of the shared boundary (1943). Referring to Figures 19A and 19B, the boundary region C may include sub-boundary regions C1 and C2 located in the blocks (1911) and (1912), respectively. The blocks (1911) and (1912) may include the boundary region C having samples on both sides of the shared boundary (1943) and the non-boundary regions (1921) and (1922) outside the boundary region C.
重构块(1911)至(1914)中的两个相邻重构块可以包括由共享边界(1944)分隔的块(1913)和(1914)。块(1913)和(1914)可以包括具有在共享边界(1944)的两侧上的样本的边界区域D。参照图19A至图19B,边界区域D可以包括分别位于块(1913)和(1914)中的子边界区域D1和D2。块(1913)和(1914)可以包括具有在共享边界(1944)的两侧上的样本的边界区域D和在边界区域D外部的非边界区域(1923)和(1924)。Two adjacent reconstructed blocks among the reconstructed blocks (1911) to (1914) may include the blocks (1913) and (1914) separated by a shared boundary (1944). The blocks (1913) and (1914) may include a boundary region D having samples on both sides of the shared boundary (1944). Referring to Figures 19A and 19B, the boundary region D may include sub-boundary regions D1 and D2 located in the blocks (1913) and (1914), respectively. The blocks (1913) and (1914) may include the boundary region D having samples on both sides of the shared boundary (1944) and the non-boundary regions (1923) and (1924) outside the boundary region D.
参照图19A,图像(1901)可以包括边界区域A至D和在边界区域A至D外部的非边界区域(1921)至(1924)。Referring to FIG19A, the image (1901) may include boundary regions A to D and non-boundary regions (1921) to (1924) outside the boundary regions A to D.
子边界区域A1至D1和A2至D2(以及边界区域A至D)可以具有任何合适的尺寸(例如,宽度和/或高度)。在图19A中所示的实施方式中,子边界区域A1、A2、B1和B2具有相同的m×n尺寸,其中n是块(1911)至(1914)的宽度,m是子边界区域A1、A2、B1和B2的高度。m和n都是正整数。在示例中,m是四个像素或四个样本。因此,边界区域A和B具有相同的2m×n尺寸。子边界区域C1、C2、D1和D2具有相同的n×m尺寸,其中n是块(1911)至(1914)的高度,m是子边界区域C1、C2、D1和D2的宽度。因此,边界区域C和D具有相同的n×2m尺寸。如上所述,子边界区域和边界区域可以具有不同的尺寸,例如不同的宽度、不同的高度等。例如,子边界区域A1和A2可以具有不同的高度。在示例中,子边界区域C1和C2可以具有不同的宽度。边界区域A和B可以具有不同的宽度。边界区域C和D可以具有不同的高度。Sub-boundary regions A1 to D1 and A2 to D2 (and boundary regions A to D) can have any suitable size (e.g., width and/or height). In the embodiment shown in Figure 19A, sub-boundary regions A1, A2, B1, and B2 have the same m×n size, where n is the width of blocks (1911) to (1914), and m is the height of sub-boundary regions A1, A2, B1, and B2. Both m and n are positive integers. In the example, m is four pixels or four samples. Therefore, boundary regions A and B have the same 2m×n size. Sub-boundary regions C1, C2, D1, and D2 have the same n×m size, where n is the height of blocks (1911) to (1914), and m is the width of sub-boundary regions C1, C2, D1, and D2. Therefore, boundary regions C and D have the same n×2m size. As mentioned above, sub-boundary regions and boundary regions can have different sizes, such as different widths, different heights, etc. For example, sub-boundary regions A1 and A2 can have different heights. In this example, sub-boundary regions C1 and C2 can have different widths. Boundary regions A and B can have different widths. Boundary regions C and D can have different heights.
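Under the geometry described above (blocks of width or height n, sub-boundary depth m), extracting a 2m×n boundary region across a horizontal shared boundary, or an n×2m region across a vertical one, amounts to simple slicing. The sketch below is an assumption-laden illustration using plain 2D lists; the function names are not from the disclosure.

```python
# Illustrative extraction of boundary regions, assuming each block is a
# 2D list of samples. For a horizontal shared boundary (e.g., region A
# between a top block and a bottom block), the region is the last m rows
# of the top block plus the first m rows of the bottom block (2m x n).
def boundary_region_horizontal(top, bottom, m):
    return top[-m:] + bottom[:m]

# For a vertical shared boundary (e.g., region C between a left block
# and a right block), the region is the last m columns of the left block
# plus the first m columns of the right block (n x 2m).
def boundary_region_vertical(left, right, m):
    return [lrow[-m:] + rrow[:m] for lrow, rrow in zip(left, right)]
```

With m = 4 samples, as in the example above, each region covers four sample lines on either side of the shared boundary.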
在一些示例中,边界区域具有不同于矩形形状或正方形形状的形状。边界区域可以具有取决于相邻块的位置、形状和/或尺寸的不规则形状。In some examples, the boundary region has a shape that differs from a rectangular or square shape. The boundary region can have an irregular shape that depends on the position, shape, and/or size of the adjacent blocks.
参照图19A至图19B,边界区域A包括自第一共享边界(1941)起块(1911)中的m排样本(例如,m行样本)和自第一共享边界(1941)起块(1913)中的m排样本(例如,m行样本)。边界区域C包括自共享边界(1943)起块(1911)中的m排样本(例如,m列样本)和自共享边界(1943)起块(1912)中的m排样本(例如,m列样本)。Referring to Figures 19A and 19B, the boundary region A includes m lines of samples (e.g., m rows of samples) in the block (1911) from the first shared boundary (1941) and m lines of samples (e.g., m rows of samples) in the block (1913) from the first shared boundary (1941). The boundary region C includes m lines of samples (e.g., m columns of samples) in the block (1911) from the shared boundary (1943) and m lines of samples (e.g., m columns of samples) in the block (1912) from the shared boundary (1943).
可以利用去块NN(1930),例如基于DNN、CNN或任何合适的NN的去块NN来对边界区域A至D中的一个或更多个执行去块。在示例中,使用包括一个或更多个卷积层的CNN来实现去块NN(1930)。去块NN(1930)可以包括本公开内容中描述的附加层,例如(一个或更多个)池化层、(一个或更多个)全连接层、(一个或更多个)归一化层等。去块NN(1930)中的层可以以任何合适的顺序和以任何合适的架构(例如,前馈架构、循环架构)来布置。在示例中,卷积层之后是其他层,例如(一个或更多个)池化层、(一个或更多个)全连接层、(一个或更多个)归一化层等。A deblocking NN (1930), such as a deblocking NN based on a DNN, a CNN, or any suitable NN, can be used to perform deblocking on one or more of the boundary regions A to D. In an example, the deblocking NN (1930) is implemented using a CNN that includes one or more convolutional layers. The deblocking NN (1930) can include additional layers described in this disclosure, such as pooling layer(s), fully connected layer(s), normalization layer(s), and the like. The layers in the deblocking NN (1930) can be arranged in any suitable order and in any suitable architecture (e.g., a feedforward architecture, a recurrent architecture). In an example, a convolutional layer is followed by other layers, such as pooling layer(s), fully connected layer(s), normalization layer(s), and the like.
可以利用去块NN(1930)对边界区域A至D执行去块。边界区域A至D中的一个或更多个包括伪影。伪影可能由相应的相邻块引起。可以将边界区域A至D中的一个或更多个发送至去块NN(1930)以减少伪影。因此,至去块NN(1930)的输入包括边界区域A至D中的一个或更多个,并且从去块NN(1930)的输出包括经去块的边界区域A至D中的一个或更多个。Deblocking can be performed on boundary regions A through D using a deblocking NN (1930). One or more of the boundary regions A through D contain artifacts. These artifacts may be caused by corresponding adjacent blocks. One or more of the boundary regions A through D can be sent to the deblocking NN (1930) to reduce artifacts. Therefore, the input to the deblocking NN (1930) includes one or more of the boundary regions A through D, and the output from the deblocking NN (1930) includes one or more of the deblocked boundary regions A through D.
参照图19B,边界区域A至D包括由相应的相邻块引起的伪影。可以将边界区域A至D发送至去块NN(1930)以减少伪影。从去块NN(1930)的输出包括经去块的边界区域A’至D’。在示例中,经去块的边界区域A’至D’中的伪影与边界区域A至D中的伪影相比得到了减少。Referring to Figure 19B, boundary regions A to D include artifacts caused by their respective neighboring blocks. Boundary regions A to D can be sent to the deblocking NN (1930) to reduce artifacts. The output from the deblocking NN (1930) includes the deblocked boundary regions A' to D'. In the example, the artifacts in the deblocked boundary regions A' to D' are reduced compared to the artifacts in the boundary regions A to D.
参照图19B和图19C,图像(1901)中的边界区域A至D被更新,例如通过被替换为经去块的边界区域A’至D’而被更新。因此,生成图像(1950)(例如,经去块的图像)并且图像(1950)(例如,经去块的图像)包括经去块的边界区域A’至D’和非边界区域(1921)至(1924)。Referring to Figures 19B and 19C, the boundary regions A to D in image (1901) are updated, for example, by being replaced with deblocked boundary regions A’ to D’. Thus, image (1950) (e.g., a deblocked image) is generated, and image (1950) (e.g., a deblocked image) includes the deblocked boundary regions A’ to D’ and non-boundary regions (1921) to (1924).
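The update step above, in which each boundary region in the image is replaced by its deblocked counterpart, can be sketched as an in-place paste at the region's offset. The list-of-lists representation and the function name are assumptions for illustration only.

```python
# Illustrative sketch: overwrite an image region with its deblocked
# version, writing the region's samples back at offset (top, left).
def paste_region(image, region, top, left):
    for r, row in enumerate(region):
        for c, v in enumerate(row):
            image[top + r][left + c] = v
    return image
```

Applying this once per deblocked boundary region (e.g., A’ to D’) yields the updated (deblocked) image while the non-boundary regions stay untouched.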
在一些示例中,具有边界区域A至D的图像(1901)被视为单个边界区域。将图像(1901)发送至去块NN(1930)以减少块效应。来自去块NN(1930)的输出包括经去块的图像(例如,图像(1950)),其中经去块的图像可以包括经去块的边界区域(例如,A’至D’)和非边界区域(1921)至(1924)。In some examples, an image (1901) with boundary regions A through D is treated as a single boundary region. Image (1901) is sent to a deblocking neural network (1930) to reduce blocking artifacts. The output from the deblocking neural network (1930) includes a deblocked image (e.g., image (1950)), which may include deblocked boundary regions (e.g., A’ through D’) and non-boundary regions (1921) through (1924).
在一些示例中,将包括边界区域的相邻块(例如,包括边界区域C的块(1911)至(1912))发送到去块NN(1930)中。来自去块NN(1930)的输出包括经去块的块,其中经去块的块可以包括经去块的边界区域(例如,C’)和非边界区域(1921)至(1922)。In some examples, adjacent blocks that include boundary regions (e.g., blocks (1911) to (1912) that include boundary region C) are sent to the deblocking NN (1930). The output from the deblocking NN (1930) includes deblocked blocks, which may include deblocked boundary regions (e.g., C’) and non-boundary regions (1921) to (1922).
一个或更多个样本可以在多个边界区域中。当多个边界区域被对应的经去块的边界区域替换时,可以使用任何合适的方法来确定一个或更多个共享样本中的一个共享样本的值。One or more samples can exist in multiple boundary regions. When multiple boundary regions are replaced by corresponding deblocked boundary regions, the value of one shared sample among one or more shared samples can be determined using any suitable method.
参照图19A,样本S在边界区域A和C中。在获得边界区域A’和C’之后,可以使用以下方法来获得样本S的值。在示例中,边界区域A被经去块的边界区域A’替换,并且随后,边界区域C被经去块的边界区域C’替换。因此,样本S的值由经去块的边界区域C’中的样本S的值确定。Referring to Figure 19A, sample S is located in boundary regions A and C. After obtaining boundary regions A' and C', the value of sample S can be obtained using the following method. In the example, boundary region A is replaced by the deblocked boundary region A', and subsequently, boundary region C is replaced by the deblocked boundary region C'. Therefore, the value of sample S is determined by the value of sample S in the deblocked boundary region C'.
在示例中,边界区域C被经去块的边界区域C’替换,并且随后,边界区域A被经去块的边界区域A’替换。因此,样本S的值由经去块的边界区域A’中的样本S的值确定。In the example, boundary region C is replaced by the deblocked boundary region C', and subsequently, boundary region A is replaced by the deblocked boundary region A'. Therefore, the value of sample S is determined by the value of sample S in the deblocked boundary region A'.
在示例中,样本S的值由经去块的边界区域A’中的样本S的值和经去块的边界区域C’中的样本S的值的平均(例如,加权平均)来确定。In the example, the value of sample S is determined by the average (e.g., weighted average) of the values of sample S in the deblocked boundary region A’ and the values of sample S in the deblocked boundary region C’.
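The three options just described for a sample S lying in both deblocked regions A’ and C’ can be condensed into one small helper: with order-dependent replacement the region written last wins, and with averaging a (weighted) mean of the two values is taken. Names, mode strings, and the default weight are illustrative assumptions.

```python
# Sketch of resolving a shared sample's value when two deblocked
# boundary regions overlap, per the description above.
def resolve_shared_sample(s_in_a, s_in_c, mode, w_a=0.5):
    if mode == "a_then_c":   # A' pasted first, C' pasted second -> C' wins
        return s_in_c
    if mode == "c_then_a":   # C' pasted first, A' pasted second -> A' wins
        return s_in_a
    if mode == "average":    # weighted average of the two deblocked values
        return w_a * s_in_a + (1.0 - w_a) * s_in_c
    raise ValueError(mode)
```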
边界区域可以包括多于两个块的样本。图20示出了根据本公开内容的实施方式的示例性去块过程,其中边界区域可以包括多于两个块的样本。单个边界区域AB可以包括边界区域A和B。边界区域AB可以包括两个相邻重构块(1911)与(1913)之间的共享边界(1941)的两侧上的样本,并且包括两个相邻重构块(1912)与(1914)之间的共享边界(1942)的两侧上的样本。单个边界区域CD可以包括边界区域C和D。边界区域CD可以包括两个相邻重构块(1911)与(1912)之间的共享边界(1943)的两侧上的样本,并且包括两个相邻重构块(1913)与(1914)之间的共享边界(1944)的两侧上的样本。Boundary regions may include samples from more than two blocks. Figure 20 illustrates an exemplary deblocking process according to an embodiment of the present disclosure, wherein boundary regions may include samples from more than two blocks. A single boundary region AB may include boundary regions A and B. Boundary region AB may include samples on both sides of a shared boundary (1941) between two adjacent reconstructed blocks (1911) and (1913), and includes samples on both sides of a shared boundary (1942) between two adjacent reconstructed blocks (1912) and (1914). A single boundary region CD may include boundary regions C and D. Boundary region CD may include samples on both sides of a shared boundary (1943) between two adjacent reconstructed blocks (1911) and (1912), and includes samples on both sides of a shared boundary (1944) between two adjacent reconstructed blocks (1913) and (1914).
去块NN(诸如去块NN(1930))可以对边界区域AB和CD中的一个或更多个边界区域执行去块以生成边界区域的一个或更多个经去块的边界区域。参照图20,边界区域AB和CD被发送至去块NN(1930)并且生成经去块的边界区域AB’和CD’。经去块的边界区域AB’和CD’可以替换图像(1901)中的边界区域AB和CD,从而生成图像(2050)。图像(2050)可以包括经去块的边界区域AB’至CD’和非边界区域(1921)至(1924)。图像(2050)可以与图像(1950)相同或不同。A deblocking NN (such as deblocking NN (1930)) can perform deblocking on one or more boundary regions AB and CD to generate one or more deblocked boundary regions. Referring to FIG20, boundary regions AB and CD are sent to deblocking NN (1930) and deblocked boundary regions AB’ and CD’ are generated. Deblocked boundary regions AB’ and CD’ can replace boundary regions AB and CD in image (1901) to generate image (2050). Image (2050) may include deblocked boundary regions AB’ to CD’ and non-boundary regions (1921) to (1924). Image (2050) may be the same as or different from image (1950).
根据本公开内容的实施方式,可以使用多模型去块方法。可以将不同的去块模型应用于不同类型或类别的边界区域,以去除伪影。可以应用分类模块以将边界区域分类成不同的类别。可以应用任何分类模块。在示例中,分类模块基于NN。在示例中,分类模块不基于NN。可以根据相应类别将边界区域发送至不同的去块模型。According to embodiments of the present disclosure, a multi-model deblocking method can be used. Different deblocking models can be applied to boundary regions of different types or categories to remove artifacts. A classification module can be applied to classify the boundary regions into different categories. Any classification module can be applied. In an example, the classification module is NN based. In another example, the classification module is not NN based. The boundary regions can be sent to different deblocking models according to the corresponding categories.
在实施方式中,去块NN包括分别基于不同的去块模型实现的多个去块NN。例如通过分类模块可以确定将多个去块NN中的哪一个应用于边界区域。可以利用确定的去块NN对边界区域执行去块。在示例中,通过基于NN(例如,分类NN)——诸如DNN、CNN等——的分类模块确定应用多个去块NN中的哪一个。In an embodiment, the deblocking NN includes multiple deblocking NNs that are respectively implemented based on different deblocking models. Which of the multiple deblocking NNs to apply to a boundary region can be determined, for example, by a classification module. Deblocking can be performed on the boundary region with the determined deblocking NN. In an example, which of the multiple deblocking NNs to apply is determined by an NN-based classification module (e.g., a classification NN), such as a DNN, a CNN, and the like.
在实施方式中,去块NN包括实现多模型去块的单个NN。In an embodiment, the deblocking NN includes a single NN that implements the multi-model deblocking.
图21示出了根据本公开内容的实施方式的基于多个去块模型的示例性去块过程(2100)。分类模块(2110)可以将边界区域A至D分类成一个或更多个类别。例如,将边界区域C至D分类成第一类别,将边界区域B分类成第二类别,以及将边界区域A分类成第三类别。可以将不同的去块模型应用于不同类别中的边界区域。在图21中,可以使用去块NN(2130)执行去块,诸如基于多个去块模型(例如,去块模型1至L)的多模型去块。L是正整数。当L为1时,去块NN(2130)包括单个去块模型。当L大于1时,去块NN(2130)包括多个去块模型。去块NN(2130)可以使用单个NN或多个NN来实现。Figure 21 illustrates an exemplary deblocking process (2100) based on multiple deblocking models according to an embodiment of the present disclosure. A classification module (2110) can classify boundary regions A through D into one or more categories. For example, boundary regions C through D can be classified into a first category, boundary region B into a second category, and boundary region A into a third category. Different deblocking models can be applied to boundary regions in different categories. In Figure 21, deblocking can be performed using a deblocking neural network (2130), such as multi-model deblocking based on multiple deblocking models (e.g., deblocking models 1 through L). L is a positive integer. When L is 1, the deblocking neural network (2130) includes a single deblocking model. When L is greater than 1, the deblocking neural network (2130) includes multiple deblocking models. The deblocking neural network (2130) can be implemented using a single neural network or multiple neural networks.
在示例中,将去块模型1应用于第一类别中的边界区域(例如,C和D),并且生成经去块的边界区域(例如,C”和D”)。将去块模型2应用于第二类别中的边界区域(例如,B),并且生成经去块的边界区域(例如B”)。将去块模型3应用于第三类别中的边界区域(例如A),并且生成经去块的边界区域(例如A”)。经去块的边界区域A”至D”可以替换图像(1901)中对应的边界区域A至D,从而生成图像(2150)。图像(2150)可以包括经去块的边界区域A”至D”和非边界区域(2121)至(2124)。In the example, deblocking model 1 is applied to the boundary regions (e.g., C and D) in the first category, and deblocked boundary regions (e.g., C” and D”) are generated. Deblocking model 2 is applied to the boundary regions (e.g., B) in the second category, and deblocked boundary regions (e.g., B”) are generated. Deblocking model 3 is applied to the boundary regions (e.g., A) in the third category, and deblocked boundary regions (e.g., A”) are generated. The deblocked boundary regions A” to D” can replace the corresponding boundary regions A to D in image (1901) to generate image (2150). Image (2150) may include the deblocked boundary regions A” to D” and non-boundary regions (2121) to (2124).
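The classify-then-dispatch flow above can be sketched as a small loop: a classification step assigns each boundary region to a category, and each category maps to its own deblocking model. The callables below are stand-ins, not real deblocking models, and all names are assumptions.

```python
# Hypothetical multi-model deblocking dispatch: each boundary region is
# classified into a category, then processed by the deblocking model
# registered for that category.
def deblock_multi_model(regions, classify, models):
    out = {}
    for name, region in regions.items():
        category = classify(region)       # e.g., classification module (2110)
        out[name] = models[category](region)  # e.g., deblocking model 1..L
    return out
```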
可以应用任何合适的度量来对边界区域进行分类或归类。在示例中,根据边界区域的内容对边界区域进行分类。例如,将具有高频内容(例如,具有相对大的方差的内容)的边界区域和具有低频内容(例如具有相对小的方差的内容)的边界区域分类成与不同的去块模型相对应的不同类别。可以使用边界区域中的伪影的强度对边界区域进行分类。可以将多模型去块方法应用于任何合适的边界区域,诸如两个或更多个块之间的边界区域(例如,A、B、C、D、AB和/或CD)。可以基于边界区域内的样本的最大差异来确定边界区域的频率。在示例中,确定共享边界的第一侧中的第一边缘附近的样本的第一差异。在示例中,确定共享边界的第二侧中的第二边缘附近的样本的第二差异。在示例中,确定第一差异和第二差异。Any suitable metric can be applied to classify or categorize boundary regions. In the example, boundary regions are classified based on their content. For example, boundary regions with high-frequency content (e.g., content with relatively large variance) and boundary regions with low-frequency content (e.g., content with relatively small variance) are classified into different categories corresponding to different deblocking models. Boundary regions can be classified using the intensity of artifacts within them. Multi-model deblocking methods can be applied to any suitable boundary region, such as the boundary region between two or more blocks (e.g., A, B, C, D, AB, and/or CD). The frequency of a boundary region can be determined based on the maximum difference between samples within it. In the example, the first difference is determined for samples near the first edge on the first side of a shared boundary. In the example, the second difference is determined for samples near the second edge on the second side of a shared boundary. In the example, both the first and second differences are determined.
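One of the metrics mentioned above, a frequency estimate based on sample variance (or, alternatively, the maximum sample difference), can be sketched as follows; the threshold value and category labels are illustrative assumptions.

```python
# Illustrative classification of a boundary region's content: content
# with relatively large variance is treated as high frequency, content
# with relatively small variance as low frequency. The threshold is an
# assumed example value.
def classify_region(samples, var_threshold=100.0):
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return "high_frequency" if variance >= var_threshold else "low_frequency"

# Alternative metric: maximum difference among samples in the region.
def max_difference(samples):
    return max(samples) - min(samples)
```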
可以应用去块NN(例如,图19B中的去块NN(1930)或图21中的去块NN(2130))去除块之间的伪影。在示例中,靠近共享边界的样本或像素可以比远离共享边界的样本(或像素)被更大程度地去块。返回参照图19A,样本S比样本F靠近共享边界(1941),因此样本S可以比样本F被更大程度地去块。Artifacts between blocks can be removed by applying a deblocking neural network (e.g., deblocking neural network (1930) in Figure 19B or deblocking neural network (2130) in Figure 21). In the example, samples or pixels closer to the shared boundary can be deblocked to a greater extent than samples (or pixels) farther from the shared boundary. Referring back to Figure 19A, sample S is closer to the shared boundary (1941) than sample F, therefore sample S can be deblocked to a greater extent than sample F.
去块NN(例如,去块NN(1930)或去块NN(2130))中的去块模型可以包括一个或更多个卷积层。例如,可以使用基于CNN的注意力机制(例如,非局部注意力、挤压和激励网络(Squeeze-and-Excitation Network,SENet))、残差神经网络(residual neural network,ResNet)(例如,包括一组CNN或卷积神经网络以及激活函数)等。例如,可以通过将输出大小改变至与输入大小相同来使用由图像超分辨率使用的DNN。在图像超分辨率中,可以将图像的分辨率从低分辨率提高至高分辨率。A deblocking model in the deblocking NN (e.g., the deblocking NN (1930) or the deblocking NN (2130)) can include one or more convolutional layers. For example, a CNN-based attention mechanism (e.g., non-local attention, a Squeeze-and-Excitation Network (SENet)), a residual neural network (ResNet) (e.g., including a set of CNNs or convolutional neural networks and activation functions), and the like can be used. For example, a DNN used for image super-resolution can be used by changing the output size to be the same as the input size. In image super-resolution, the resolution of an image can be increased from a low resolution to a high resolution.
上面描述了如何使用NN或其他基于学习的方法对边界区域执行去块。在一些示例中,视频编码器和/或视频解码器可以在基于NN的去块方法或不基于NN的去块方法之间进行选择。可以在各种级别(诸如切片级别、图片级别、针对一组图片、序列级别等)上进行选择。该选择可以使用标志来用信令通知。该选择可以根据边界区域的内容来推断。The above describes how to perform deblocking on boundary regions using neural networks (NNs) or other learning-based methods. In some examples, the video encoder and/or video decoder can choose between NN-based or non-NN-based deblocking methods. Selection can be made at various levels (such as slice level, image level, for a set of images, sequence level, etc.). This selection can be signaled using flags. The selection can also be inferred from the content of the boundary region.
视频编码器和/或视频解码器可以应用除了例如对像素或样本的NN得出的调整在默认水平的边界强度(boundary strength,BS)的本公开内容中描述的方法和实施方式之外的各种水平的边界强度。可以通过分析边界条件和块编码特征分配不同水平的BS以修改(例如,放大或缩小)默认调整。In addition to the methods and embodiments described in this disclosure, in which, for example, the NN-derived adjustment of pixels or samples corresponds to a default level of boundary strength (BS), the video encoder and/or the video decoder can apply various levels of BS. Different levels of BS can be assigned by analyzing boundary conditions and block coding characteristics, so as to modify (e.g., scale up or down) the default adjustment.
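One way to read the scaling described above is that the NN-derived change to each sample (the default adjustment) is amplified or attenuated according to the assigned BS level. The sketch below is an interpretation under that assumption; the function name and any particular scale values are hypothetical.

```python
# Hypothetical sketch: scale the NN-derived deblocking adjustment of a
# sample by a boundary-strength factor. (nn_sample - sample) is the
# default NN adjustment; bs_scale amplifies (>1), keeps (=1), reduces
# (<1), or disables (=0) it.
def apply_bs_scaled_adjustment(sample, nn_sample, bs_scale):
    return sample + bs_scale * (nn_sample - sample)
```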
根据本公开内容的实施方式,相邻重构块的非边界区域中的一个或更多个可以用增强NN增强。According to embodiments of the present disclosure, one or more of the non-boundary regions of adjacent reconstructed blocks can be enhanced with an enhancement NN.
去块NN(例如,(1930)或(2130))可以使用任何合适的滤波方法来实现,所述滤波方法可以减少伪影,例如由块效应导致的伪影,并改进视觉质量。去块NN(例如,(1930)或(2130))可以包括任何神经网络架构(例如,前馈架构或递归架构),可以包括任何数量和任何合适的层的组合,可以包括一个或更多个子神经网络,可以包括任何合适类型的参数(例如,权重、偏置、权重和偏置的组合),并且可以包括任何合适数量的参数,如在本公开内容中所述。在示例中,使用DNN和/或CNN来实现去块NN(例如,(1930)或(2130))。Deblocking neural networks (e.g., (1930) or (2130)) can be implemented using any suitable filtering method that reduces artifacts, such as those caused by blocking, and improves visual quality. Deblocking neural networks (e.g., (1930) or (2130)) can include any neural network architecture (e.g., a feedforward architecture or a recursive architecture), can include any number and any suitable combination of layers, can include one or more sub-neural networks, can include any suitable type of parameters (e.g., weights, biases, combinations of weights and biases), and can include any suitable number of parameters as described in this disclosure. In the examples, deblocking neural networks (e.g., (1930) or (2130)) are implemented using DNNs and/or CNNs.
在示例中,使用包括一个或更多个卷积层的CNN实现去块NN。去块NN可以包括在本公开内容中描述的附加层,例如池化层、完全连接层、归一化层等。去块NN中的层可以以任何合适的顺序和任何合适的架构(例如,前馈架构或递归架构)布置。在示例中,卷积层之后是其他层,例如池化层、完全连接层、归一化层等。去块NN中的卷积层中的每个可以包括任何合适数量的通道以及任何合适的卷积核和步幅。In the example, a deblocking neural network is implemented using a CNN comprising one or more convolutional layers. The deblocking neural network may include additional layers described in this disclosure, such as pooling layers, fully connected layers, normalization layers, etc. The layers in the deblocking neural network can be arranged in any suitable order and with any suitable architecture (e.g., feedforward or recursive architecture). In the example, convolutional layers are followed by other layers, such as pooling layers, fully connected layers, normalization layers, etc. Each of the convolutional layers in the deblocking neural network may include any suitable number of channels and any suitable convolutional kernel and stride.
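The basic building block of such a CNN, a stride-1 convolution whose output size equals its input size (zero padding), can be sketched in plain Python. This is a toy single-channel layer only; a real deblocking NN stacks many such layers with learned kernels, suitable channel counts, and nonlinearities, and the 3×3 kernels used below are placeholders.

```python
# Toy single-channel 2D convolution with stride 1 and zero padding, so
# the output has the same height and width as the input ("same" conv).
def conv2d_same(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < h and 0 <= cc < w:  # zero padding
                        acc += kernel[i][j] * image[rr][cc]
            out[r][c] = acc
    return out
```

For example, a 3×3 averaging kernel smooths a boundary region, while an identity kernel passes it through unchanged; a trained layer would use learned weights instead.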
可以基于任何合适的训练图像或训练块来训练去块NN。去块NN的训练可以与上面参照图9B描述的去块NN的训练相似或相同。在实施方式中,训练图像可以包括具有由于块效应而导致的伪影的重构图像。在实施方式中,训练块可以包括具有由于块效应而导致的伪影的重构块。在实施方式中,去块NN(例如,(1930)或(2130))用在相应的训练块或训练图像中具有由于块效应引起的伪影的边界区域(例如,边界区域A至D、AB和/或CD)来训练。去块NN(例如,(1930)或(2130))可以在训练过程之后用训练参数来配置。经训练的去块NN可以对一个或更多个边界区域执行去块,例如在图19A至图19C、图20、图21、图22A和图22B中所示。去块NN(例如,(1930)或(2130))可以被应用于重构图像(例如,图像(1901))或重构块(例如,块(1911)至(1912))以减少块效应,例如参照图19B至图19C所述。去块NN(例如,(1930)或(2130))可以被应用于边界区域,如下面在图22A至图22B中所示。The deblocking NN can be trained based on any suitable training images or training blocks. The training of the deblocking NN can be similar or identical to the training of the deblocking NN described above with reference to Figure 9B. In an embodiment, the training images can include reconstructed images having artifacts due to blocking effects. In an embodiment, the training blocks can include reconstructed blocks having artifacts due to blocking effects. In an embodiment, the deblocking NN (e.g., (1930) or (2130)) is trained with boundary regions (e.g., the boundary regions A to D, AB, and/or CD) in the corresponding training blocks or training images that have artifacts due to blocking effects. The deblocking NN (e.g., (1930) or (2130)) can be configured with the trained parameters after the training process. The trained deblocking NN can perform deblocking on one or more boundary regions, for example, as shown in Figures 19A to 19C, Figure 20, Figure 21, Figure 22A, and Figure 22B. The deblocking NN (e.g., (1930) or (2130)) can be applied to a reconstructed image (e.g., the image (1901)) or reconstructed blocks (e.g., the blocks (1911) to (1912)) to reduce blocking effects, for example, as described with reference to Figures 19B and 19C. The deblocking NN (e.g., (1930) or (2130)) can be applied to boundary regions, as shown below in Figures 22A and 22B.
去块NN(例如,(1930)或(2130))可以被单独训练以确定去块NN中的参数。去块NN(例如,(1930)或(2130))可以作为NIC框架中的组件来训练,或者与NIC框架一起训练。例如,NIC框架(900)和去块NN(例如,(1930)或(2130))可以被联合训练。A deblocking neural network (e.g., (1930) or (2130)) can be trained separately to determine the parameters in the deblocking neural network. A deblocking neural network (e.g., (1930) or (2130)) can be trained as a component of the NIC framework, or trained together with the NIC framework. For example, the NIC framework (900) and the deblocking neural network (e.g., (1930) or (2130)) can be trained jointly.
在以下描述中，为了简洁起见，边界信号可以是指包括边界区域的重构图像、包括边界区域的相邻重构块、或者重构图像或相邻重构块中的边界区域。边界信号可能具有由于块效应而导致的伪影。在实施方式中，边界信号没有被去块NN去块。边界信号可以对应于编码信号。In the following description, for the sake of brevity, a boundary signal may refer to a reconstructed image including a boundary region, adjacent reconstructed blocks including a boundary region, or a boundary region in a reconstructed image or in adjacent reconstructed blocks. The boundary signal may have artifacts due to blocking effects. In an embodiment, the boundary signal has not been deblocked by the deblocking NN. The boundary signal can correspond to an encoded signal.
参照图9B、图19A和图19B，边界信号可以基于来自主解码器网络(915)的输出（例如，重构块）。对应于边界信号的编码信号可以基于来自熵编码器(913)的输出（例如，编码块）。Referring to FIGS. 9B, 19A, and 19B, the boundary signal can be based on the output (e.g., a reconstructed block) from the main decoder network (915). The encoded signal corresponding to the boundary signal can be based on the output (e.g., an encoded block) from the entropy encoder (913).
在示例中，边界信号包括包含边界区域的重构图像，例如包含来自主解码器网络(915)的重构块（例如，块(1911)至(1914)）的图像(1901)，其中图像(1901)包括边界区域A至D。对应于重构图像的编码信号可以包括基于来自熵编码器(913)的输出（例如，编码块）的编码图像。In an example, the boundary signal includes a reconstructed image containing boundary regions, such as the image (1901) containing the reconstructed blocks (e.g., the blocks (1911) to (1914)) from the main decoder network (915), where the image (1901) includes the boundary regions A to D. The encoded signal corresponding to the reconstructed image can include an encoded image based on the output (e.g., encoded blocks) from the entropy encoder (913).
在示例中，边界信号包括重构图像（例如，图像(1901)）中的边界区域（例如，边界区域A、B、C、D、AB或CD）。对应于边界区域的编码信号可以包括基于来自熵编码器(913)的输出（例如，编码块）的编码图像。In an example, the boundary signal includes a boundary region (e.g., the boundary region A, B, C, D, AB, or CD) in a reconstructed image (e.g., the image (1901)). The encoded signal corresponding to the boundary region can include an encoded image based on the output (e.g., encoded blocks) from the entropy encoder (913).
在示例中，边界信号包括包含边界区域的相邻重构块，例如来自主解码器网络(915)的块(1911)至(1912)。对应于相邻重构块的编码信号可以包括基于来自熵编码器(913)的输出的编码块（例如，相邻编码块）。In an example, the boundary signal includes adjacent reconstructed blocks containing a boundary region, such as the blocks (1911) to (1912) from the main decoder network (915). The encoded signal corresponding to the adjacent reconstructed blocks can include encoded blocks (e.g., adjacent encoded blocks) based on the output from the entropy encoder (913).
在示例中，边界信号包括从主解码器网络(915)输出的相邻重构块中的边界区域（例如，边界区域A、B、C、D、AB或CD）。对应于边界区域的编码信号可以包括基于来自熵编码器(913)的输出的编码块（例如，相邻编码块）。In an example, the boundary signal includes a boundary region (e.g., the boundary region A, B, C, D, AB, or CD) in adjacent reconstructed blocks output from the main decoder network (915). The encoded signal corresponding to the boundary region can include encoded blocks (e.g., adjacent encoded blocks) based on the output from the entropy encoder (913).
去块可以被应用于边界信号（例如，重构图像或重构块中的边界区域、重构图像或相邻重构块），以确定去块边界信号（例如，去块边界区域、去块图像或去块相邻重构块）。去块可以用于减少去块边界信号的R-D损耗L，其中R-D损耗L可以包括去块边界信号的失真损耗D和对应于边界信号的编码信号的速率损耗R。在实施方式中，编码信号的速率损耗R在去块过程中保持不变，因为编码信号（例如，包括编码块(931)的编码图像或包括图9B中的编码块(931)的编码块）不受去块过程的影响。因此，去块边界信号的R-D损耗L可以由依赖于去块过程的去块边界信号的失真损耗D来指示。可以应用去块来减少去块边界信号的失真损耗D，例如，去块边界信号的失真损耗D可以小于相应边界信号的失真损耗D。Deblocking can be applied to a boundary signal (e.g., a boundary region in a reconstructed image or in reconstructed blocks, a reconstructed image, or adjacent reconstructed blocks) to determine a deblocked boundary signal (e.g., a deblocked boundary region, a deblocked image, or deblocked adjacent reconstructed blocks). Deblocking can be used to reduce the R-D loss L of the deblocked boundary signal, where the R-D loss L can include the distortion loss D of the deblocked boundary signal and the rate loss R of the encoded signal corresponding to the boundary signal. In an embodiment, the rate loss R of the encoded signal remains unchanged during the deblocking process because the encoded signal (e.g., an encoded image including the encoded block (931) or an encoded block including the encoded block (931) in FIG. 9B) is not affected by the deblocking process. Therefore, the R-D loss L of the deblocked boundary signal can be indicated by the distortion loss D of the deblocked boundary signal, which depends on the deblocking process. Deblocking can be applied to reduce the distortion loss D of the deblocked boundary signal; for example, the distortion loss D of the deblocked boundary signal can be less than the distortion loss D of the corresponding boundary signal.
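下面的草图示出上述R-D损耗关系：速率损耗R固定时，减小失真损耗D即减小L = λD + R；所有数值均为假设值。The sketch below illustrates the R-D relation described above: with the rate loss R fixed, reducing the distortion loss D reduces L = λD + R; all numbers are hypothetical.

```python
def rd_loss(distortion, rate, lam):
    """R-D loss in the form L = lambda * D + R (cf. the discussion above)."""
    return lam * distortion + rate

# The rate loss R of the encoded signal is fixed by the bitstream and is
# not affected by deblocking, so any reduction in D reduces L.
lam, rate = 0.5, 100.0           # hypothetical trade-off weight and rate loss
d_before, d_after = 12.0, 9.0    # hypothetical distortion before/after deblocking
loss_before = rd_loss(d_before, rate, lam)
loss_after = rd_loss(d_after, rate, lam)
```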
参照图9A、图9B以及图19A至图19C示出了去块的示例。将被编码或压缩的输入图像可以被分割成块（例如，输入块），如参照图9A所述。参照图9B，输入块（例如，输入块x）中的每个可以由NIC框架(900)处理，以分别生成相应的编码块和重构块。Examples of deblocking are illustrated with reference to FIGS. 9A, 9B, and 19A to 19C. An input image to be encoded or compressed can be partitioned into blocks (e.g., input blocks), as described with reference to FIG. 9A. Referring to FIG. 9B, each of the input blocks (e.g., an input block x) can be processed by the NIC framework (900) to generate a corresponding encoded block and a corresponding reconstructed block, respectively.
在示例中，要被去块的边界信号是包括重构块的重构图像。符号I、Î和Ī分别用于表示输入图像、包括编码块的编码图像和包括重构块的重构图像。参照图19A至图19C，在示例中，重构图像Ī（例如，图像(1901)）被馈送到去块NN（例如，(1930)）中以获得去块图像Ī_db（例如，图像(1950)）。根据本公开内容的实施方式，去块可以被实现为减少式9中所示的R-D损耗L：In an example, the boundary signal to be deblocked is a reconstructed image including reconstructed blocks. The symbols I, Î, and Ī are used to denote the input image, the encoded image including the encoded blocks, and the reconstructed image including the reconstructed blocks, respectively. Referring to FIGS. 19A to 19C, in an example, the reconstructed image Ī (e.g., the image (1901)) is fed into the deblocking NN (e.g., (1930)) to obtain a deblocked image Ī_db (e.g., the image (1950)). According to embodiments of the disclosure, deblocking can be implemented to reduce the R-D loss L shown in Eq. 9:

L = λD(I, Ī_db) + R(Î)　（式9）(Eq. 9)
其中，对应于去块图像Ī_db的R-D损耗L可以包括去块图像的失真损耗λD(I, Ī_db)和编码图像的速率损耗R(Î)。Here, the R-D loss L corresponding to the deblocked image Ī_db can include the distortion loss λD(I, Ī_db) of the deblocked image and the rate loss R(Î) of the encoded image.
如上所述，编码图像的速率损耗R(Î)在去块过程中保持不变。因此，可以基于去块图像的失真损耗λD(I, Ī_db)来确定去块NN的性能。在示例中，重构图像的失真损耗λD(I, Ī)被减少到去块图像的失真损耗λD(I, Ī_db)，其中失真损耗λD(I, Ī_db)小于失真损耗λD(I, Ī)。As described above, the rate loss R(Î) of the encoded image remains unchanged during the deblocking process. Therefore, the performance of the deblocking NN can be determined based on the distortion loss λD(I, Ī_db) of the deblocked image. In an example, the distortion loss λD(I, Ī) of the reconstructed image is reduced to the distortion loss λD(I, Ī_db) of the deblocked image, where the distortion loss λD(I, Ī_db) is less than the distortion loss λD(I, Ī).
如果要被去块的边界信号包括(i)相邻重构块(例如,(1911)至(1912))或(ii)边界区域(例如,边界区域A),则上述描述可以适用。The above description may apply if the boundary signal to be deblocked includes (i) adjacent reconstructed blocks (e.g., (1911) to (1912)) or (ii) a boundary region (e.g., boundary region A).
将由去块NN去块的边界信号可以对应于将被编码并可选地将被包括在编码视频比特流中的输入信号(例如,输入图像或输入块)。可以由去块NN基于边界信号生成去块边界信号。如果要被去块的边界信号明显不同于用于训练去块NN的训练图像、训练块和/或训练边界区域,则去块边界信号可以对应于相对差的R-D损耗L或相对大的失真D。因此,本公开内容的各方面描述了用于去块,例如,用在逐块图像压缩中的内容自适应在线训练方法和装置。在用于去块的内容自适应在线训练中,可以基于边界信号来确定去块NN的一个或更多个参数(例如,权重和/或偏置),其中边界信号基于要被压缩(例如,编码)和/或要被包括在编码视频比特流中的输入信号(例如,输入图像或输入块)来生成。输入信号可以包括原始数据或残差数据。为了简洁起见,基于要被去块的边界信号的用于去块的内容自适应在线训练在本公开内容中被表示为去块训练。The boundary signal to be deblocked by the deblocking neural network (NN) can correspond to the input signal (e.g., an input image or input block) to be encoded and optionally included in the encoded video bitstream. The deblocking boundary signal can be generated by the NN based on the boundary signal. If the boundary signal to be deblocked is significantly different from the training image, training block, and/or training boundary region used to train the NN, the deblocking boundary signal can correspond to a relatively poor R-D loss L or a relatively large distortion D. Therefore, aspects of this disclosure describe content-adaptive online training methods and apparatus for deblocking, for example, in block-by-block image compression. In content-adaptive online training for deblocking, one or more parameters (e.g., weights and/or biases) of the NN can be determined based on the boundary signal, which is generated based on the input signal (e.g., an input image or input block) to be compressed (e.g., encoded) and/or included in the encoded video bitstream. The input signal can include raw data or residual data. For brevity, content-adaptive online training for deblocking based on the boundary signal to be deblocked is referred to as deblocking training in this disclosure.
根据本公开内容的实施方式，指示去块NN的一个或更多个确定的参数（例如，确定的权重和/或确定的偏置）的去块信息可以被编码并可选地包括在编码视频比特流中。因此，R-D损耗Lp可以包括在编码视频比特流中发信号通知去块信息的速率损耗或比特消耗R(p)。在示例中，边界信号是重构图像，并且Lp可以如式10中所示确定：According to embodiments of the disclosure, deblocking information indicating the determined one or more parameters of the deblocking NN (e.g., the determined weights and/or the determined biases) can be encoded and optionally included in the coded video bitstream. Therefore, the R-D loss Lp can include a rate loss or bit consumption R(p) of signaling the deblocking information in the coded video bitstream. In an example, the boundary signal is the reconstructed image, and Lp can be determined as shown in Eq. 10:

Lp = λD(I, Ī_db) + R(Î) + R(p)　（式10）(Eq. 10)
其中，对应于去块边界信号的R-D损耗Lp可以包括去块边界信号（例如，去块重构图像）的失真损耗λD(I, Ī_db)、编码信号（例如，编码图像）的速率损耗R(Î)和速率损耗R(p)。如上所述，速率损耗R(Î)在去块过程中保持不变，并且因此速率损耗R(Î)在去块训练中保持不变。与式9中所示的R-D损耗L相比，将速率损耗R(p)加到式10增加了R-D损耗。λD(I, Ī_db)和R(p)之和被表示为Ltraining。去块训练可以被实现为减少去块边界信号的失真损耗λD(I, Ī_db)，使得尽管速率损耗R(p)增加，项Ltraining仍小于在没有去块过程的情况下的相应重构信号的λD(I, Ī)。在示例中，去块训练可以减少（例如，最小化）项Ltraining，使得项Ltraining小于在没有去块过程的情况下的相应重构信号的λD(I, Ī)。在示例中，去块训练可以减少（例如，最小化）项Ltraining，使得项Ltraining小于在没有去块训练的情况下使用去块NN生成的去块边界信号的λD。Here, the R-D loss Lp corresponding to the deblocked boundary signal can include the distortion loss λD(I, Ī_db) of the deblocked boundary signal (e.g., the deblocked reconstructed image), the rate loss R(Î) of the encoded signal (e.g., the encoded image), and the rate loss R(p). As described above, the rate loss R(Î) remains unchanged during the deblocking process, and thus the rate loss R(Î) remains unchanged in the deblocking training. Compared with the R-D loss L shown in Eq. 9, adding the rate loss R(p) in Eq. 10 increases the R-D loss. The sum of λD(I, Ī_db) and R(p) is denoted as Ltraining. The deblocking training can be implemented to reduce the distortion loss λD(I, Ī_db) of the deblocked boundary signal such that, despite the increased rate loss R(p), the term Ltraining is less than λD(I, Ī) of the corresponding reconstructed signal without the deblocking process. In an example, the deblocking training can reduce (e.g., minimize) the term Ltraining such that the term Ltraining is less than λD(I, Ī) of the corresponding reconstructed signal without the deblocking process. In an example, the deblocking training can reduce (e.g., minimize) the term Ltraining such that the term Ltraining is less than λD of a deblocked boundary signal generated using the deblocking NN without the deblocking training.
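下面的草图示出Ltraining = λD + R(p)的权衡：仅当失真降低超过参数信令开销时，在线去块训练才有收益；所有数值均为假设值。The sketch below illustrates the Ltraining = λD + R(p) trade-off: online deblocking training pays off only when the distortion saving exceeds the parameter-signaling overhead; all numbers are hypothetical.

```python
def l_training(lam, d_deblocked, r_params):
    """L_training = lambda * D(deblocked) + R(p), per the discussion above."""
    return lam * d_deblocked + r_params

lam = 1.0             # hypothetical trade-off weight
d_no_deblock = 10.0   # hypothetical distortion of the reconstructed signal
d_deblocked = 7.0     # hypothetical distortion after content-adaptive deblocking
r_params = 2.0        # hypothetical bits for signaling the updated NN parameters

lt = l_training(lam, d_deblocked, r_params)
# Training is worthwhile when the distortion saving outweighs R(p).
worthwhile = lt < lam * d_no_deblock
```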
根据本公开内容的实施方式,在编码器侧,要被压缩或编码的输入图像可以被分割成块。参照图9B,例如,基于NIC框架(900),可以将块编码成编码块。编码块可以由NIC框架(900)处理以生成重构块。According to embodiments of this disclosure, on the encoder side, the input image to be compressed or encoded can be segmented into blocks. Referring to FIG9B, for example, based on the NIC framework (900), the blocks can be encoded into encoded blocks. The encoded blocks can be processed by the NIC framework (900) to generate reconstructed blocks.
可以基于与重构块（例如，重构块中的相邻重构块）相关联的边界信号，在例如编码器侧实现去块训练。在示例中，相邻重构块包括重构图像中的重构块的子集。在示例中，相邻重构块包括重构图像中的重构块中的每个，并且因此整个重构图像被视为相邻重构块。因此，边界信号可以包括（i）整个重构图像，例如图像(1901)，（ii）相邻重构块，例如块(1911)至(1912)，或者（iii）包括在重构块中的一个或更多个边界区域（例如，边界区域A至D中的一个或更多个）。Deblocking training can be implemented, for example, at the encoder side based on a boundary signal associated with the reconstructed blocks (e.g., adjacent reconstructed blocks among the reconstructed blocks). In an example, the adjacent reconstructed blocks include a subset of the reconstructed blocks in the reconstructed image. In an example, the adjacent reconstructed blocks include each of the reconstructed blocks in the reconstructed image, and thus the entire reconstructed image is treated as the adjacent reconstructed blocks. Accordingly, the boundary signal can include (i) the entire reconstructed image, such as the image (1901), (ii) adjacent reconstructed blocks, such as the blocks (1911) to (1912), or (iii) one or more boundary regions (e.g., one or more of the boundary regions A to D) included in the reconstructed blocks.
根据本公开内容的实施方式，边界信号包括整个重构图像。去块训练可以基于重构图像来实现，其中去块NN的一个或更多个参数通过优化速率失真性能来确定（例如，更新），例如，基于迭代更新过程，例如上面参照图9B所述。在去块训练过程中，可以通过迭代更新去块NN的一个或更多个参数来减少项Ltraining。According to embodiments of the disclosure, the boundary signal includes the entire reconstructed image. Deblocking training can be implemented based on the reconstructed image, where the one or more parameters of the deblocking NN are determined (e.g., updated) by optimizing the rate-distortion performance, for example, based on an iterative update process, such as described above with reference to FIG. 9B. During the deblocking training, the term Ltraining can be reduced by iteratively updating the one or more parameters of the deblocking NN.
在示例中，边界信号包括相邻重构块，例如块(1911)至(1912)。图22A示出了根据本公开内容的实施方式的示例性去块过程。相邻重构块(1911)至(1912)被馈送到去块NN（例如，(1930)）中，并且生成去块的块(1911)'至(1912)'，其中去块的块(1911)'至(1912)'包括去块边界区域C'和非边界区域(1921)至(1922)。在示例中，重构块(1911)至(1912)的失真损耗被减少到去块的块(1911)'至(1912)'的失真损耗。根据本公开内容的实施方式，去块训练可以基于相邻重构块来实现，其中去块NN的一个或更多个参数通过优化速率失真性能来确定（例如，更新），例如，基于迭代更新过程，例如上面参照图9B所述。在去块训练过程中，可以通过迭代更新去块NN的一个或更多个参数来减少项Ltraining。In an example, the boundary signal includes adjacent reconstructed blocks, such as the blocks (1911) to (1912). FIG. 22A illustrates an exemplary deblocking process according to an embodiment of the disclosure. The adjacent reconstructed blocks (1911) to (1912) are fed into the deblocking NN (e.g., (1930)), and deblocked blocks (1911)' to (1912)' are generated, where the deblocked blocks (1911)' to (1912)' include a deblocked boundary region C' and non-boundary regions (1921) to (1922). In an example, the distortion loss of the reconstructed blocks (1911) to (1912) is reduced to the distortion loss of the deblocked blocks (1911)' to (1912)'. According to embodiments of the disclosure, deblocking training can be implemented based on the adjacent reconstructed blocks, where the one or more parameters of the deblocking NN are determined (e.g., updated) by optimizing the rate-distortion performance, for example, based on an iterative update process, such as described above with reference to FIG. 9B. During the deblocking training, the term Ltraining can be reduced by iteratively updating the one or more parameters of the deblocking NN.
在示例中，边界信号包括一个或更多个边界区域，例如图19A中所示的边界区域A至D中的一个或更多个或者图20中所示的边界区域AB和CD中的一个或更多个。图22B示出了根据本公开内容的实施方式的示例性去块过程。相邻重构块(1911)和(1913)中的边界区域A被馈送到去块NN（例如，(1930)）中，以获得去块边界区域A'。在示例中，边界区域A的失真损耗D被减少到去块边界区域A'的失真损耗D。根据本公开内容的实施方式，去块训练可以基于一个或更多个边界区域来实现，其中去块NN的一个或更多个参数通过优化速率失真性能来确定（例如，更新），例如，基于迭代更新过程，例如上面参照图9B所述。在去块训练过程中，可以通过迭代更新去块NN的一个或更多个参数来减少项Ltraining。In an example, the boundary signal includes one or more boundary regions, such as one or more of the boundary regions A to D shown in FIG. 19A or one or more of the boundary regions AB and CD shown in FIG. 20. FIG. 22B illustrates an exemplary deblocking process according to an embodiment of the disclosure. The boundary region A in the adjacent reconstructed blocks (1911) and (1913) is fed into the deblocking NN (e.g., (1930)) to obtain a deblocked boundary region A'. In an example, the distortion loss D of the boundary region A is reduced to the distortion loss D of the deblocked boundary region A'. According to embodiments of the disclosure, deblocking training can be implemented based on the one or more boundary regions, where the one or more parameters of the deblocking NN are determined (e.g., updated) by optimizing the rate-distortion performance, for example, based on an iterative update process, such as described above with reference to FIG. 9B. During the deblocking training, the term Ltraining can be reduced by iteratively updating the one or more parameters of the deblocking NN.
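作为说明，下面的草图从两个重构块中提取跨其公共边界的边界区域（为便于说明假设两个块垂直相邻）；区域高度为假设值。As an illustration, the sketch below extracts a boundary region spanning the shared boundary of two reconstructed blocks (assumed, for illustration, to be vertically adjacent); the region height is an assumption.

```python
import numpy as np

def boundary_region(top_block, bottom_block, half_height):
    """Stack the bottom rows of the top block and the top rows of the
    bottom block to form the boundary region fed to the deblocking NN.
    The region height (2 * half_height) is illustrative only."""
    return np.vstack([top_block[-half_height:], bottom_block[:half_height]])

top = np.zeros((8, 8))     # stand-in for a reconstructed block, e.g., (1911)
bottom = np.ones((8, 8))   # stand-in for the block below it, e.g., (1913)
region_a = boundary_region(top, bottom, 2)  # spans the shared boundary
```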
根据本公开内容的实施方式,边界信号包括重构图像(例如,(1901))中的相邻重构块(例如,(1911)和(1913))中的边界区域(例如,A)。在编码器侧,指示所确定的去块NN的一个或更多个参数的去块信息可以与相应的编码块一起被编码到视频比特流中。去块信息可以对应于用于确定去块训练中的去块信息的边界信号(例如,边界区域)。为了简洁起见,除非另有说明,否则下面的示例和/或实施方式使用边界区域作为边界信号给出。如果边界信号包括重构图像或重构块,则可以适当地调整描述。According to embodiments of this disclosure, the boundary signal includes a boundary region (e.g., A) in adjacent reconstructed blocks (e.g., (1911) and (1913)) within the reconstructed image (e.g., (1901)). On the encoder side, deblocking information indicating one or more parameters of the determined deblocking NN can be encoded into the video bitstream along with the corresponding coded blocks. The deblocking information may correspond to a boundary signal (e.g., a boundary region) used to determine the deblocking information in deblocking training. For brevity, unless otherwise stated, the following examples and/or embodiments use boundary regions as boundary signals. If the boundary signal includes a reconstructed image or a reconstructed block, the description may be adjusted accordingly.
在解码器侧,可以从编码视频比特流中重构编码块。重构块可以包括相邻重构块(例如,(1911)和(1913))。对应于边界区域的编码视频比特流中的去块信息可以被解码,其中去块信息可以指示在视频解码器中去块NN的一个或更多个参数。可以基于由去块信息指示的所确定的一个或更多个参数来确定视频解码器中的去块NN。在示例中,去块NN是为边界区域专门确定的。可以基于对应于边界区域的所确定的去块NN对边界区域进行去块。On the decoder side, coded blocks can be reconstructed from the coded video bitstream. Reconstructed blocks may include adjacent reconstructed blocks (e.g., (1911) and (1913)). Deblocking information in the coded video bitstream corresponding to the boundary region can be decoded, where the deblocking information may indicate one or more parameters of the deblocking NN in the video decoder. The deblocking NN in the video decoder can be determined based on one or more parameters indicated by the deblocking information. In the example, the deblocking NN is specifically determined for the boundary region. The boundary region can be deblocked based on the determined deblocking NN corresponding to the boundary region.
由于去块NN中的一个或更多个参数是基于要被去块的边界信号(例如,边界区域)确定的,所以去块NN可以取决于边界信号(例如,边界区域)。由于边界信号是基于要被编码的输入图像或块来确定的,因此去块NN对于要被编码的输入图像或块是内容自适应的。由于去块训练可以基于要被编码的边界信号来执行,所以去块训练可以是在线训练。由于去块训练取决于边界信号,并且因此可以针对边界信号进行调整,所以通过使用利用去块训练确定的去块NN,可以实现更好的压缩性能。Since one or more parameters of the deblocking neural network (NN) are determined based on the boundary signals (e.g., boundary regions) to be deblocked, the NN can depend on these boundary signals. Because the boundary signals are determined based on the input image or blocks to be encoded, the NN is content-adaptive to the input image or blocks to be encoded. Since deblocking training can be performed based on the boundary signals to be encoded, it can be trained online. Because deblocking training depends on and can therefore be adjusted for the boundary signals, better compression performance can be achieved by using a NN determined using deblocking training.
基于NN的去块(例如,基于DNN的去块或基于CNN的去块)可以在基于NN的图像编码框架(例如,NIC框架(900))或其他图像压缩方法中实现。去块训练可以用作预处理步骤(例如,预编码步骤),以用于提高任何图像压缩方法的压缩性能。NN-based deblocking (e.g., DNN-based or CNN-based deblocking) can be implemented in NN-based image coding frameworks (e.g., the NIC framework (900)) or other image compression methods. Deblocking training can be used as a preprocessing step (e.g., a precoding step) to improve the compression performance of any image compression method.
去块训练可以与图像压缩编解码器无关,并且可以用任何合适类型的图像压缩编解码器来实现。图像压缩编解码器可以是基于NN的,例如图9B中所示的NIC框架(900)。图像压缩编解码器可以在没有NN的情况下实现,例如在图5至图8的某些实现中。Deblocking training can be independent of the image compression codec and can be implemented with any suitable type of image compression codec. The image compression codec can be based on a neural network, such as the NIC framework (900) shown in Figure 9B. The image compression codec can also be implemented without a neural network, as in some implementations shown in Figures 5 through 8.
一个或更多个参数可以包括去块NN中的偏置项、权重系数等。One or more parameters may include bias terms, weight coefficients, etc. in the deblocked neural network.
在实施方式中,去块NN不配置有初始参数,并且去块训练被实现以生成去块NN的参数(例如,包括去块NN的一个或更多个参数)。In this implementation, the deblocking neural network is not configured with initial parameters, and deblocking training is implemented to generate parameters of the deblocking neural network (e.g., including one or more parameters of the deblocking neural network).
在实施方式中,例如,在去块训练之前,去块NN配置有初始参数(例如,初始权重和/或初始偏置)。在实施方式中,基于包括训练块、训练图像和/或训练边界区域的训练数据集来预训练去块NN。初始参数可以包括预训练参数(例如,预训练权重和/或预训练偏置)。在实施方式中,去块NN没有被预训练。初始参数可以包括随机初始化的参数。In one implementation, for example, the deblocking neural network (NN) is configured with initial parameters (e.g., initial weights and/or initial biases) before deblocking training. In another implementation, the NN is pre-trained based on a training dataset that includes training blocks, training images, and/or training boundary regions. The initial parameters may include pre-trained parameters (e.g., pre-trained weights and/or pre-trained biases). In yet another implementation, the NN is not pre-trained. The initial parameters may include randomly initialized parameters.
在实施方式中,基于边界信号在去块训练中迭代更新去块NN中的初始参数中的一个或更多个初始参数。可以基于在去块训练中确定的一个或更多个参数(例如,一个或更多个替换参数)来更新一个或更多个初始参数。例如,一个或更多个初始参数分别被一个或更多个替换参数替换。在一些示例中,由去块信息指示的一个或更多个参数被解压缩,并且然后用于更新去块NN。In implementations, one or more initial parameters in the deblocking neural network are iteratively updated during deblocking training based on boundary signals. One or more initial parameters can be updated based on one or more parameters determined during deblocking training (e.g., one or more replacement parameters). For example, one or more initial parameters are each replaced by one or more replacement parameters. In some examples, one or more parameters indicated by deblocking information are decompressed and then used to update the deblocking neural network.
在示例中,通过一个或更多个替换参数来更新整个初始参数集。在示例中,初始参数的子集被一个或更多个替换参数更新,并且初始参数的剩余子集通过去块训练保持不变。In the example, the entire initial parameter set is updated using one or more replacement parameters. In the example, a subset of the initial parameters is updated by one or more replacement parameters, and the remaining subset of the initial parameters remains unchanged through deblocking training.
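下面的草图示出用替换参数更新初始参数的子集、其余初始参数保持不变的更新方式；参数名称与数值均为假设值。The sketch below illustrates updating a subset of the initial parameters with replacement parameters while the remaining initial parameters stay unchanged; the parameter names and values are hypothetical.

```python
def apply_replacements(initial_params, replacements):
    """Return updated parameters: entries named in `replacements` are
    overwritten; all other entries keep their initial (e.g., pretrained) values."""
    updated = dict(initial_params)
    updated.update(replacements)
    return updated

# Hypothetical parameter names and values.
initial = {"conv1.weight": 0.50, "conv1.bias": 0.10, "conv2.weight": -0.30}
replacement = {"conv1.bias": 0.25}   # online training updated only this subset
updated = apply_replacements(initial, replacement)
```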
通过在去块NN中使用所确定的一个或更多个参数（例如，一个或更多个替换参数），去块NN可以被应用于要去块的边界信号，并实现更好的失真性能，例如更小的R-D损耗Lp。如参考式10所述，R(p)表示编码到视频比特流中的所确定的一个或更多个参数（例如，一个或更多个替换参数）的去块信息的比特消耗。By using the determined one or more parameters (e.g., the one or more replacement parameters) in the deblocking NN, the deblocking NN can be applied to the boundary signal to be deblocked and achieve better distortion performance, such as a smaller R-D loss Lp. As described with reference to Eq. 10, R(p) represents the bit consumption of the deblocking information of the determined one or more parameters (e.g., the one or more replacement parameters) encoded into the video bitstream.
根据本公开内容的实施方式,一个或更多个其他边界区域(例如,重构块(1912)和(1914)中的边界区域B)可以基于对应于边界区域(例如,边界区域A)的确定的去块NN进行去块。例如,当边界区域(例如,A)以及一个或更多个其他边界区域(例如,B)具有相似的像素分布时,可以应用相同的去块NN来对边界区域(例如,A)以及一个或更多个其他边界区域(例如,B)进行去块,并实现相对大的编码效率。According to embodiments of this disclosure, one or more other boundary regions (e.g., boundary region B in reconstructed blocks (1912) and (1914)) can be deblocked based on a determined deblocking neural network corresponding to the boundary region (e.g., boundary region A). For example, when the boundary region (e.g., A) and one or more other boundary regions (e.g., B) have similar pixel distributions, the same deblocking neural network can be applied to deblock the boundary region (e.g., A) and one or more other boundary regions (e.g., B), achieving relatively high coding efficiency.
当使用多个边界信号,例如多个边界区域(例如,边界区域A至B)来确定去块NN中的一个或更多个参数——其中所确定的一个或更多个参数由多个边界信号共享——时,可以适当地调整上述描述。The above description may be appropriately adapted when using multiple boundary signals, such as multiple boundary regions (e.g., boundary regions A to B), to determine one or more parameters of the deblocking NN—where the determined one or more parameters are shared by multiple boundary signals.
根据本公开内容的实施方式,去块训练可以被称为微调过程,其中去块NN中的初始参数中的一个或更多个初始参数(例如,一个或更多个预训练参数)可以基于边界信号来更新(例如,微调),所述边界信号基于要被编码并可选地包括在视频比特流中的输入图像或者要被编码并可选地包括在视频比特流中的输入图像的输入块来确定。边界信号可以不同于用于获得预训练参数的训练图像、训练块或训练边界区域。因此,去块NN可以被调整成针对输入图像或输入块的内容。According to embodiments of this disclosure, deblocking training can be referred to as a fine-tuning process, wherein one or more initial parameters (e.g., one or more pre-trained parameters) in the deblocking neural network can be updated (e.g., fine-tuned) based on boundary signals determined based on an input image to be encoded and optionally included in a video bitstream, or an input block of the input image to be encoded and optionally included in a video bitstream. The boundary signals may differ from the training image, training block, or training boundary region used to obtain the pre-trained parameters. Therefore, the deblocking neural network can be tailored to the content of the input image or input block.
在示例中,边界信号包括单个边界区域(例如,边界区域A),并且利用单个边界区域执行去块训练。基于单个边界区域来确定(例如,训练或更新)去块NN。在解码器侧,确定的去块NN可以用于对边界区域和可选的其他边界区域进行去块。去块信息可以与对应于包括边界区域的重构块的编码块一起被编码到视频比特流中。In the example, the boundary signal comprises a single boundary region (e.g., boundary region A), and deblocking training is performed using this single boundary region. The deblocking neural network (NN) is determined (e.g., trained or updated) based on this single boundary region. On the decoder side, the determined deblocking NN can be used to deblock the boundary region and optionally other boundary regions. The deblocking information can be encoded into the video bitstream along with coded blocks corresponding to reconstructed blocks including the boundary regions.
在实施方式中,边界信号包括多个边界区域(例如,边界区域A至B),并且利用多个边界区域执行去块训练。基于多个边界区域来确定(例如,训练或更新)去块NN。在解码器侧,确定的去块NN可以用于对多个边界区域和可选的其他边界区域进行去块。去块信息可以与对应于包括多个边界区域的重构块的编码块一起被编码到视频比特流中。In this implementation, the boundary signal includes multiple boundary regions (e.g., boundary regions A to B), and deblocking training is performed using these multiple boundary regions. A deblocking neural network (NN) is determined (e.g., trained or updated) based on these multiple boundary regions. On the decoder side, the determined NN can be used to deblock the multiple boundary regions and optionally other boundary regions. The deblocking information can be encoded into the video bitstream along with coded blocks corresponding to reconstructed blocks that include the multiple boundary regions.
速率损耗R(p)可能会随着在视频比特流中发信号通知去块信息而增加。当边界信号包括单个边界区域时，可以针对每个边界区域发信号通知去块信息，并且对速率损耗R(p)的第一增加用于指示由于针对每个边界区域发信号通知去块信息而导致的对速率损耗R(p)的增加。当边界信号包括多个边界区域时，可以针对多个边界区域发信号通知去块信息，并且可以由多个边界区域共享去块信息，并且对速率损耗R(p)的第二增加用于指示由于针对多个边界区域发信号通知共享的去块信息而导致的对速率损耗R(p)的增加。因为去块信息由多个边界区域共享，所以对速率损耗R(p)的第二增加可以小于对速率损耗R(p)的第一增加。因此，在一些示例中，使用多个边界区域来确定（例如，训练或更新）去块NN可能是有利的。The rate loss R(p) can increase with signaling the deblocking information in the video bitstream. When the boundary signal includes a single boundary region, the deblocking information can be signaled for each boundary region, and a first increase in the rate loss R(p) indicates the increase in the rate loss R(p) due to signaling the deblocking information for each boundary region. When the boundary signal includes multiple boundary regions, the deblocking information can be signaled for the multiple boundary regions and shared by the multiple boundary regions, and a second increase in the rate loss R(p) indicates the increase in the rate loss R(p) due to signaling the shared deblocking information for the multiple boundary regions. Because the deblocking information is shared by the multiple boundary regions, the second increase in the rate loss R(p) can be smaller than the first increase in the rate loss R(p). Therefore, in some examples, using multiple boundary regions to determine (e.g., train or update) the deblocking NN can be advantageous.
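下面的算术草图比较了按区域分别发信号与多区域共享去块信息时R(p)的增加；区域数量与比特数均为假设值。The arithmetic sketch below compares the increase in R(p) when deblocking information is signaled per region versus shared by multiple regions; the region count and bit counts are hypothetical.

```python
def rate_increase(bits_per_signal, num_signals):
    """Total increase in R(p) from signaling deblocking information num_signals times."""
    return bits_per_signal * num_signals

num_regions = 4   # hypothetical number of boundary regions in the image
bits = 120        # hypothetical size of one signaled set of NN parameters

first_increase = rate_increase(bits, num_regions)  # separate info per region
second_increase = rate_increase(bits, 1)           # one set shared by all regions
```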
在实施方式中,在去块训练中确定去块NN的参数子集。例如,去块NN中的初始参数的子集在去块训练中被更新。在实施方式中,在去块训练中确定去块NN的每个参数。例如,去块NN中的每个初始参数在去块训练中被更新。在示例中,初始参数的整个集合被所确定的一个或更多个参数替换。In one implementation, a subset of parameters of the deblocking neural network is determined during deblocking training. For example, a subset of the initial parameters in the deblocking neural network is updated during deblocking training. In another implementation, each parameter of the deblocking neural network is determined during deblocking training. For example, each initial parameter in the deblocking neural network is updated during deblocking training. In this example, the entire set of initial parameters is replaced by one or more of the determined parameters.
在实施方式中,被更新的一个或更多个初始参数在去块NN的单个层(例如,单个卷积层)中。在实施方式中,被更新的一个或更多个初始参数在去块NN的多个或所有层(例如,多个或所有卷积层)中。In one implementation, one or more initial parameters are updated in a single layer (e.g., a single convolutional layer) of the deblocked neural network. In another implementation, one or more initial parameters are updated in multiple or all layers (e.g., multiple or all convolutional layers) of the deblocked neural network.
去块NN可以由不同类型的参数,例如权重、偏置等指定。去块NN可以配置有合适的初始参数,例如权重、偏置或权重和偏置的组合。当使用CNN时,权重可以包括卷积核中的元素。Deblocking neural networks (NNs) can be specified by different types of parameters, such as weights and biases. NNs can be configured with appropriate initial parameters, such as weights, biases, or a combination of weights and biases. When using CNNs, weights can include elements from the convolutional kernel.
在实施方式中,要更新的一个或更多个初始参数是偏置项,并且只有偏置项被所确定的一个或更多个参数替换。在实施方式中,要更新的一个或更多个初始参数是权重,并且只有权重被所确定的一个或更多个参数替换。在实施方式中,要更新的一个或更多个初始参数包括权重和偏置项,并且被所确定的一个或更多个参数替换。In one implementation, the one or more initial parameters to be updated are bias terms, and only the bias terms are replaced by the determined one or more parameters. In another implementation, the one or more initial parameters to be updated are weights, and only the weights are replaced by the determined one or more parameters. In yet another implementation, the one or more initial parameters to be updated include weights and bias terms, and are replaced by the determined one or more parameters.
在实施方式中,可以对于不同的边界信号(例如,边界区域)确定(例如,更新)不同类型的参数(例如,偏置或权重)。例如,第一边界区域用于更新对应于第一边界区域的去块NN中的第一类型的参数(例如,至少一个偏置),并且第二边界区域用于更新对应于第二边界区域的去块NN中的第二类型的参数(例如,至少一个权重)。In implementation, different types of parameters (e.g., biases or weights) can be determined (e.g., updated) for different boundary signals (e.g., boundary regions). For example, a first boundary region is used to update a first type of parameter (e.g., at least one bias) in the deblocking neural network corresponding to the first boundary region, and a second boundary region is used to update a second type of parameter (e.g., at least one weight) in the deblocking neural network corresponding to the second boundary region.
在实施方式中,针对不同的边界信号(例如,不同的边界区域)更新不同的参数。In the implementation, different parameters are updated for different boundary signals (e.g., different boundary regions).
在实施方式中,多个边界信号(例如,多个边界区域)共享相同的一个或更多个参数。在示例中,重构图像中的所有边界区域共享相同的一个或更多个参数。In this implementation, multiple boundary signals (e.g., multiple boundary regions) share the same one or more parameters. In the example, all boundary regions in the reconstructed image share the same one or more parameters.
在实施方式中，基于边界信号（例如，边界区域）的特性，例如边界区域的RGB方差来选择要更新的一个或更多个初始参数。在实施方式中，基于边界区域的R-D性能来选择要更新的一个或更多个初始参数。In an embodiment, the one or more initial parameters to be updated are selected based on a characteristic of the boundary signal (e.g., the boundary region), such as an RGB variance of the boundary region. In an embodiment, the one or more initial parameters to be updated are selected based on the R-D performance of the boundary region.
在去块训练的结束,可以为相应确定的一个或更多个参数(例如,相应的一个或更多个替换参数)计算一个或更多个更新的参数。去块信息可以指示一个或更多个更新的参数,并且因此可以通过指示一个或更多个更新的参数来指示确定的一个或更多个参数。在一些实施方式中,一个或更多个更新的参数可以作为去块信息被编码到视频比特流中。在实施方式中,一个或更多个更新的参数被计算为所确定的一个或更多个参数(例如,相应的一个或更多个替换参数)与相应的一个或更多个初始参数(例如,一个或更多个预训练参数)之间的差。例如,去块信息指示一个或更多个参数与初始参数中的相应的一个或更多个之间的差。一个或更多个参数可以根据差和初始参数中的相应的一个或更多个的和来确定。At the end of deblocking training, one or more updated parameters can be computed for the corresponding determined one or more parameters (e.g., corresponding one or more replacement parameters). Deblocking information can indicate one or more updated parameters, and therefore the determined one or more parameters can be indicated by indicating one or more updated parameters. In some implementations, one or more updated parameters can be encoded as deblocking information into the video bitstream. In implementations, one or more updated parameters are computed as the difference between the determined one or more parameters (e.g., corresponding one or more replacement parameters) and the corresponding one or more initial parameters (e.g., one or more pre-trained parameters). For example, deblocking information indicates the difference between one or more parameters and the corresponding one or more of the initial parameters. One or more parameters can be determined based on the sum of the difference and the corresponding one or more of the initial parameters.
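下面的草图示出将更新的参数作为替换参数与初始参数之差来发信号、并在解码器侧通过求和恢复替换参数；数值均为假设值。The sketch below illustrates signaling the updated parameter as the difference between a replacement parameter and the initial parameter and recovering the replacement parameter at the decoder side by summation; the values are hypothetical.

```python
def encode_delta(replacement, initial):
    """Encoder signals the updated parameter as (replacement - initial)."""
    return replacement - initial

def decode_replacement(delta, initial):
    """Decoder recovers the replacement parameter as (initial + delta)."""
    return initial + delta

initial_w = 0.50      # pretrained weight known to both encoder and decoder
replacement_w = 0.47  # hypothetical weight found by deblocking training
delta = encode_delta(replacement_w, initial_w)
recovered = decode_replacement(delta, initial_w)
```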
In an embodiment, the one or more updated parameters are the determined one or more parameters, respectively.
In an embodiment, how the one or more updated parameters are obtained from the respective determined one or more parameters depends on the boundary signal (e.g., the boundary region) used in the deblocking training. Different methods can be used for different boundary regions. In an example, the updated parameters of the deblocking NN applied to a first boundary region are computed as differences between the replacement parameters obtained based on the first boundary region and the corresponding initial parameters. In an example, the updated parameters of the deblocking NN applied to a second boundary region are the replacement parameters obtained based on the second boundary region.
In an embodiment, different boundary signals (e.g., different boundary regions) have different relationships between the one or more updated parameters and the determined one or more parameters. For example, for a first boundary region, the one or more updated parameters are computed as differences between the one or more replacement parameters and the corresponding one or more pretrained parameters. For a second boundary region, the one or more updated parameters are the one or more replacement parameters, respectively.
In an embodiment, how the one or more updated parameters are obtained from the respective determined one or more parameters does not depend on the boundary signal (e.g., the boundary region) used in the deblocking training. In an example, all the boundary regions share the same way of updating the parameters in the deblocking NN. In an embodiment, multiple boundary regions (e.g., all the boundary regions) in an image have the same relationship between the one or more updated parameters and the one or more replacement parameters.
In an embodiment, the relationship between the one or more updated parameters and the one or more replacement parameters is selected based on a characteristic of the boundary signal (e.g., the boundary region), such as an RGB variance of the boundary region. In an embodiment, the relationship between the one or more updated parameters and the one or more replacement parameters is selected based on an R-D performance of the boundary region.
In an embodiment, the one or more updated parameters can be generated from the determined one or more parameters (e.g., the one or more replacement parameters), for example, using a certain linear or nonlinear transform, and the one or more updated parameters are representative parameters generated based on the determined one or more parameters. The determined one or more parameters are transformed into the one or more updated parameters for better compression.
The one or more updated parameters can be compressed or uncompressed. In an example, the one or more updated parameters are compressed, for example, using LZMA2 that is a variation of the Lempel–Ziv–Markov chain algorithm (LZMA), the bzip2 algorithm, or the like. In an example, the compression is omitted for the one or more updated parameters.
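A minimal sketch of this optional compression step follows, assuming the updated parameters are serialized as little-endian 32-bit floats (an illustrative choice; the disclosure does not fix a byte layout). Python's lzma module produces an .xz stream whose default filter chain is LZMA2, and the bz2 module implements bzip2:

```python
import bz2
import lzma
import struct

def pack_params(params):
    """Serialize a list of float parameter updates to bytes."""
    return struct.pack(f"<{len(params)}f", *params)

def compress_params(params, method="lzma2"):
    raw = pack_params(params)
    if method == "lzma2":
        return lzma.compress(raw)   # .xz container, LZMA2 filter
    if method == "bzip2":
        return bz2.compress(raw)
    return raw                      # compression omitted

def decompress_params(data, count, method="lzma2"):
    if method == "lzma2":
        raw = lzma.decompress(data)
    elif method == "bzip2":
        raw = bz2.decompress(data)
    else:
        raw = data
    return list(struct.unpack(f"<{count}f", raw))
```

Either method is lossless, so the decoder recovers exactly the signaled updates regardless of which codec a given boundary region uses.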
In an embodiment, the compression method for the one or more updated parameters is different for different boundary signals (e.g., different boundary regions). For example, for a first boundary region, LZMA2 is used to compress the one or more updated parameters, and for a second boundary region, bzip2 is used to compress the one or more updated parameters. In an embodiment, a same compression method is used to compress the one or more updated parameters for multiple boundary regions (e.g., all the boundary regions) in an image or a reconstructed block. In an embodiment, the compression method is selected based on a characteristic of the boundary signal (e.g., the boundary region), such as an RGB variance of the boundary region. In an embodiment, the compression method is selected based on an R-D performance of the boundary region.
The deblocking training can include multiple epochs (e.g., iterations) where the one or more initial parameters are updated in an iterative process. The deblocking training can stop when a training loss drops to, or is about to drop to, a certain level. In an example, the deblocking training stops when the training loss (e.g., the R-D loss Lp or the term Ltraining) is below a first threshold. In an example, the deblocking training stops when a difference between two successive training losses is below a second threshold.
Two hyperparameters (e.g., a step size and a maximum number of steps) can be used in the deblocking training together with a loss function (e.g., the R-D loss Lp or the term Ltraining). A maximum number of iterations can be used as a threshold of the maximum number of iterations to terminate the deblocking training. In an example, the deblocking training stops when a number of iterations reaches the maximum number of iterations.
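The stopping criteria above can be sketched as follows, using a toy one-parameter quadratic loss as a stand-in for the R-D loss Lp or the term Ltraining (an assumption made for illustration only, not the disclosure's actual loss):

```python
def train_deblocking(step_size, max_steps, loss_thresh, delta_thresh):
    """Iterate gradient descent until a stopping criterion fires."""
    param = 4.0                 # one stand-in parameter to update
    prev_loss = None
    steps = 0
    for steps in range(max_steps):          # criterion 3: max iterations
        loss = param * param                # toy loss L(p) = p^2
        if loss < loss_thresh:              # criterion 1: loss below threshold
            break
        if prev_loss is not None and abs(prev_loss - loss) < delta_thresh:
            break                           # criterion 2: losses nearly equal
        prev_loss = loss
        param -= step_size * 2.0 * param    # gradient step, dL/dp = 2p
    return param, steps
```

The three exits correspond to the first threshold, the second threshold on successive losses, and the maximum number of iterations described in the text.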
The step size can indicate a learning rate of the online training process (e.g., the deblocking training). The step size can be used in a gradient descent algorithm or a backpropagation calculation performed in the deblocking training. Any suitable method can be used to determine the step size.
The step size for each boundary signal (e.g., each boundary region) can be different in the deblocking training. In an embodiment, different step sizes can be assigned to boundary regions in an image to achieve a better compression result (e.g., a better R-D loss Lp or a better term Ltraining).
In an embodiment, different step sizes are used for boundary signals (e.g., boundary regions) with different types of content to achieve an optimized result. The different types can refer to different variances. In an example, the step size is determined based on a variance of the boundary region used to update the deblocking NN. For example, a step size of a boundary region having a high variance is larger than a step size of a boundary region having a low variance, where the high variance is larger than the low variance.
In an embodiment, the step size is selected based on a characteristic of the boundary signal (e.g., the boundary region), such as an RGB variance of the boundary region. In an embodiment, the step size is selected based on an R-D performance (e.g., the R-D loss Lp or the term Ltraining) of the boundary region. Multiple sets of parameters (e.g., multiple sets of replacement parameters) can be generated based on different step sizes, and a set having a better compression performance (e.g., a smaller R-D loss Lp or a smaller term Ltraining) can be selected.
In an embodiment, a first step size can be used to run a certain number of iterations (e.g., 100 iterations). Then, a second step size (e.g., the first step size plus or minus a size increment) can be used to run the certain number of iterations. The results of the first step size and the second step size can be compared to determine the step size to be used. More than two step sizes can be tested to determine an optimal step size.
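A hedged sketch of this step-size comparison follows, again with a toy quadratic loss standing in for the actual training loss; running a fixed number of iterations per candidate and keeping the candidate with the smallest final loss mirrors the comparison described above:

```python
def run_iterations(step_size, iters, start=4.0):
    """Run a fixed number of gradient steps; return the final loss."""
    param = start
    for _ in range(iters):
        param -= step_size * 2.0 * param   # gradient step on L(p) = p^2
    return param * param

def pick_step_size(candidates, iters=100):
    """Keep the candidate step size whose final loss is smallest."""
    return min(candidates, key=lambda s: run_iterations(s, iters))
```

On this toy loss, a larger (but still stable) step size converges further within the iteration budget, so it wins the comparison.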
The step size can vary during the deblocking training. The step size can have an initial value at a beginning of the deblocking training, and the initial value can be reduced (e.g., halved) at a later stage of the deblocking training, e.g., after a certain number of iterations, to achieve a finer tuning. The step size or the learning rate can be varied by a scheduler during the iterative deblocking training. The scheduler can include a parameter adjustment method for adjusting the step size. The scheduler can determine a value of the step size such that the step size can increase, decrease, or remain constant over a number of intervals. In an example, the learning rate is altered by the scheduler in each step. A single scheduler or multiple different schedulers can be used for different boundary signals (e.g., different boundary regions). Thus, multiple sets of parameters can be generated based on the multiple schedulers, and a set having a better compression performance (e.g., a smaller R-D loss Lp or a smaller term Ltraining) can be selected from the multiple sets of parameters.
In an embodiment, multiple learning rate schedules are assigned to different boundary signals (e.g., different boundary regions) to achieve better compression results. In an example, all the boundary regions in an image share the same learning rate schedule. In an example, a group of boundary regions shares the same learning rate schedule. In an embodiment, the selection of the learning rate schedule is based on a characteristic of the boundary region, such as an RGB variance of the boundary region. In an embodiment, the selection of the learning rate schedule is based on an R-D performance of the boundary region.
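For example, a simple step-decay schedule that halves the learning rate after a fixed number of iterations, matching the halving behavior described above, can be sketched as follows (the interval and the decay factor are assumed values):

```python
def step_decay(initial_lr, step, halve_every=100):
    """Halve the learning rate every `halve_every` iterations."""
    return initial_lr * (0.5 ** (step // halve_every))
```

A scheduler of this form keeps the step size constant within each interval and reduces it between intervals to allow finer tuning late in training.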
In an embodiment, a structure (e.g., an architecture) of the deblocking NN is identical for each boundary signal (e.g., each boundary region). The structure of the deblocking NN can include a number of layers, how different nodes and/or layers are organized and connected, a feedforward architecture, a recurrent architecture, a DNN, a CNN, and/or the like. In an example, the structure refers to a number of convolution layers, and different blocks have the same number of convolution layers.
In an embodiment, different structures of the deblocking NN correspond to different boundary signals (e.g., different boundary regions). In an example, the deblocking NN has different numbers of convolution layers corresponding to different boundary regions.
In an embodiment, whether to deblock a boundary signal (e.g., a boundary region) using the deblocking NN is determined based on: (i) a comparison of the R-D loss Lp with the deblocking training and the R-D loss L without the deblocking training, or (ii) a comparison of the term Ltraining with the deblocking training and the weighted distortion loss λD without the deblocking training. In an embodiment, a deblocking NN having the best R-D performance (e.g., the smallest R-D loss Lp) is selected based on a comparison of the different R-D losses Lp of different deblocking NNs.
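The on/off decision and the NN selection above reduce to simple cost comparisons, sketched below; the loss values are placeholders for the R-D losses named in the text, not values computed by this code:

```python
def should_deblock(rd_loss_with_nn, rd_loss_without_nn):
    """Apply the deblocking NN only if it improves the R-D cost."""
    return rd_loss_with_nn < rd_loss_without_nn

def select_best_nn(rd_losses):
    """Pick the index of the deblocking NN with the smallest R-D loss."""
    return min(range(len(rd_losses)), key=rd_losses.__getitem__)
```

The same comparison applies to option (ii) in the text, with the term Ltraining and the weighted distortion loss λD substituted for the two R-D losses.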
According to embodiments of the disclosure, each boundary signal (e.g., each boundary region) corresponds to a deblocking NN determined in the deblocking training based on the boundary region. A deblocking NN can be updated independently from another deblocking NN. For example, the deblocking NN corresponding to a first boundary region is updated independently from the deblocking NN corresponding to a second boundary region.
According to embodiments of the disclosure, a deblocking NN corresponding to a boundary signal (e.g., a boundary region) can be updated based on a deblocking NN corresponding to another boundary signal (e.g., another boundary region). In an example, first deblocking information in a coded bitstream that corresponds to a first boundary region is decoded. The first deblocking information can indicate a first parameter of the deblocking NN in a video decoder. The deblocking NN in the video decoder corresponding to the first boundary region can be determined based on the first parameter indicated by the first deblocking information. The first boundary region can be deblocked based on the determined deblocking NN corresponding to the first boundary region.
A second two adjacent reconstructed blocks can have a second shared boundary and include a second boundary region of samples on two sides of the second shared boundary. The parameter in the deblocking NN corresponding to the first boundary region can be used to update the deblocking NN corresponding to the second boundary region. For example, pixel distributions in boundary regions of a same image can be similar, and thus the parameters to be updated of the deblocking NNs corresponding to different boundary signals (e.g., different boundary regions) can be reduced. Second deblocking information in the coded bitstream that corresponds to the second boundary region can be decoded. The second deblocking information can indicate a second parameter. The second boundary region is different from the first boundary region. The deblocking NN can be updated based on the first parameter and the second parameter. The updated deblocking NN corresponds to the second boundary region and is configured with the first parameter and the second parameter. The second boundary region can be deblocked based on the updated deblocking NN corresponding to the second boundary region.
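A hypothetical sketch of this parameter reuse follows, representing the deblocking NN's parameters as a name-to-value dictionary (an illustrative representation, not the disclosure's data structure): start from the pretrained parameters, apply the first region's signaled updates, and then override only the additionally signaled second-region updates:

```python
def configure_nn(pretrained, *signaled_updates):
    """Build the parameter set by layering signaled updates over the
    pretrained parameters; later updates override earlier ones."""
    params = dict(pretrained)
    for update in signaled_updates:
        params.update(update)
    return params
```

Because the second region reuses the first region's parameters wherever it signals nothing of its own, only the incremental second-region updates need to be carried in the bitstream.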
In an embodiment, different deblocking NNs can be applied to boundary signals (e.g., boundary regions) having different sizes. In general, the number of parameters in a deblocking NN can increase with a size (e.g., a width, a height, or an area) of the boundary region.
In an embodiment, different deblocking NNs corresponding to different compression quality targets are applied to a same boundary signal (e.g., a same boundary region).
The NIC framework and the deblocking NN can include any type of neural network and use any neural-network-based image compression methods, such as a context hyperprior encoder-decoder framework, a scale hyperprior encoder-decoder framework, a Gaussian mixture likelihood framework and variants thereof, an RNN-based recursive compression method and variants thereof, and the like.
The deblocking training methods and apparatuses in the disclosure can have the following benefits. Adaptive online training mechanisms are exploited to improve coding efficiency. The use of a flexible and general framework can accommodate various types of pretraining frameworks and quality metrics. For example, certain initial parameters (e.g., pretrained parameters) in the deblocking NN can be replaced by using the online training with boundary signals based on the block(s) or image(s) to be encoded.
FIG. 23 shows a flowchart outlining an encoding process (2300) according to an embodiment of the disclosure. The process (2300) can be used to encode input blocks, perform the deblocking training, and/or encode deblocking information. In various embodiments, the process (2300) is executed by processing circuitry, such as the processing circuitry in the terminal devices (310), (320), (330), and (340), processing circuitry that performs functions of a video encoder (e.g., (403), (603), (703), (1600A), or (1700)), and the like. In an example, the processing circuitry performs a combination of functions of, for example, (i) one of the video encoder (403), the video encoder (603), and the video encoder (703), and (ii) one of the video encoder (1600A) or the video encoder (1700). In some embodiments, the process (2300) is implemented in software instructions, and thus when the processing circuitry executes the software instructions, the processing circuitry performs the process (2300). The process starts at (S2301) and proceeds to (S2310).
At (S2310), an encoded block can be generated based on an input block (e.g., X) in an input image using any suitable method. In an example, the encoded block is generated based on the input block X as described with reference to FIG. 9B. In an example, the input blocks include a subset of the blocks in the input image. In an example, the input blocks include an entire set of the blocks in the input image.
At (S2320), a reconstructed block can be generated based on the encoded block. Referring to FIG. 9B, in an example, the reconstructed block is generated based on the encoded block using the NIC framework (900). The deblocking training can be implemented to determine the one or more parameters of the deblocking NN based on a boundary signal. The boundary signal can be determined based on reconstructed blocks, as described in the disclosure. In an example, the boundary signal includes a boundary region in the reconstructed blocks.
The deblocking NN can be configured with initial parameters, and one or more of the initial parameters can be iteratively updated in the deblocking training. The one or more initial parameters can be replaced by the one or more parameters.
At (S2330), deblocking information indicating the determined one or more parameters of the deblocking NN can be encoded. The deblocking information corresponds to the boundary signal (e.g., the boundary region). In some examples, the encoded block and the deblocking information can be included in a coded video bitstream and optionally transmitted in the coded video bitstream. The process (2300) proceeds to (S2399), and terminates.
The process (2300) can be suitably adapted to various scenarios, and the steps in the process (2300) can be adjusted accordingly. One or more of the steps in the process (2300) can be adapted, omitted, repeated, and/or combined. Any suitable order can be used to implement the process (2300). Additional step(s) can be added. In an example, the boundary signal includes a reconstructed block, and the one or more parameters can be generated based on the reconstructed block. In an example, the boundary signal includes each reconstructed block in a reconstructed image, and the one or more parameters can be generated based on the reconstructed image.
FIG. 24 shows a flowchart outlining a decoding process (2400) according to an embodiment of the disclosure. The process (2400) can be used in the reconstruction and deblocking of a boundary signal (e.g., a boundary region) based on deblocking information in a video bitstream. In various embodiments, the process (2400) is executed by processing circuitry, such as the processing circuitry in the terminal devices (310), (320), (330), and (340), processing circuitry that performs functions of the video decoder (1600B), and processing circuitry that performs functions of a video decoder including a deblocking NN. In an example, the processing circuitry performs a combination of functions including the deblocking. In some embodiments, the process (2400) is implemented in software instructions, and thus when the processing circuitry executes the software instructions, the processing circuitry performs the process (2400). The process starts at (S2401) and proceeds to (S2410).
At (S2410), blocks of an image to be reconstructed from a coded video bitstream can be reconstructed. The reconstructed blocks can include a first two adjacent reconstructed blocks having a first shared boundary and including a first boundary region of samples on two sides of the first shared boundary.
At (S2420), first deblocking information in the coded video bitstream can be decoded. In an example, the first deblocking information corresponds to the first boundary region. The first deblocking information can indicate a first parameter (or a first deblocking parameter) of a deblocking neural network (NN) (e.g., a deep NN or DNN) in a video decoder. The first deblocking parameter of the DNN is an updated parameter that has been previously determined (e.g., trained) by a content-adaptive training process. For example, the first deblocking parameter is determined by an online training process. In an example, the first deblocking information can include the first deblocking parameter.
The first parameter can be a bias term or a weight coefficient in the deblocking NN.
The first deblocking information can indicate the one or more parameters of the deblocking NN in various ways. In an example, the deblocking NN is configured with initial parameters. The first deblocking information indicates a difference between the first parameter and one of the initial parameters, and the first parameter can be determined from a sum of the difference and the one of the initial parameters. In another example, the first deblocking information directly indicates the one or more parameters.
In an example, a number of layers in the deblocking NN depends on a size (e.g., a width, a height, or an area) of the first boundary region.
In an example, the first boundary region further includes samples on two sides of a third shared boundary between a third two adjacent reconstructed blocks of the reconstructed blocks, and the first two adjacent reconstructed blocks are different from the third two adjacent reconstructed blocks.
At (S2430), the deblocking NN in the video decoder for the first boundary region can be determined based on the first parameter indicated by the first deblocking information.
In an example, the deblocking NN is configured with initial parameters, and one of the initial parameters is updated based on the first parameter. For example, the one of the initial parameters is replaced with the first parameter.
At (S2440), the first boundary region can be deblocked based on the determined deblocking NN corresponding to the first boundary region.
In an example, the first two adjacent reconstructed blocks further include non-boundary regions outside the first boundary region, and the first boundary region in the first two adjacent reconstructed blocks is replaced with the deblocked first boundary region.
In an example, a second two adjacent reconstructed blocks of the reconstructed blocks have a second shared boundary and include a second boundary region having samples on two sides of the second shared boundary. The second boundary region can be deblocked based on the determined deblocking NN corresponding to the first boundary region.
The process (2400) proceeds to (S2499), and terminates.
The process (2400) can be suitably adapted to various scenarios, and the steps in the process (2400) can be adjusted accordingly. One or more of the steps in the process (2400) can be adapted, omitted, repeated, and/or combined. Any suitable order can be used to implement the process (2400). Additional step(s) can be added.
In an example, a second two adjacent reconstructed blocks of the reconstructed blocks have a second shared boundary and include a second boundary region of samples on two sides of the second shared boundary. Second deblocking information in the coded bitstream that corresponds to the second boundary region can be decoded. The second deblocking information can indicate a second parameter. The second boundary region can be different from the first boundary region. The deblocking NN can be updated based on the first parameter and the second parameter. The updated deblocking NN corresponds to the second boundary region and is configured with at least the first parameter and the second parameter. The second boundary region can be deblocked based on the updated deblocking NN corresponding to the second boundary region.
The embodiments in the disclosure may be used separately or combined in any order. Further, each of the methods (or embodiments), the encoder, and the decoder may be implemented by processing circuitry (e.g., one or more processors or one or more integrated circuits). In one example, the one or more processors execute a program that is stored in a non-transitory computer-readable medium.
The disclosure does not put a limitation on methods used for an encoder (such as an NN-based encoder) or a decoder (such as an NN-based decoder). Neural network(s) used in an encoder, a decoder, and the like can be any suitable type(s) of neural network(s), such as a DNN, a CNN, and the like.
Thus, the content-adaptive online training methods of the disclosure can accommodate different types of NIC frameworks, e.g., different types of encoding DNNs, decoding DNNs, encoding CNNs, decoding CNNs, and/or the like.
The techniques described above can be implemented as computer software using computer-readable instructions and physically stored in one or more computer-readable media. For example, FIG. 25 shows a computer system (2500) suitable for implementing certain embodiments of the disclosed subject matter.
The computer software can be coded using any suitable machine code or computer language that may be subject to assembly, compilation, linking, or like mechanisms to create code comprising instructions that can be executed directly, or through interpretation, micro-code execution, and the like, by one or more computer central processing units (CPUs), Graphics Processing Units (GPUs), and the like.
The instructions can be executed on various types of computers or components thereof, including, for example, personal computers, tablet computers, servers, smartphones, gaming devices, internet-of-things devices, and the like.
The components shown in FIG. 25 for the computer system (2500) are exemplary in nature and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing embodiments of the disclosure. Neither should the configuration of components be interpreted as having any dependency or requirement relating to any one or a combination of the components illustrated in the exemplary embodiment of the computer system (2500).
计算机系统(2500)可以包括某些人机接口输入装置。这样的人机接口输入装置可以对由一个或更多个人类用户通过例如触觉输入(例如:击键、滑动、数据手套移动)、音频输入(例如:语音、拍打)、视觉输入(例如:姿势)、嗅觉输入(未描绘)实现的输入作出响应。人机接口装置还可以用于捕获不一定与人的有意输入直接有关的某些介质,例如,音频(例如:语音、音乐、环境声音)、图像(例如:扫描图像、从静态图像摄像装置获得的摄影图像)、视频(例如二维视频、包括立体视频的三维视频)。The computer system (2500) may include certain human-machine interface input devices. Such human-machine interface input devices can respond to input from one or more human users through, for example, tactile input (e.g., keystrokes, swipes, data glove movements), audio input (e.g., voice, tapping), visual input (e.g., gestures), and olfactory input (not depicted). The human-machine interface devices can also be used to capture certain media that are not necessarily directly related to intentional human input, such as audio (e.g., voice, music, ambient sound), images (e.g., scanned images, photographic images obtained from still image capturing devices), and video (e.g., two-dimensional video, three-dimensional video including stereoscopic video).
The input human interface devices can include one or more of (only one of each depicted): a keyboard (2501), a mouse (2502), a trackpad (2503), a touch screen (2510), a data glove (not shown), a joystick (2505), a microphone (2506), a scanner (2507), and a camera (2508).
The computer system (2500) can also include certain human interface output devices. Such human interface output devices can stimulate the senses of one or more human users through, for example, tactile output, sound, light, and smell/taste. Such human interface output devices can include: tactile output devices (for example, tactile feedback by the touch screen (2510), the data glove (not shown), or the joystick (2505), although tactile feedback devices that do not serve as input devices can also exist); audio output devices (such as speakers (2509) and headphones (not depicted)); visual output devices (such as screens (2510), including CRT screens, LCD screens, plasma screens, and OLED screens, each with or without touch-screen input capability and each with or without tactile feedback capability, some of which can output two-dimensional visual output or more-than-three-dimensional output through means such as stereographic output; virtual-reality glasses (not depicted); holographic displays and smoke tanks (not depicted)); and printers (not depicted).
The computer system (2500) can also include human-accessible storage devices and their associated media, such as optical media including a CD/DVD ROM/RW (2520) with CD/DVD or similar media (2521), a thumb drive (2522), a removable hard drive or solid-state drive (2523), legacy magnetic media such as tape and floppy disk (not depicted), specialized ROM/ASIC/PLD-based devices such as security dongles (not depicted), and the like.
Those skilled in the art should also understand that the term "computer-readable media" as used in connection with the presently disclosed subject matter does not encompass transmission media, carrier waves, or other transitory signals.
The computer system (2500) can also include an interface (2554) to one or more communication networks (2555). The networks can, for example, be wireless, wireline, or optical. The networks can further be local, wide-area, metropolitan, vehicular and industrial, real-time, delay-tolerant, and so on. Examples of networks include local area networks such as Ethernet and wireless LANs, cellular networks including GSM, 3G, 4G, 5G, LTE, and the like, TV wireline or wireless wide-area digital networks including cable TV, satellite TV, and terrestrial broadcast TV, and vehicular and industrial networks including CANBus. Certain networks commonly require external network interface adapters attached to certain general-purpose data ports or peripheral buses (2549) (such as, for example, USB ports of the computer system (2500)); others are commonly integrated into the core of the computer system (2500) by attachment to a system bus as described below (for example, an Ethernet interface into a PC computer system, or a cellular network interface into a smartphone computer system). Using any of these networks, the computer system (2500) can communicate with other entities. Such communication can be uni-directional, receive only (for example, broadcast TV), uni-directional send-only (for example, CANbus to certain CANbus devices), or bi-directional, for example, to other computer systems using local or wide-area digital networks. Certain protocols and protocol stacks can be used on each of those networks and network interfaces as described above.
The aforementioned human interface devices, human-accessible storage devices, and network interfaces can be attached to a core (2540) of the computer system (2500).
The core (2540) can include one or more central processing units (CPUs) (2541), graphics processing units (GPUs) (2542), specialized programmable processing units in the form of field-programmable gate arrays (FPGAs) (2543), hardware accelerators (2544) for certain tasks, graphics adapters (2550), and the like. These devices, along with read-only memory (ROM) (2545), random-access memory (2546), and internal mass storage (2547) such as internal non-user-accessible hard drives, SSDs, and the like, can be connected through a system bus (2548). In some computer systems, the system bus (2548) can be accessible in the form of one or more physical plugs to enable extensions by additional CPUs, GPUs, and the like. Peripheral devices can be attached either directly to the core's system bus (2548) or through a peripheral bus (2549). In an example, the screen (2510) can be connected to the graphics adapter (2550). Architectures for a peripheral bus include PCI, USB, and the like.
The CPUs (2541), GPUs (2542), FPGAs (2543), and accelerators (2544) can execute certain instructions that, in combination, can constitute the aforementioned computer code. That computer code can be stored in the ROM (2545) or the RAM (2546). Transitional data can also be stored in the RAM (2546), whereas permanent data can be stored, for example, in the internal mass storage (2547). Fast storage and retrieval to any of the memory devices can be enabled through the use of cache memory, which can be closely associated with one or more of the CPUs (2541), GPUs (2542), mass storage (2547), ROM (2545), RAM (2546), and the like.
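The cache behavior described above can be illustrated schematically. The following sketch is purely illustrative and not part of the disclosed apparatus: a small, fast cache (the `LRUCache` class and `capacity` parameter are hypothetical names chosen for this example) is kept in front of a larger, slower backing store, so that recently used entries are retrieved without touching the slow store.

```python
from collections import OrderedDict

class LRUCache:
    """A minimal least-recently-used cache: a small, fast store kept in
    front of a larger, slower backing store (illustrative only)."""

    def __init__(self, backing_store, capacity=4):
        self.backing_store = backing_store  # stands in for mass storage
        self.capacity = capacity
        self.entries = OrderedDict()        # insertion order tracks recency
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)    # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.backing_store[key]      # slow path: fetch from backing store
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False) # evict the least recently used entry
        return value

# A dict stands in for a slow mass-storage device.
store = {k: k * k for k in range(100)}
cache = LRUCache(store, capacity=4)
for k in [1, 2, 1, 1, 3]:
    cache.read(k)
# Keys 1, 2, 3 each miss once; the two repeated reads of key 1 hit the cache.
```

The point of the sketch is only the access pattern: repeated reads of the same key are served from the fast structure, mirroring how cache memory closely associated with the CPUs or GPUs avoids round trips to slower storage.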
The computer-readable media can have computer code thereon for performing various computer-implemented operations. The media and computer code can be those specially designed and constructed for the purposes of the present disclosure, or they can be of the kind well known and available to those having skill in the computer software arts.
By way of example, and not limitation, the computer system (2500) having the illustrated architecture, and specifically the core (2540), can provide functionality as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, and the like) executing software embodied in one or more tangible computer-readable media. Such computer-readable media can be media associated with the user-accessible mass storage devices introduced above, as well as certain storage of the core (2540) that is of a non-transitory nature, such as the core-internal mass storage (2547) or the ROM (2545). Software implementing various embodiments of the present disclosure can be stored in such devices and executed by the core (2540). A computer-readable medium can include one or more memory devices or chips, according to particular needs. The software can cause the core (2540), and specifically the processors therein (including CPUs, GPUs, FPGAs, and the like), to execute particular processes or particular parts of particular processes described herein, including defining data structures stored in the RAM (2546) and modifying such data structures according to the processes defined by the software. In addition, or as an alternative, the computer system can provide functionality as a result of logic hardwired or otherwise embodied in a circuit (for example, the accelerator (2544)), which can operate in place of or together with software to execute particular processes or particular parts of particular processes described herein. Reference to software can encompass logic, and vice versa, where appropriate. Reference to a computer-readable medium can encompass a circuit (such as an integrated circuit (IC)) storing software for execution, a circuit embodying logic for execution, or both, where appropriate. The present disclosure encompasses any suitable combination of hardware and software.
Appendix A: Acronyms
JEM: Joint Exploration Model
VVC: Versatile Video Coding
BMS: Benchmark Set
MV: Motion Vector
HEVC: High Efficiency Video Coding
SEI: Supplementary Enhancement Information
VUI: Video Usability Information
GOPs: Groups of Pictures
TUs: Transform Units
PUs: Prediction Units
CTUs: Coding Tree Units
CTBs: Coding Tree Blocks
PBs: Prediction Blocks
HRD: Hypothetical Reference Decoder
SNR: Signal-to-Noise Ratio
CPUs: Central Processing Units
GPUs: Graphics Processing Units
CRT: Cathode Ray Tube
LCD: Liquid-Crystal Display
OLED: Organic Light-Emitting Diode
CD: Compact Disc
DVD: Digital Video Disc
ROM: Read-Only Memory
RAM: Random Access Memory
ASIC: Application-Specific Integrated Circuit
PLD: Programmable Logic Device
LAN: Local Area Network
GSM: Global System for Mobile communications
LTE: Long-Term Evolution
CANBus: Controller Area Network Bus
USB: Universal Serial Bus
PCI: Peripheral Component Interconnect
FPGA: Field-Programmable Gate Array
SSD: Solid-State Drive
IC: Integrated Circuit
CU: Coding Unit
NIC: Neural Image Compression
R-D: Rate-Distortion
E2E: End-to-End
ANN: Artificial Neural Network
DNN: Deep Neural Network
CNN: Convolutional Neural Network
While this disclosure has described several exemplary embodiments, there are alterations, permutations, and various substitute equivalents that fall within the scope of the disclosure. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods which, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within its spirit and scope.
Claims (7)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US63/211,408 | 2021-06-16 | ||
| US17/826,806 | 2022-05-27 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK40087723A HK40087723A (en) | 2023-09-15 |
| HK40087723B true HK40087723B (en) | 2024-09-27 |