HK40046974B

HK40046974B - Method and apparatus for decoding encoded video sequence, and storage medium

Info

Publication number: HK40046974B
Application number: HK62021036172.6A
Authority: HK
Inventors: 史蒂芬·文格尔; 夜静; 崔秉斗; 刘杉
Original assignee: 腾讯美国有限责任公司
Priority date: 2019-01-02
Filing date: 2019-12-27
Publication date: 2024-05-10

Description

Methods, apparatus and storage media for decoding encoded video sequences

Cross-Reference to Related Applications

This application claims priority to U.S. Provisional Patent Application No. 62/704,040, filed with the United States Patent and Trademark Office on January 2, 2019, and to U.S. Patent Application No. 16/710,389, filed with the United States Patent and Trademark Office on December 11, 2019, both of which are incorporated herein by reference in their entirety.

Technical Field

The disclosed subject matter relates to video coding and decoding, and more specifically to syntax elements for adaptive picture resolution rescaling in a high-level syntax structure, and to the related decoding and scaling processes for picture segments.

Background

Video coding and decoding using inter-picture prediction with motion compensation has been known for decades. Uncompressed digital video can consist of a series of pictures, each picture having a spatial dimension of, for example, 1920×1080 luma samples and associated chroma samples. The series of pictures can have a fixed or variable picture rate (informally also known as frame rate) of, for example, 60 pictures per second, or 60 Hz. Uncompressed video has significant bitrate requirements. For example, 1080p60 4:2:0 video at 8 bits per sample (1920×1080 luma sample resolution at a 60 Hz frame rate) requires close to 1.5 Gbit/s of bandwidth. One hour of such video requires more than 600 GB of storage space.
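The figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `raw_video_bitrate` and the `chroma_factor` parameter are illustrative and do not come from the disclosure.

```python
def raw_video_bitrate(width, height, fps, bits_per_sample=8, chroma_factor=1.5):
    """Return the uncompressed bitrate in bits per second.

    chroma_factor is the total number of samples per luma sample:
    1.5 for 4:2:0, where each of the two chroma planes carries a
    quarter as many samples as the luma plane.
    """
    samples_per_picture = width * height * chroma_factor
    return samples_per_picture * bits_per_sample * fps


bitrate = raw_video_bitrate(1920, 1080, 60)
gbit_per_s = bitrate / 1e9
gbytes_per_hour = bitrate * 3600 / 8 / 1e9
print(f"{gbit_per_s:.2f} Gbit/s, {gbytes_per_hour:.0f} GB per hour")
# prints "1.49 Gbit/s, 672 GB per hour"
```

The result agrees with the statement that 1080p60 4:2:0 video needs close to 1.5 Gbit/s and more than 600 GB per hour.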

One purpose of video coding and decoding can be the reduction of redundancy in the input video signal through compression. Compression can help reduce the aforementioned bandwidth or storage space requirements, in some cases by two orders of magnitude or more. Lossless compression, lossy compression, and combinations thereof can be employed. Lossless compression refers to techniques in which an exact copy of the original signal can be reconstructed from the compressed signal. When lossy compression is used, the reconstructed signal may not be identical to the original signal, but the distortion between the original and reconstructed signals is small enough that the reconstructed signal is useful for the intended application. In the case of video, lossy compression is widely employed. The amount of distortion tolerated depends on the application; for example, users of certain consumer streaming applications may tolerate higher distortion than users of television distribution applications. The achievable compression ratio reflects this: a higher allowable or tolerable distortion can yield a higher compression ratio.

Video encoders and decoders can utilize techniques from several broad categories, including, for example, motion compensation, transform, quantization, and entropy coding, some of which are introduced below.

Basic information theory suggests that a spatially lower-resolution representation of the pictures of a series of pictures of given content can be compressed into fewer bits than a larger representation. Therefore, when bandwidth or storage is insufficient, or for cost-sensitive applications that do not require high spatial resolution, downsampling the input series of pictures before encoding and correspondingly upsampling after decoding to obtain pictures suitable for display has been used for decades. For example, at least some MPEG-2-based TV distribution/display systems may change the horizontal resolution of pictures, outside the coding loop and on a per-group-of-pictures basis, when the bandwidth available on the channel is insufficient to allow good reproduction quality. In this regard, it should be noted that many video codecs have a "break point," also known as a "knee" (in the rate-distortion curve), at which the graceful quality degradation achieved by increasing the quantizer value (in order to stay within a rate envelope) breaks down and a sudden, significant quality degradation occurs. Because some video distribution systems operate very close to the break point for content of average complexity, a sudden increase in activity can lead to annoying artifacts that may not be easily compensated for by post-processing techniques.

Although changing the resolution outside the coding loop may be a relatively simple matter from the viewpoint of video codec implementation and specification, it is not particularly efficient. This is because a change of resolution can require an intra-coded picture, which in many cases can be many times larger than the inter-coded pictures most common in an encoded video bitstream. Adding further intra-coded pictures to combat what may essentially be a bandwidth shortage can be counterproductive, and requires large buffers and the associated large potential delay to work.

For delay-critical applications, mechanisms have been devised that allow the resolution of a video sequence to be changed within the coding loop without using intra-coded pictures. Because those techniques require resampling of reference pictures, they are commonly known as reference picture resampling (RPR) techniques. RPR was introduced into standardized video coding in Annex P of ITU-T Recommendation H.263, published in 1998, and has seen relatively wide deployment in certain video conferencing systems. That technique suffers from at least the following shortcomings: 1. the syntax used to signal reference picture resampling is not error resilient; 2. the upsampling and downsampling filters employed (bilinear filters), while computationally economical, are not very conducive to good video quality; 3. the specific technique, which allows "warping," may be overly rich in unnecessary and unwarranted features; and 4. the technique can be applied only to a whole picture, not to a picture segment.
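Shortcoming 2 refers to bilinear resampling filters. For orientation only, the sketch below shows a generic bilinear resampler over a two-dimensional list of samples. It is not the filter specified in H.263 Annex P; the function name and the sample-centre alignment convention are assumptions made for illustration.

```python
def bilinear_resample(src, out_w, out_h):
    """Resample a 2-D list of samples to out_w x out_h using bilinear weights."""
    in_h, in_w = len(src), len(src[0])
    out = []
    for y in range(out_h):
        # Map the output sample centre back into source coordinates and
        # clamp to the picture so that edge samples are repeated.
        fy = min(max((y + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1)
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            fx = min(max((x + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1)
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            # Interpolate horizontally on two rows, then vertically.
            top = src[y0][x0] * (1 - wx) + src[y0][x1] * wx
            bottom = src[y1][x0] * (1 - wx) + src[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


flat = [[5, 5], [5, 5]]
upsampled = bilinear_resample(flat, 4, 4)  # a flat picture stays flat
```

Each output sample costs only four multiplications per interpolation stage, which is why such filters are computationally economical; the two-tap support is also why they tend to blur or alias more than longer filters.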

The more recent video coding technology known as AV1 also has limited support for RPR. It suffers from problems similar to #1 and #4 above and, in addition, the filters it employs are very complex for certain applications.

Summary

According to an embodiment, a method of decoding an encoded picture of an encoded video sequence is performed by at least one processor and includes: decoding, from a first high-level syntax structure for a plurality of pictures, a syntax element related to a reference segment resolution; decoding, from a second high-level syntax structure that changes from a first encoded picture to a second encoded picture, a syntax element related to a decoded segment resolution; resampling samples from a reference picture buffer for use in prediction by a decoder, the decoder decoding a segment at the decoding resolution, and the samples from the reference picture buffer being at the reference segment resolution; decoding the segment at the decoded segment resolution into a decoded segment at the decoded segment resolution; and storing the decoded segment in the reference picture buffer.

According to an embodiment, an apparatus for decoding an encoded picture of an encoded video sequence includes: at least one memory configured to store computer program code; and at least one processor configured to access the at least one memory and operate according to the computer program code, the computer program code including: first decoding code configured to decode, from a first high-level syntax structure for a plurality of pictures, a syntax element related to a reference segment resolution; second decoding code configured to decode, from a second high-level syntax structure that changes from a first encoded picture to a second encoded picture, a syntax element related to a decoded segment resolution; resampling code configured to resample samples from a reference picture buffer for use in prediction by a decoder, the decoder decoding a segment at the decoding resolution, and the samples from the reference picture buffer being at the reference segment resolution; third decoding code configured to decode the segment at the decoded segment resolution into a decoded segment at the decoded segment resolution; and storing code configured to store the decoded segment in the reference picture buffer.

According to an embodiment, a non-transitory computer-readable storage medium stores a program for decoding an encoded picture of an encoded video sequence, the program including instructions that cause a processor to: decode, from a first high-level syntax structure for a plurality of pictures, a syntax element related to a reference segment resolution; decode, from a second high-level syntax structure that changes from a first encoded picture to a second encoded picture, a syntax element related to a decoded segment resolution; resample samples from a reference picture buffer for use in prediction by a decoder, the decoder decoding a segment at the decoding resolution, and the samples from the reference picture buffer being at the reference segment resolution; decode the segment at the decoded segment resolution into a decoded segment at the decoded segment resolution; and store the decoded segment in the reference picture buffer.
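The steps recited in the embodiments above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the claimed implementation: `SequenceParams` and `PictureParams` are hypothetical stand-ins for the first and second high-level syntax structures, `resample` is a nearest-neighbour placeholder for whatever resampling filter is actually used, "decoding" is reduced to adding a residual to the prediction, and resampling the decoded segment back to the reference segment resolution before storing it is an assumption made so that the buffer stays at a single resolution.

```python
from dataclasses import dataclass


@dataclass
class SequenceParams:
    """Hypothetical stand-in for the first high-level syntax structure."""
    ref_width: int
    ref_height: int


@dataclass
class PictureParams:
    """Hypothetical stand-in for the second high-level syntax structure."""
    dec_width: int
    dec_height: int


def resample(samples, out_w, out_h):
    """Nearest-neighbour placeholder for the resampling filter."""
    in_h, in_w = len(samples), len(samples[0])
    return [[samples[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def decode_segment(seq, pic, residual, ref_buffer):
    # Samples in the reference picture buffer are at the reference segment resolution.
    reference = ref_buffer[-1]
    assert len(reference) == seq.ref_height and len(reference[0]) == seq.ref_width
    # Resample them to the decoded segment resolution for use in prediction.
    prediction = resample(reference, pic.dec_width, pic.dec_height)
    # Decode at the decoded segment resolution (here: prediction plus residual).
    decoded = [[p + r for p, r in zip(prow, rrow)]
               for prow, rrow in zip(prediction, residual)]
    # Assumption: return to the reference segment resolution before storing.
    ref_buffer.append(resample(decoded, seq.ref_width, seq.ref_height))
    return decoded


seq = SequenceParams(ref_width=4, ref_height=4)
pic = PictureParams(dec_width=2, dec_height=2)
ref_buffer = [[[10] * 4 for _ in range(4)]]
decoded = decode_segment(seq, pic, [[1, 1], [1, 1]], ref_buffer)
```

Splitting the two resolutions across two syntax structures means the reference segment resolution is read once for many pictures, while the decoded segment resolution can change from one encoded picture to the next.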

Brief Description of the Drawings

Further features, the nature, and various advantages of the disclosed subject matter will become more apparent from the following detailed description and the accompanying drawings, in which:

Figure 1 is a schematic illustration of a simplified block diagram of a communication system according to an embodiment.

Figure 2 is a schematic illustration of a simplified block diagram of a communication system according to an embodiment.

Figure 3 is a schematic illustration of a simplified block diagram of a decoder according to an embodiment.

Figure 4 is a schematic illustration of a simplified block diagram of an encoder according to an embodiment.

Figure 5 is a syntax diagram according to an embodiment.

Figure 6 is a schematic illustration of a simplified block diagram of a decoder with reference picture resampling capability according to an embodiment.

Figure 7 is a schematic illustration of a tile layout according to an embodiment, in which reference picture resampling is used per tile.

Figure 8A is a flowchart illustrating a method of decoding an encoded picture of an encoded video sequence according to an embodiment.

Figure 8B is a simplified block diagram of an apparatus for controlling decoding of a video sequence according to an embodiment.

Figure 9 is a schematic illustration of a computer system according to an embodiment.

Detailed Description

An in-loop RPR (reference picture resampling) technique is needed to address the quality problems that can arise when high-activity content appears while video coding operates near the break point for average content. Compared with known techniques, this technique needs to use filters that are efficient in terms of both performance and computational complexity, needs to be error resilient, and needs to be applicable to only a part of a picture, namely an (at least rectangular) picture segment.

Figure 1 shows a simplified block diagram of a communication system (100) according to an embodiment of the present disclosure. The system (100) may include at least two terminals (110-120) interconnected via a network (150). For unidirectional transmission of data, a first terminal (110) may encode video data at a local location for transmission to the other terminal (120) via the network (150). The second terminal (120) may receive the encoded video data of the other terminal from the network (150), decode the encoded data, and display the recovered video data. Unidirectional data transmission may be common in applications such as media serving.

Figure 1 also shows a second pair of terminals (130, 140) provided to support bidirectional transmission of encoded video, as may occur, for example, during video conferencing. For bidirectional transmission of data, each terminal (130, 140) may encode video data captured at a local location for transmission to the other terminal via the network (150). Each terminal (130, 140) may also receive the encoded video data transmitted by the other terminal, decode the encoded data, and display the recovered video data on a local display device.

In Figure 1, the terminals (110-140) may be illustrated as servers, personal computers, and smartphones, but the principles of the present disclosure are not so limited. Embodiments of the present disclosure find application with laptop computers, tablet computers, media players, and/or dedicated video conferencing equipment. The network (150) represents any number of networks that convey encoded video data among the terminals (110-140), including, for example, wired and/or wireless communication networks. The communication network (150) may exchange data in circuit-switched and/or packet-switched channels. Representative networks include telecommunications networks, local area networks, wide area networks, and/or the Internet. For the purposes of the present discussion, the architecture and topology of the network (150) may be immaterial to the operation of the present disclosure unless explained below.

As an example of an application of the disclosed subject matter, Figure 2 shows the placement of a video encoder and decoder in a streaming environment. The disclosed subject matter is equally applicable to other video-enabled applications, including, for example, video conferencing, digital TV, and the storing of compressed video on digital media including CDs, DVDs, memory sticks, and the like.

A streaming system may include a capture subsystem (213), which may include a video source (201), for example a digital camera, creating, for example, an uncompressed video sample stream (202). The sample stream (202), depicted as a bold line to emphasize its high data volume compared with encoded video bitstreams, can be processed by an encoder (203) coupled to the camera (201). The encoder (203) can include hardware, software, or a combination thereof to enable or implement aspects of the disclosed subject matter as described in more detail below. The encoded video bitstream (204), depicted as a thin line to emphasize its lower data volume compared with the sample stream, can be stored on a streaming server (205) for future use. One or more streaming clients (206, 208) can access the streaming server (205) to retrieve copies (207, 209) of the encoded video bitstream (204). A client (206) can include a video decoder (210), which decodes the incoming copy (207) of the encoded video bitstream and creates an outgoing video sample stream (211) that can be rendered on a display (212) or other rendering device (not shown). In some streaming systems, the video bitstreams (204, 207, 209) can be encoded according to certain video coding/compression standards. Examples of those standards include ITU-T Recommendation H.265. A video coding standard informally known as Versatile Video Coding (VVC) is under development. The disclosed subject matter may be used in the context of VVC.

Figure 3 may be a functional block diagram of a video decoder (210) according to an embodiment of the present disclosure.

A receiver (310) may receive one or more encoded video sequences to be decoded by the decoder (210); in the same or another embodiment, one encoded video sequence is received at a time, where the decoding of each encoded video sequence is independent of the decoding of other encoded video sequences. The encoded video sequence may be received from a channel (312), which may be a hardware/software link to a storage device that stores the encoded video data. The receiver (310) may receive the encoded video data together with other data, for example encoded audio data and/or ancillary data streams, which may be forwarded to their respective using entities (not depicted). The receiver (310) may separate the encoded video sequence from the other data. To combat network jitter, a buffer memory (315) may be coupled between the receiver (310) and an entropy decoder/parser (320) (hereinafter "parser"). When the receiver (310) is receiving data from a store/forward device of sufficient bandwidth and controllability, or from a synchronous network, the buffer (315) may not be needed, or can be small. For use on best-effort packet networks such as the Internet, the buffer (315) may be required, can be comparatively large, and can advantageously be of adaptive size.

The video decoder (210) may include the parser (320) to reconstruct symbols (321) from the entropy-coded video sequence. Categories of those symbols include information used to manage operation of the decoder (210), and potentially information to control a rendering device such as a display (212) that is not an integral part of the decoder but can be coupled to it, as shown in Figure 2. The control information for the rendering device may be in the form of Supplemental Enhancement Information (SEI) messages or Video Usability Information (VUI) parameter set fragments (not depicted). The parser (320) may parse/entropy-decode the received encoded video sequence. The coding of the encoded video sequence can be in accordance with a video coding technology or standard, and can follow principles well known to a person skilled in the art, including variable-length coding, Huffman coding, arithmetic coding with or without context sensitivity, and so forth. The parser (320) may extract from the encoded video sequence a set of subgroup parameters for at least one of the subgroups of pixels in the video decoder, based on at least one parameter corresponding to the group. Subgroups can include groups of pictures (GOPs), pictures, tiles, slices, macroblocks, coding units (CUs), blocks, transform units (TUs), prediction units (PUs), and so forth. The entropy decoder/parser may also extract from the encoded video sequence information such as transform coefficients, quantizer parameter values, motion vectors, and so forth.

The parser (320) may perform an entropy decoding/parsing operation on the video sequence received from the buffer (315), so as to create symbols (321).

Reconstruction of the symbols (321) can involve multiple different units, depending on the type of the encoded video picture or parts thereof (such as inter and intra pictures, or inter and intra blocks) and on other factors. Which units are involved, and how, can be controlled by the subgroup control information that the parser (320) parses from the encoded video sequence. The flow of such subgroup control information between the parser (320) and the multiple units below is not depicted for clarity.

Beyond the functional blocks already mentioned, the decoder (210) can be conceptually subdivided into a number of functional units as described below. In a practical implementation operating under commercial constraints, many of these units interact closely with each other and can, at least partly, be integrated into each other. However, for the purpose of describing the disclosed subject matter, the conceptual subdivision into the functional units below is appropriate.

A first unit is the scaler/inverse transform unit (351). The scaler/inverse transform unit (351) receives quantized transform coefficients as well as control information, including which transform to use, block size, quantization factor, quantization scaling matrices, and so forth, as symbols (321) from the parser (320). It can output blocks comprising sample values, which can be input into an aggregator (355).

In some cases, the output samples of the scaler/inverse transform unit (351) can pertain to an intra-coded block, that is, a block that does not use predictive information from previously reconstructed pictures but can use predictive information from previously reconstructed parts of the current picture. Such predictive information can be provided by an intra picture prediction unit (352). In some cases, the intra picture prediction unit (352) generates a block of the same size and shape as the block under reconstruction, using surrounding already-reconstructed information fetched from the current (partly reconstructed) picture (356). The aggregator (355), in some cases, adds, on a per-sample basis, the prediction information that the intra prediction unit (352) has generated to the output sample information provided by the scaler/inverse transform unit (351).

In other cases, the output samples of the scaler/inverse transform unit (351) can pertain to an inter-coded, and potentially motion-compensated, block. In such a case, a motion compensation prediction unit (353) can access the reference picture memory (357) to fetch samples used for prediction. After motion-compensating the fetched samples in accordance with the symbols (321) pertaining to the block, these samples can be added by the aggregator (355) to the output of the scaler/inverse transform unit (in this case called the residual samples or residual signal) so as to generate output sample information. The addresses within the reference picture memory from which the motion compensation unit fetches prediction samples can be controlled by motion vectors, available to the motion compensation unit in the form of symbols (321) that can have, for example, X, Y, and reference picture components. Motion compensation can also include interpolation of sample values fetched from the reference picture memory when sub-sample exact motion vectors are in use, motion vector prediction mechanisms, and so forth.
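The inter-coded path described above can be illustrated with a small sketch: fetch a prediction block from a reference picture at an address displaced by a motion vector, then let the aggregator add the residual per sample. The function names are hypothetical, only integer motion vectors are handled (no sub-sample interpolation), and clamping coordinates at the picture edge and clipping to the sample range are assumptions rather than details taken from the disclosure.

```python
def motion_compensated_block(reference, x, y, mv_x, mv_y, size):
    """Fetch a size x size prediction block displaced by an integer motion vector."""
    h, w = len(reference), len(reference[0])
    block = []
    for dy in range(size):
        row = []
        for dx in range(size):
            # Clamp so vectors pointing outside the picture repeat edge samples.
            ry = min(max(y + dy + mv_y, 0), h - 1)
            rx = min(max(x + dx + mv_x, 0), w - 1)
            row.append(reference[ry][rx])
        block.append(row)
    return block


def aggregate(prediction, residual, bit_depth=8):
    """Add the residual to the prediction per sample and clip to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]


reference = [[x + 10 * y for x in range(4)] for y in range(4)]
prediction = motion_compensated_block(reference, x=0, y=0, mv_x=1, mv_y=1, size=2)
reconstructed = aggregate(prediction, [[-1, 0], [300, -50]])
# prediction    -> [[11, 12], [21, 22]]
# reconstructed -> [[10, 12], [255, 0]]
```

The same `aggregate` step would apply to the intra-coded case, with the prediction block coming from the current picture instead of the reference picture memory.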

The output samples of the aggregator (355) can be subject to various loop filtering techniques in the loop filter unit (354). Video compression technologies can include in-loop filter technologies that are controlled by parameters included in the encoded video bitstream and made available to the loop filter unit (354) as symbols (321) from the parser (320), but that can also be responsive to meta-information obtained during the decoding of previous (in decoding order) parts of the encoded picture or encoded video sequence, as well as to previously reconstructed and loop-filtered sample values.

The output of the loop filter unit (354) can be a sample stream that can be output to the rendering device (212) as well as stored in the reference picture memory (356) for use in future inter-picture prediction.

Certain encoded pictures, once fully reconstructed, can be used as reference pictures for future prediction. Once an encoded picture is fully reconstructed and has been identified as a reference picture (by, for example, the parser (320)), the current reference picture (356) can become part of the reference picture buffer (357), and a fresh current picture memory can be reallocated before reconstruction of the following encoded picture commences.
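The hand-over from current picture to reference picture buffer might be pictured as follows. This is a toy sketch: the class name, the fixed capacity, and the sliding-window eviction of the oldest picture are assumptions for illustration, since the text does not say how pictures leave the buffer.

```python
from collections import deque


class ReferencePictureBuffer:
    """Toy buffer that keeps the most recent reference pictures."""

    def __init__(self, capacity):
        # Assumption: when full, the oldest reference picture is dropped.
        self.pictures = deque(maxlen=capacity)

    def finish_picture(self, current_picture, is_reference):
        """Move a fully reconstructed picture into the buffer if it is a reference."""
        if is_reference:
            self.pictures.append(current_picture)
        # A fresh, empty current-picture memory is returned for the next picture.
        return []


buf = ReferencePictureBuffer(capacity=2)
for name, is_ref in [("p0", True), ("p1", False), ("p2", True), ("p3", True)]:
    buf.finish_picture(name, is_ref)
```

After the loop, the buffer holds `p2` and `p3`: the non-reference picture `p1` was never stored, and `p0` was evicted when `p3` arrived.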

The video decoder (210) may perform decoding operations according to a predetermined video compression technology that may be documented in a standard such as ITU-T Recommendation H.265. The encoded video sequence may conform to the syntax specified by the video compression technology or standard being used, in the sense that it adheres to the syntax of the video compression technology or standard as specified in the video compression technology document or standard, and specifically in the profiles documented therein. For compliance, the complexity of the encoded video sequence may also need to be within bounds defined by the level of the video compression technology or standard. In some cases, levels restrict the maximum picture size, maximum frame rate, maximum reconstruction sample rate (measured in, for example, megasamples per second), maximum reference picture size, and so on. Limits set by levels can, in some cases, be further restricted through Hypothetical Reference Decoder (HRD) specifications and metadata for HRD buffer management signaled in the encoded video sequence.
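A level check of the kind described above amounts to comparing stream properties against a table of bounds. The sketch below is illustrative only: the `LevelLimits` type is hypothetical and the numeric limits are invented round numbers, not values taken from H.265 or any other standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelLimits:
    max_picture_samples: int  # largest picture, in luma samples
    max_sample_rate: int      # luma samples reconstructed per second


def conforms(width, height, fps, limits):
    """Return True if the stream stays within both bounds of the level."""
    picture_samples = width * height
    return (picture_samples <= limits.max_picture_samples
            and picture_samples * fps <= limits.max_sample_rate)


# Invented limits for illustration; real levels are tabulated in the standard.
example_level = LevelLimits(max_picture_samples=2_500_000,
                            max_sample_rate=130_000_000)

ok_1080p60 = conforms(1920, 1080, 60, example_level)    # True
ok_1080p120 = conforms(1920, 1080, 120, example_level)  # False: sample rate exceeded
ok_2160p30 = conforms(3840, 2160, 30, example_level)    # False: picture too large
```

A decoder that advertises support for a given level can use such a check to reject streams it is not guaranteed to decode in real time.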

In an embodiment, the receiver (310) may receive additional (redundant) data with the encoded video. The additional data may be included as part of the encoded video sequence(s). The video decoder (210) may use the additional data to decode the data properly and/or to reconstruct the original video data more accurately. The additional data can be in the form of, for example, temporal, spatial, or signal-to-noise ratio (SNR) enhancement layers, redundant slices, redundant pictures, forward error correction codes, and so on.

Figure 4 may be a functional block diagram of a video encoder (203) according to an embodiment of the present disclosure.

The encoder (203) may receive video samples from a video source (201) (which is not part of the encoder) that may capture the video pictures to be encoded by the encoder (203).

The video source (201) may provide the source video sequence to be encoded by the encoder (203) in the form of a digital video sample stream, which can be of any suitable bit depth (for example: 8 bit, 10 bit, 12 bit, ...), any color space (for example, BT.601 Y CrCb, RGB, ...), and any suitable sampling structure (for example, Y CrCb 4:2:0, Y CrCb 4:4:4). In a media serving system, the video source (201) may be a storage device storing previously prepared video. In a videoconferencing system, the video source (201) may be a camera that captures local picture information as a video sequence. Video data may be provided as a plurality of individual pictures that impart motion when viewed in sequence. The pictures themselves may be organized as a spatial array of pixels, where each pixel can comprise one or more samples, depending on the sampling structure, color space, and so on in use. A person skilled in the art can readily understand the relationship between pixels and samples. The description below focuses on samples.

According to an embodiment, the encoder (203) may encode and compress the pictures of the source video sequence into an encoded video sequence (443) in real time or under any other time constraints required by the application. Enforcing an appropriate coding speed is one function of the controller (450). The controller controls the other functional units described below and is functionally coupled to them. The coupling is not depicted for clarity. Parameters set by the controller can include rate-control-related parameters (picture skip, quantizer, lambda value of rate-distortion optimization techniques, ...), picture size, group of pictures (GOP) layout, maximum motion vector search range, and so forth. A person skilled in the art can readily identify other functions of the controller (450), as they may pertain to a video encoder (203) optimized for a certain system design.

Some video encoders operate in what a person skilled in the art readily recognizes as a "coding loop". As an oversimplified description, a coding loop can consist of the encoding part of an encoder (430) (hereinafter the "source encoder"), which is responsible for creating symbols based on the input picture to be encoded and the reference picture(s), and a (local) decoder (433) embedded in the encoder (203). The (local) decoder (433) reconstructs the symbols to create the same sample data that a (remote) decoder would also create (since any compression between the symbols and the encoded video bitstream is lossless in the video compression technologies considered in the disclosed subject matter). The reconstructed sample stream is input to the reference picture memory (434). Because decoding a symbol stream leads to bit-exact results independent of the decoder location (local or remote), the reference picture buffer content is also bit-exact between the local encoder and the remote encoder. In other words, the prediction part of the encoder "sees" as reference picture samples exactly the same sample values that a decoder would "see" when using prediction during decoding. This fundamental principle of reference picture synchronicity (and the resulting drift if synchronicity cannot be maintained, for example because of channel errors) is well known to a person skilled in the art.
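The coding-loop principle can be illustrated with a toy example. The sketch below is not a video codec: it assumes scalar "pictures", a uniform quantizer with a hypothetical step size, and prediction from the previous reconstructed value. All function names are illustrative. It only shows that, because the encoder runs the same reconstruction as the decoder, both sides hold identical reference values even though the quantization is lossy.

```python
def quantize(residual: int, step: int) -> int:
    """Lossy step: map a prediction residual to a symbol (quantization index)."""
    return round(residual / step)


def reconstruct(prediction: int, symbol: int, step: int) -> int:
    """Reconstruction shared by the local and the remote decoder."""
    return prediction + symbol * step


def encode(samples, step=4):
    """Encode samples; predict each one from the locally reconstructed reference."""
    symbols, local_recon = [], []
    reference = 0
    for sample in samples:
        symbol = quantize(sample - reference, step)
        symbols.append(symbol)
        # Embedded (local) decoder: update the reference exactly as a remote decoder would.
        reference = reconstruct(reference, symbol, step)
        local_recon.append(reference)
    return symbols, local_recon


def decode(symbols, step=4):
    """Remote decoder: rebuild the samples from the symbols alone."""
    reference, recon = 0, []
    for symbol in symbols:
        reference = reconstruct(reference, symbol, step)
        recon.append(reference)
    return recon
```

If the encoder predicted from the original samples instead of the reconstructed ones, the two reference chains would diverge, which is the drift mentioned above.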

The operation of the "local" decoder (433) can be the same as that of the "remote" decoder (210), which has already been described in detail above in conjunction with Figure 3. Briefly referring also to Figure 3, however, because symbols are available and the encoding/decoding of symbols to and from an encoded video sequence by the entropy encoder (445) and the parser (320) can be lossless, the entropy decoding parts of the decoder (210), including the channel (312), receiver (310), buffer (315), and parser (320), may not be fully implemented in the local decoder (433).

An observation that can be made at this point is that any decoder technology other than the parsing/entropy decoding present in a decoder also necessarily needs to be present, in substantially identical functional form, in a corresponding encoder. For this reason, the disclosed subject matter focuses on decoder operation. The description of encoder technologies can be abbreviated, as they are the inverse of the comprehensively described decoder technologies. A more detailed description is required only in certain areas and is provided below.

As part of its operation, the source encoder (430) may perform motion-compensated predictive coding, which codes an input frame predictively with reference to one or more previously encoded frames from the video sequence that were designated as "reference frames". In this manner, the coding engine (432) codes the differences between pixel blocks of the input frame and pixel blocks of the reference frame(s) that may be selected as prediction reference(s) for the input frame.

The local video decoder (433) may decode the encoded video data of frames that may be designated as reference frames, based on the symbols created by the source encoder (430). The operations of the coding engine (432) may advantageously be lossy processes. When the encoded video data is decoded at a video decoder (not shown in Figure 4), the reconstructed video sequence typically may be a replica of the source video sequence with some errors. The local video decoder (433) replicates the decoding processes that may be performed by the video decoder on reference frames and may cause the reconstructed reference frames to be stored in the reference picture cache (434). In this manner, the encoder (203) may locally store copies of reconstructed reference frames that have the same content as the reconstructed reference frames that will be obtained by a far-end video decoder (absent transmission errors).

The predictor (435) may perform prediction searches for the coding engine (432). That is, for a new frame to be encoded, the predictor (435) may search the reference picture memory (434) for sample data (as candidate reference pixel blocks) or for certain metadata, such as reference picture motion vectors, block shapes, and so on, that may serve as an appropriate prediction reference for the new picture. The predictor (435) may operate on a sample-block-by-pixel-block basis to find appropriate prediction references. In some cases, as determined by the search results obtained by the predictor (435), an input picture may have prediction references drawn from multiple reference pictures stored in the reference picture memory (434).

The controller (450) may manage the coding operations of the video encoder (430), including, for example, setting the parameters and subgroup parameters used for encoding the video data.

The output of all the aforementioned functional units may be subjected to entropy coding in the entropy encoder (445). The entropy encoder translates the symbols generated by the various functional units into an encoded video sequence by losslessly compressing the symbols according to technologies known to a person skilled in the art, such as Huffman coding, variable-length coding, arithmetic coding, and so forth.

The transmitter (440) may buffer the encoded video sequence(s) created by the entropy encoder (445) to prepare them for transmission via a communication channel (460), which may be a hardware/software link to a storage device that will store the encoded video data. The transmitter (440) may merge the encoded video data from the video encoder (430) with other data to be transmitted, for example encoded audio data and/or ancillary data streams (sources not shown).

The controller (450) may manage the operation of the encoder (203). During coding, the controller (450) may assign to each encoded picture a certain encoded picture type, which may affect the coding techniques that can be applied to the respective picture. For example, pictures often may be assigned one of the following frame types:

An intra picture (I picture) may be one that can be encoded and decoded without using any other frame in the sequence as a source of prediction. Some video codecs allow for different types of intra pictures, including, for example, Independent Decoder Refresh pictures. A person skilled in the art is aware of those variants of I pictures and their respective applications and features.

A predictive picture (P picture) may be one that can be encoded and decoded using intra prediction or inter prediction, using at most one motion vector and reference index to predict the sample values of each block.

A bi-directionally predictive picture (B picture) may be one that can be encoded and decoded using intra prediction or inter prediction, using at most two motion vectors and reference indices to predict the sample values of each block. Similarly, multiple-predictive pictures can use more than two reference pictures and associated metadata for the reconstruction of a single block.

Source pictures commonly may be subdivided spatially into a plurality of sample blocks (for example, blocks of 4×4, 8×8, 4×8, or 16×16 samples each) and encoded on a block-by-block basis. Blocks may be coded predictively with reference to other (already encoded) blocks, as determined by the coding assignment applied to the blocks' respective pictures. For example, blocks of I pictures may be coded non-predictively, or they may be coded predictively with reference to already encoded blocks of the same picture (spatial prediction or intra prediction). Pixel blocks of P pictures may be coded non-predictively, via spatial prediction, or via temporal prediction with reference to one previously encoded reference picture. Blocks of B pictures may be coded non-predictively, via spatial prediction, or via temporal prediction with reference to one or two previously encoded reference pictures.

The video encoder (203) may perform coding operations according to a predetermined video coding technology or standard, such as ITU-T Recommendation H.265. In its operation, the video encoder (203) may perform various compression operations, including predictive coding operations that exploit temporal and spatial redundancies in the input video sequence. The encoded video data may therefore conform to the syntax specified by the video coding technology or standard being used.

In an embodiment, the transmitter (440) may transmit additional data along with the encoded video. The video encoder (430) may include such data as part of the encoded video sequence. The additional data may comprise temporal/spatial/SNR enhancement layers, other forms of redundant data such as redundant pictures and slices, Supplementary Enhancement Information (SEI) messages, Video Usability Information (VUI) parameter set fragments, and so on.

Referring to Figure 5, in an embodiment, a flag (for example, adaptive_picture_resolution) (502) may indicate whether the spatial resolution of a picture segment (for example, a tile, tile group, CTU, or CTU group) may be adaptively resampled/rescaled/unscaled (the three terms are used interchangeably throughout) for decoding, for prediction reference, and for display output (collectively, RPR information). If the flag indicates that RPR information is present, certain syntax elements may indicate the picture sizes of the reference pictures and of the output pictures, respectively. These syntax elements and the aforementioned flag can reside in any suitable high-level syntax structure, including, for example, decoder/video/sequence/picture/slice/tile parameter sets, sequence/GOP/picture/slice/GOB/tile group/tile headers, and/or SEI messages. Not all of these syntax elements need to be present at all times. For example, while the RPR resolution may be dynamic, the aspect ratio of a picture may be fixed by the video coding technology or standard, or that fixation may be signaled by a flag in an appropriate high-level syntax structure. Similarly, a video coding technology or standard may specify reference picture resampling and omit output picture resampling, in which case the output picture size information may also be omitted. In yet another example, the presence of the output picture size information may be conditioned on its own flag (not depicted).

In one non-limiting example, certain RPR information may reside in a sequence parameter set (501). The syntax elements reference_pic_width_in_luma_samples (503) and reference_pic_height_in_luma_samples (504) may indicate the width and height of the reference picture, respectively. The syntax elements output_pic_width_in_luma_samples (505) and output_pic_height_in_luma_samples (506) may specify the output picture resolution. All of the above values may be in units of luma samples or in other units that are common in video compression technologies or standards. Certain restrictions on the values of these syntax elements may also be imposed by the video coding technology or standard; for example, the values of one or more syntax elements may be required to be a power of two (so that pictures fit easily into the blocks commonly used in video coding), or the relationship between the horizontal sizes may be restricted to certain values (to allow a finite set of filter designs optimized for certain resolution ratios, as described below).

The coding of the above information can take any suitable form. As depicted, one simple option is to use variable-length unsigned integer values, denoted ue(v). Other options are always possible, including those used to indicate the picture size in legacy video coding technologies or standards such as H.264 or H.265.
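For illustration, ue(v) in H.264 and H.265 denotes an unsigned Exp-Golomb code: a run of n zero bits, a one bit, and then n further bits, giving the value 2^n - 1 plus those n bits. The following is a minimal sketch of a reader for such values. The `BitReader` class and its method names are assumptions made for this example, and the sketch ignores practical details such as emulation prevention bytes and end-of-data handling.

```python
class BitReader:
    """Reads bits most-significant-bit first from a bytes object (illustrative helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def read_bit(self) -> int:
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self) -> int:
        """Decode one unsigned Exp-Golomb value, i.e. ue(v)."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        # Value is 2^n - 1 plus the n-bit suffix that follows the terminating one bit.
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)
```

A decoder could, for example, call `read_ue()` once each for reference_pic_width_in_luma_samples and reference_pic_height_in_luma_samples, in whatever order the syntax structure defines.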

One purpose of the disclosed subject matter is to allow RPR within the coding loop; that is, between different pictures of an encoded video sequence (CVS). Accordingly, the syntax elements that specify the actual decoded size of a picture may need to reside in a syntax structure that allows them to change from one picture to another within the CVS. In an embodiment, the syntax elements decoded_pic_width_in_luma_samples (508) and decoded_pic_height_in_luma_samples (509) are present in an appropriate high-level syntax structure, here the PPS (507), and the values of these fields can change within the CVS. Other suitable high-level syntax structures may include the PPS, slice parameter sets, tile parameter sets, picture/slice/GOB/tile group/tile headers, and/or SEI messages. Using SEI messages may be less advisable, because RPR techniques may have a normative impact on the decoding process. As for the coding of these syntax elements, the remarks above apply.

In an embodiment, reference_pic_width_in_luma_samples and reference_pic_height_in_luma_samples may indicate the picture resolution of the reference picture or reference picture segment in the decoded picture buffer. This can imply that reference pictures are always kept at full resolution regardless of the resampling applied, which is one key difference between the techniques discussed herein and those described in H.263 Annex P.

The description above assumes that RPR techniques are applied to a whole picture. Certain environments may benefit from RPR techniques applicable to picture segments such as tile groups, tiles, slices, GOBs, and so on. For example, a picture may be spatially divided into semantically distinct spatial regions, commonly known as tiles. One example is security video; another is the various views in 360-degree video with, for example, a cube projection (where six views, corresponding to the six surfaces of a cube, make up the representation of a 360-degree scene). In these and similar scenarios, the semantically distinct content of each tile may call for RPR techniques to be applied differently per tile, because the content activity may differ from tile to tile. Therefore, in an embodiment, RPR techniques can be applied per tile. This requires signaling on a per-tile basis (not depicted). The signaling techniques can be similar to those described above for per-picture signaling, except that signaling for potentially multiple tiles may need to be included.

In an embodiment, each tile or tile group may have different values of reference_tile_width_in_luma_samples and reference_tile_height_in_luma_samples in the tile group header, a header parameter set, or another suitable high-level syntax structure.

In an embodiment, if the reference picture resolution differs from the decoded picture resolution, the decoded picture may be rescaled according to the ratio between the reference picture resolution and the decoded picture resolution, and the rescaled decoded picture may then be stored in the decoded picture buffer (DPB) as a reference picture.
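The step of rescaling a decoded picture before it enters the DPB can be sketched as follows. This is an illustration under stated assumptions: pictures are modeled as lists of rows of luma samples, the DPB as a plain list, and nearest-neighbor sampling stands in for whatever filter a codec would actually specify. All function names are hypothetical.

```python
from fractions import Fraction


def scale_ratios(decoded_size, reference_size):
    """Return (horizontal, vertical) ratios of reference size to decoded size."""
    (dec_w, dec_h), (ref_w, ref_h) = decoded_size, reference_size
    return Fraction(ref_w, dec_w), Fraction(ref_h, dec_h)


def rescale_nearest(picture, out_w, out_h):
    """Nearest-neighbor rescaling; a stand-in for a normatively specified filter."""
    in_h, in_w = len(picture), len(picture[0])
    return [
        [picture[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def store_as_reference(dpb, decoded, reference_size):
    """Rescale the decoded picture to the reference resolution if needed, then store it."""
    ref_w, ref_h = reference_size
    if (len(decoded[0]), len(decoded)) != (ref_w, ref_h):
        decoded = rescale_nearest(decoded, ref_w, ref_h)
    dpb.append(decoded)
    return decoded
```

The same pattern, with the roles of the resolutions exchanged, would apply when a reference picture is rescaled to the output resolution for display.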

In an embodiment, if the vertical/horizontal resolution ratio between the decoded picture resolution and the reference picture resolution is explicitly signaled as described above, the decoded picture may be rescaled according to the signaled ratio, and the rescaled decoded picture may then be stored in the decoded picture buffer (DPB) as a reference picture.

In an embodiment, output_pic_width_in_luma_samples and output_pic_height_in_luma_samples may indicate to the video player the picture resolution of the output picture or output picture segment.

In an embodiment, if the output picture resolution differs from the reference picture resolution, the reference picture may be rescaled according to the ratio between the output picture resolution and the reference picture resolution. The rescaled reference picture may then be bumped out of the DPB as an output picture and fed to the video player for display.

In an embodiment, if the vertical/horizontal resolution ratio between the reference picture resolution and the output picture resolution is explicitly signaled, the reference picture may be rescaled according to the ratio between the output picture resolution and the reference picture resolution. The rescaled reference picture may then be bumped out of the DPB as an output picture and fed to the video player for display.

In an embodiment, each tile or tile group may have different values of output_tile_width_in_luma_samples and output_tile_height_in_luma_samples in the tile group header, a header parameter set, or another suitable syntax structure.

Certain video coding technologies or standards include temporal scalability in the form of temporal sub-layers. In an embodiment, each sub-layer may have different values of reference_pic_width_in_luma_samples, reference_pic_height_in_luma_samples, output_pic_width_in_luma_samples, output_pic_height_in_luma_samples, decoded_pic_width_in_luma_samples, and decoded_pic_height_in_luma_samples. The syntax elements for each sub-layer may be signaled, for example, in the SPS or in any other suitable high-level syntax structure.

Referring to Figure 6, in an embodiment, a video bitstream parser (602) may parse and interpret the above syntax elements and other syntax elements from the encoded video bitstream received from the encoded picture buffer (601). Upon receiving the non-RPS-related syntax elements from the encoded video bitstream, the video decoder may reconstruct the encoded picture at a potentially downsampled resolution. To do so, it may need reference samples, which may be received from the decoded picture buffer (604). According to an embodiment, because the decoded picture buffer (604) stores reference pictures or segments at full resolution, rescaling (605) may be required to provide the decoder (603) with appropriately resampled reference pictures. The rescaling (605) may be controlled by a rescaling controller (606), which may receive scaling parameters (for example, the syntax elements described above) (607) and convert them into suitable information (608) for the rescaler (605), for example by computing appropriate rescaling filter parameters. If output resolution rescaling is also required, the rescaling controller (606) may additionally provide rescaling information (609) to the rescaling mechanism for display (610). Finally, the reconstructed video may be played back by a video player (611) or otherwise processed for consumption or storage.

The filters used in the rescaling process may be specified in the video coding technology or standard. Because both filtering directions need to be "inside" the coding loop, that is, both downsampling (for example, from the decoded picture buffer (604) to the video decoder (603)) and upsampling (for example, from the video decoder (603) to the decoded picture buffer (604)), both directions may need to be fully specified, and should be specified by the video compression technology or standard so as to achieve as much reversibility as possible. As for the filter design itself, a balance may need to be struck between computational/implementation simplicity and performance. Certain initial results indicate that the bilinear filters suggested in H.263 Annex P may be suboptimal from a performance standpoint. On the other hand, certain adaptive filtering techniques employing neural-network-based processing may be too computationally complex for a video coding technology or standard to achieve wide adoption within a commercially appropriate timeframe and under commercially appropriate complexity constraints. As a balance, filter designs such as those used in SHVC, or the various interpolation filters used in HEVC, may be appropriate, and would have the additional advantage that their properties are well understood.
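To make the simplest end of this design space concrete, the sketch below shows a generic bilinear resampler. It is an assumption-laden illustration, not the filter of H.263 Annex P, SHVC, or HEVC: it uses floating-point arithmetic and a centre-aligned sample mapping chosen for this example, whereas an in-loop filter would be specified in exact integer arithmetic so that encoder and decoder stay bit-exact.

```python
def _source_position(index: int, in_len: int, out_len: int):
    """Map an output sample centre to input coordinates; return (i0, i1, weight of i1)."""
    pos = (index + 0.5) * in_len / out_len - 0.5
    pos = min(max(pos, 0.0), in_len - 1)  # clamp at the picture border
    i0 = int(pos)
    i1 = min(i0 + 1, in_len - 1)
    return i0, i1, pos - i0


def bilinear_resample(picture, out_w, out_h):
    """Resample a picture (list of rows of samples) to out_w x out_h samples."""
    in_h, in_w = len(picture), len(picture[0])
    out = []
    for y in range(out_h):
        y0, y1, fy = _source_position(y, in_h, out_h)
        row = []
        for x in range(out_w):
            x0, x1, fx = _source_position(x, in_w, out_w)
            # Interpolate horizontally on the two neighboring rows, then vertically.
            top = picture[y0][x0] * (1 - fx) + picture[y0][x1] * fx
            bottom = picture[y1][x0] * (1 - fx) + picture[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out
```

Because upsampling followed by downsampling with such a filter does not in general return the original samples, the reversibility goal mentioned above is one reason a standard would fix both directions precisely.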

Referring to Figure 7, in an embodiment, each picture segment, such as a slice, GOB, tile, or tile group (hereinafter, tile), may be independently rescaled, at a different resolution, from a decoded tile to a reference tile and from a reference tile to an output tile (or picture).

Consider a square input picture (701) entering the encoder, divided into four square source tiles (702) (source tile 2 of the four source tiles is shown), each covering one quarter of the input picture. Other picture geometries and tile layouts are, of course, equally possible according to the disclosed subject matter. Let the width and height of each tile be two times W and two times H, respectively, denoted hereinafter as "2W" for twice the width and "2H" for twice the height (and similarly for other numbers; for example, 1W means one times the width and 3H means three times the height; this convention is used throughout the figures and their description). The source tiles may be, for example, camera views of different scenes in a security camera environment. As such, each tile may cover content with potentially radically different activity levels, which may call for a different RPR choice for each tile.

Assuming that the encoder (not depicted) creates an encoded picture, reconstruction yields four tiles with the following rescaled resolutions:

Decoded tile 0 (702): 1H and 1W

Decoded tile 1 (703): 1H and 2W

Decoded tile 2 (704): 2H and 2W

Decoded tile 3 (705): 2H and 1W

This results in the decoded tile sizes depicted to scale.

Note that in certain video coding technologies or standards, there may be samples in the decoded picture that are not allocated to any tile. How those samples are coded, if at all, differs from one video coding technology to another. In an embodiment, in certain cases, samples not allocated to any of the depicted tiles may be allocated to other tiles, and all samples of those tiles may be coded in a form that creates a low number of coded bits, for example in skip mode. In an embodiment, the video coding technology or standard may not have the (currently somewhat common) requirement that all samples of a picture be coded in some form in each video picture, so that no bits are wasted on those samples. In yet another embodiment, certain padding techniques may be used to fill the unused samples efficiently, so that their coding overhead is negligible.

In this example, the reference picture buffer keeps the reference picture samples at full resolution, which here equals the source resolution. The four rescaled tiles used for reference (706 through 709) can therefore each be kept at 2H and 2W resolution. To match the varying resolutions of the decoded tiles (702 through 705), the rescaling (710) in both directions, from the decoder to the reference picture buffer and from the reference picture buffer to the decoder, may differ for each tile.
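The per-tile rescaling in this example can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the unit size `W`, `H`, the dictionary `decoded_tiles`, and the function `scale_factors` are hypothetical names that do not appear in the disclosure, and the sketch computes ratios only, without performing any filtering.

```python
from fractions import Fraction

# Hypothetical unit size; the text only fixes tile sizes as multiples of W and H.
W, H = 480, 270

# The reference picture buffer keeps every tile at full (source) resolution: 2W x 2H.
REF_W, REF_H = 2 * W, 2 * H

# Decoded tile sizes from the example, as (width multiple, height multiple).
decoded_tiles = {0: (1, 1), 1: (2, 1), 2: (2, 2), 3: (1, 2)}


def scale_factors(tile_id):
    """Return (horizontal, vertical) factors for decoder -> reference buffer."""
    w_mult, h_mult = decoded_tiles[tile_id]
    return Fraction(REF_W, w_mult * W), Fraction(REF_H, h_mult * H)


for tile_id in sorted(decoded_tiles):
    to_ref_x, to_ref_y = scale_factors(tile_id)
    # The opposite direction (reference buffer -> decoder) uses the reciprocals.
    print(tile_id, to_ref_x, to_ref_y, 1 / to_ref_x, 1 / to_ref_y)
```

Tiles 1 and 3 end up with different horizontal and vertical factors, which is why the rescaling cannot be described by a single factor per picture in this layout.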

If output rescaling (711) is also in use, the output of the decoded picture buffer can be rescaled, at per-tile or per-picture granularity, into an output picture for display or other processing (712). The resolution of the output picture (712) may be larger or smaller than the resolution of the pictures in the decoded picture buffer.

Figure 8A is a flowchart illustrating a method (800) for decoding a coded picture of a coded video sequence according to an embodiment. In some implementations, one or more processing blocks of Figure 8A may be performed by the decoder (210). In some implementations, one or more processing blocks of Figure 8A may be performed by another device or group of devices separate from or including the decoder (210), such as the encoder (203).

Referring to Figure 8A, the method (800) includes determining whether RPR information is present (805); if it is determined that no RPR information is present, the method ends (855). If it is determined that RPR information is present, the method includes decoding, from a first high-level syntax structure for a plurality of pictures, syntax elements related to a reference segment resolution (810).

The method (800) includes decoding, from a second high-level syntax structure that changes from a first coded picture to a second coded picture, syntax elements related to a decoded segment resolution (820).

The method (800) includes resampling samples from a reference picture buffer for use by the decoder for prediction, the decoder decoding a segment at the decoding resolution, and the samples from the reference picture buffer being at the reference segment resolution (830).

The method (800) includes decoding the segment at the decoded segment resolution into a decoded segment at the decoded segment resolution (840).

In addition, the method (800) includes storing the decoded segment in the reference picture buffer (850).

The method (800) may further include resampling the decoded segment to the reference segment resolution.

The method (800) may further include using a resampling filter for at least one of the following: resampling the samples from the reference picture buffer for use by the decoder for prediction; and resampling the decoded segment to the reference segment resolution, wherein the resampling filter is computationally more complex than a bilinear filter and is not adaptive.

In the method (800), the resampling filter may be selected from a plurality of resampling filters based on the relationship between the decoding resolution and the reference segment resolution.
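One way such a selection could work is sketched below in Python. The disclosure does not name any filters or thresholds, so the labels in `FILTERS`, the thresholds, and the function `select_resampling_filter` are all assumptions made purely for illustration; the sketch only shows a choice driven by the ratio between two resolutions.

```python
from fractions import Fraction

# Hypothetical filter labels; the text does not name any specific filters.
FILTERS = {
    "none": "no resampling needed",
    "up": "interpolation filter for upsampling",
    "down_mild": "low-pass filter for downsampling by a factor of at most 2",
    "down_strong": "stronger low-pass filter for downsampling by more than 2",
}


def select_resampling_filter(source_size, target_size):
    """Pick a filter label from the ratio target_size / source_size (one dimension)."""
    ratio = Fraction(target_size, source_size)
    if ratio == 1:
        return "none"
    if ratio > 1:
        return "up"
    if ratio >= Fraction(1, 2):  # assumed threshold, not from the text
        return "down_mild"
    return "down_strong"


# Example: a reference segment 1920 samples wide feeding a decoder working at 960.
print(FILTERS[select_resampling_filter(1920, 960)])
```

Because the function works on one dimension at a time, it can be called separately for width and height when the two resampling factors differ.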

In the method (800), the segment may be a picture.

In the method (800), each of the first coded picture and the second coded picture may contain a plurality of segments.

The method (800) may further include decoding, from a third high-level syntax structure, syntax elements related to an output resolution; and resampling samples of the decoded segment to the output resolution.

In the method (800), the resampling may use different resampling factors in width and height.
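Resampling with independent width and height factors can be illustrated as follows. This Python sketch deliberately uses nearest-neighbor sampling, which is simpler than the filter the description calls for (one more complex than bilinear); it is a stand-in chosen for brevity, and the function name `resample_nearest` is hypothetical.

```python
def resample_nearest(samples, out_width, out_height):
    """Resample a 2-D list of samples to out_width x out_height.

    Nearest-neighbor is used only to keep the sketch short; the width and
    height factors are independent, so they may differ from each other.
    """
    in_height = len(samples)
    in_width = len(samples[0])
    return [
        [
            samples[y * in_height // out_height][x * in_width // out_width]
            for x in range(out_width)
        ]
        for y in range(out_height)
    ]


# A 2x2 segment stretched by a factor of 2 in width and 1 in height.
segment = [[1, 2], [3, 4]]
print(resample_nearest(segment, out_width=4, out_height=2))
```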

Although Figure 8A shows example blocks of the method (800), in some implementations the method (800) may include more blocks, fewer blocks, different blocks, or differently arranged blocks than those depicted in Figure 8A. Additionally or alternatively, two or more blocks of the method (800) may be performed in parallel.
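The order of blocks (805) through (850) can be summarized in a minimal Python sketch. Everything in it is an assumption made for illustration: the dictionary layout of `bitstream`, the key names, and the stubbed `resample` and decoding steps are hypothetical and stand in for real syntax parsing, filtering, and reconstruction.

```python
def resample(samples, src_res, dst_res):
    """Stub: a real decoder would filter; here only the resolution is tracked."""
    return {"samples": samples, "resolution": dst_res}


def decode_segment(bitstream, reference_buffer):
    """Walk through blocks (805) to (850) for one segment."""
    # (805) No RPR information present: the method ends (855).
    if "rpr" not in bitstream:
        return None
    # (810) Reference segment resolution, from a structure covering many pictures.
    ref_res = bitstream["rpr"]["reference_resolution"]
    # (820) Decoded segment resolution, from a structure that can change per picture.
    dec_res = bitstream["picture_header"]["decoded_resolution"]
    # (830) Bring reference samples to the decoding resolution for prediction.
    prediction = resample(list(reference_buffer), ref_res, dec_res)
    # (840) Decode the segment at the decoded segment resolution (stubbed).
    decoded = {
        "samples": bitstream["payload"],
        "resolution": dec_res,
        "predicted_at": prediction["resolution"],
    }
    # Optional step: resample the decoded segment to the reference segment resolution.
    stored = resample(decoded["samples"], dec_res, ref_res)
    # (850) Store the result in the reference picture buffer.
    reference_buffer.append(stored)
    return decoded


buffer = []
result = decode_segment(
    {
        "rpr": {"reference_resolution": (1920, 1080)},
        "picture_header": {"decoded_resolution": (960, 540)},
        "payload": b"coded-data",
    },
    buffer,
)
print(result["resolution"], buffer[0]["resolution"])
```

In this sketch the buffer always holds segments at the reference segment resolution, while decoding itself runs at the resolution signaled per picture.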

Furthermore, the proposed methods may be implemented by processing circuitry (for example, one or more processors or one or more integrated circuits). In one example, the one or more processors execute a program stored in a non-transitory computer-readable medium to perform one or more of the proposed methods.

Figure 8B is a simplified block diagram of an apparatus (860) for decoding a coded picture of a video sequence according to an embodiment.

Referring to Figure 8B, the apparatus (860) includes first decoding code (870), second decoding code (875), resampling code (880), third decoding code (885), and storing code (890).

The first decoding code (870) is configured to decode, from a first high-level syntax structure for a plurality of pictures, syntax elements related to a reference segment resolution.

The second decoding code (875) is configured to decode, from a second high-level syntax structure that changes from a first coded picture to a second coded picture, syntax elements related to a decoded segment resolution.

The resampling code (880) is configured to resample samples from a reference picture buffer for use by the decoder for prediction, the decoder decoding a segment at the decoding resolution, and the samples from the reference picture buffer being at the reference segment resolution.

The third decoding code (885) is configured to decode the segment at the decoded segment resolution into a decoded segment at the decoded segment resolution.

The storing code (890) is configured to store the decoded segment in the reference picture buffer.

The techniques described above can be implemented as computer software using computer-readable instructions and physically stored in one or more computer-readable media.

The techniques for adaptive picture resolution rescaling described above can be implemented as computer software using computer-readable instructions and physically stored in one or more computer-readable media. For example, Figure 9 shows a computer system 900 suitable for implementing certain embodiments of the disclosed subject matter.

The computer software can be coded using any suitable machine code or computer language, which may be subject to assembly, compilation, linking, or similar mechanisms to create code comprising instructions that can be executed directly, or through interpretation, microcode execution, and the like, by one or more computer central processing units (CPUs), graphics processing units (GPUs), and the like.

The instructions can be executed on various types of computers or components thereof, including, for example, personal computers, tablet computers, servers, smartphones, gaming devices, and Internet of Things devices.

The components shown in Figure 9 for computer system 900 are exemplary in nature and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing embodiments of the present disclosure. Neither should the configuration of components be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary embodiment of computer system 900.

Computer system 900 may include certain human interface input devices. Such a human interface input device may be responsive to input by one or more human users through, for example, tactile input (such as keystrokes, swipes, data glove movements), audio input (such as voice, clapping), visual input (such as gestures), or olfactory input (not depicted). The human interface devices can also be used to capture certain media not necessarily directly related to conscious input by a human, such as audio (for example, speech, music, ambient sound), images (for example, scanned images, photographic images obtained from a still image camera), and video (for example, two-dimensional video, three-dimensional video including stereoscopic video).

The input human interface devices may include one or more of the following (only one of each depicted): keyboard 901, mouse 902, trackpad 903, touch screen 910, data glove 904, joystick 905, microphone 906, scanner 907, camera 908.

Computer system 900 may also include certain human interface output devices. Such human interface output devices may stimulate the senses of one or more human users through, for example, tactile output, sound, light, and smell/taste. Such human interface output devices may include tactile output devices (for example, tactile feedback by the touch screen 910, data glove 904, or joystick 905, although there can also be tactile feedback devices that do not serve as input devices), audio output devices (such as speakers 909 and headphones (not depicted)), visual output devices (such as screens 910, including CRT screens, LCD screens, plasma screens, and OLED screens, each with or without touch-screen input capability and each with or without tactile feedback capability, some of which may be capable of two-dimensional visual output or more than three-dimensional output through means such as stereographic output; virtual-reality glasses (not depicted); holographic displays and smoke tanks (not depicted)), and printers (not depicted).

Computer system 900 may also include human-accessible storage devices and their associated media, such as optical media including CD/DVD ROM/RW 920 with CD/DVD or similar media 921, thumb drive 922, removable hard drive or solid-state drive 923, legacy magnetic media such as tape and floppy disk (not depicted), specialized ROM/ASIC/PLD-based devices such as security dongles (not depicted), and the like.

Those skilled in the art should also understand that the term "computer-readable medium" as used in connection with the presently disclosed subject matter does not encompass transmission media, carrier waves, or other transitory signals.

Computer system 900 may also include an interface to one or more communication networks (955). The networks (955) can be, for example, wireless, wireline, or optical. The networks (955) can further be local, wide-area, metropolitan, vehicular and industrial, real-time, delay-tolerant, and so on. Examples of networks (955) include local area networks such as Ethernet; wireless LANs; cellular networks including GSM, 3G, 4G, 5G, LTE, and the like; TV wireline or wireless wide-area digital networks including cable TV, satellite TV, and terrestrial broadcast TV; and vehicular and industrial networks including CANBus. Certain networks (955) commonly require external network interface adapters (954) attached to certain general-purpose data ports or peripheral buses (949) (for example, USB ports of computer system 900); others are commonly integrated into the core of computer system 900 by attachment to a system bus as described below (for example, an Ethernet interface in a PC computer system or a cellular network interface in a smartphone computer system). Using any of these networks (955), computer system 900 can communicate with other entities. Such communication can be unidirectional and receive-only (for example, broadcast TV), unidirectional and send-only (for example, CANbus to certain CANbus devices), or bidirectional, for example to other computer systems using local or wide-area digital networks. Certain protocols and protocol stacks can be used on each of those networks (955) and network interfaces (954), as described above.

The aforementioned human interface devices, human-accessible storage devices, and network interfaces can be attached to a core 940 of computer system 900.

The core 940 can include one or more central processing units (CPUs) 941, graphics processing units (GPUs) 942, specialized programmable processing units in the form of field-programmable gate arrays (FPGAs) 943, hardware accelerators 944 for certain tasks, and so forth. These devices, along with read-only memory (ROM) 945, random-access memory (RAM) 946, graphics adapter 950, and internal mass storage 947 such as internal non-user-accessible hard drives, SSDs, and the like, may be connected through a system bus 948. In some computer systems, the system bus 948 can be accessible in the form of one or more physical plugs to enable extension by additional CPUs, GPUs, and the like. Peripheral devices can be attached either directly to the core's system bus 948 or through a peripheral bus 949. Architectures for a peripheral bus include PCI, USB, and the like.

The CPUs 941, GPUs 942, FPGAs 943, and accelerators 944 can execute certain instructions that, in combination, can make up the aforementioned computer code. That computer code can be stored in ROM 945 or RAM 946. Transitional data can also be stored in RAM 946, whereas permanent data can be stored, for example, in the internal mass storage 947. Fast storage in and retrieval from any of the memory devices can be enabled through the use of cache memory, which can be closely associated with one or more of the CPUs 941, GPUs 942, mass storage 947, ROM 945, RAM 946, and the like.

The computer-readable media can have computer code thereon for performing various computer-implemented operations. The media and computer code can be those specially designed and constructed for the purposes of the present disclosure, or they can be of the kind well known and available to those skilled in the computer software arts.

As a non-limiting example, the computer system having architecture 900, and specifically the core 940, can provide functionality as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, and the like) executing software embodied in one or more tangible computer-readable media. Such computer-readable media can be media associated with the user-accessible mass storage introduced above, as well as certain storage of the core 940 that is of a non-transitory nature, such as the core-internal mass storage 947 or ROM 945. Software implementing various embodiments of the present disclosure can be stored in such devices and executed by the core 940. A computer-readable medium can include one or more memory devices or chips, according to particular needs. The software can cause the core 940, and specifically the processors therein (including CPUs, GPUs, FPGAs, and the like), to execute particular processes or particular parts of particular processes described herein, including defining data structures stored in RAM 946 and modifying such data structures according to the processes defined by the software. Additionally or alternatively, the computer system can provide functionality as a result of logic hardwired or otherwise embodied in a circuit (for example, accelerator 944), which can operate in place of or together with software to execute particular processes or particular parts of particular processes described herein. Where appropriate, a reference to software can encompass logic, and vice versa. Where appropriate, a reference to a computer-readable medium can encompass a circuit storing software for execution (such as an integrated circuit (IC)), a circuit embodying logic for execution, or both. The present disclosure encompasses any suitable combination of hardware and software.

Although this disclosure has described several exemplary embodiments, there are alterations, permutations, and various substitute equivalents that fall within the scope of the disclosure. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods that, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within its spirit and scope.

Claims

1. A method for decoding encoded images of an encoded video sequence, the method being executed by at least one processor, and the method comprising:

decoding, from a first high-level syntax structure used for multiple images, syntax elements related to a reference image resolution;

decoding, from a second high-level syntax structure that varies from a first encoded image to a second encoded image, syntax elements related to a decoded tile resolution, wherein each of the first encoded image and the second encoded image contains multiple tiles;

resampling samples from a reference image buffer for use by a decoder for prediction, the decoder decoding a current tile at a decoding resolution, and the samples from the reference image buffer being at the reference image resolution;

decoding the current tile at the decoded tile resolution into a decoded tile at the decoded tile resolution;

storing the decoded tile in the reference image buffer, the decoded tile being resampled to the reference image resolution; and

in response to the existence of samples not assigned to any tile, assigning the samples to other tiles, wherein the samples in the other tiles are encoded in a skip mode.

2. The method of claim 1, wherein a resampling filter is used for at least one of: resampling the samples from the reference image buffer for use by the decoder for prediction; and resampling the decoded tile to the reference image resolution, wherein the resampling filter is computationally more complex than a bilinear filter and is non-adaptive.

3. The method of claim 2, wherein the resampling filter is selected from a plurality of resampling filters based on the relationship between the decoding resolution and the reference image resolution.

4. The method according to any one of claims 1 to 3, further comprising:

decoding, from a third high-level syntax structure, syntax elements related to an output resolution; and

resampling samples of the decoded tile to the output resolution.

5. The method according to any one of claims 1 to 3, wherein the resampling uses resampling factors that are different in width and height.

6. An apparatus for decoding encoded images of an encoded video sequence, the apparatus comprising:

at least one memory configured to store computer program code; and

at least one processor configured to access the at least one memory and operate according to the computer program code, the computer program code comprising:

first decoding code configured to decode, from a first high-level syntax structure used for multiple images, syntax elements related to a reference image resolution;

second decoding code configured to decode, from a second high-level syntax structure that varies from a first encoded image to a second encoded image, syntax elements related to a decoded tile resolution, wherein each of the first encoded image and the second encoded image contains multiple tiles;

resampling code configured to resample samples from a reference image buffer for use by a decoder for prediction, the decoder decoding a current tile at a decoding resolution, and the samples from the reference image buffer being at the reference image resolution;

third decoding code configured to decode the current tile at the decoded tile resolution into a decoded tile at the decoded tile resolution; and

storing code configured to store the decoded tile in the reference image buffer and to resample the decoded tile to the reference image resolution;

7. The apparatus of claim 6, wherein a resampling filter is used for at least one of: resampling the samples from the reference image buffer for use by the decoder for prediction; and resampling the decoded tile to the reference image resolution, wherein the resampling filter is computationally more complex than a bilinear filter and is non-adaptive.

8. The apparatus of claim 7, wherein the resampling filter is selected from a plurality of resampling filters based on the relationship between the decoding resolution and the reference image resolution.

9. The apparatus according to any one of claims 6 to 8, wherein the resampling uses resampling factors that are different in width and height.

10. A non-transitory computer-readable storage medium storing a program for decoding encoded images of an encoded video sequence, the program comprising instructions that cause a processor to:

11. The non-transitory computer-readable storage medium of claim 10, wherein a resampling filter is used for at least one of: resampling the samples from the reference image buffer for use by the decoder for prediction; and resampling the decoded tile to the reference image resolution, wherein the resampling filter is computationally more complex than a bilinear filter and is non-adaptive.

12. The non-transitory computer-readable storage medium of claim 11, wherein the resampling filter is selected from a plurality of resampling filters based on the relationship between the decoding resolution and the reference image resolution.

13. An apparatus for decoding encoded images of an encoded video sequence, the apparatus comprising:

The first decoding module is configured to decode syntax elements related to the resolution of a reference image from a first high-level syntax structure used for multiple images;

The second decoding module is configured to use a second high-level syntax structure that varies from the first encoded image to the second encoded image to decode syntax elements related to the resolution of the decoded tiles, wherein each of the first encoded image and the second encoded image contains multiple tiles;

A resampling module is configured to resample samples from a reference image buffer for use by a decoder for prediction, the decoder decoding the current tile at a decoding resolution, and the samples from the reference image buffer being at the reference image resolution;

The third decoding module is configured to decode the current tile at the resolution of the decoded tile into a decoded tile at the resolution of the decoded tile; and

The storage module is configured to store the decoded tiles in the reference image buffer and resample the decoded tiles to the reference image resolution;