HK1234565B - Method and system for selectively breaking prediction in video coding - Google Patents
- Publication number
- HK1234565B (HK17108276.9A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- prediction
- tiles
- slice
- video
- video image
Description
This application is a divisional application of the invention patent application with application number 201180062300.7, filed on December 28, 2011, and entitled "Method and system for selectively breaking prediction in video coding".
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to U.S. Provisional Patent Application No. 61/427,569, filed on December 28, 2010, entitled "PICTURE SEGMENTATION USING GENERALIZED SLICES," and U.S. Patent Application No. 13/336,475, filed on December 23, 2011, entitled "METHOD AND SYSTEM FOR SELECTIVELY BREAKING PREDICTION IN VIDEO CODING," each of which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of the present invention relate to video compression, and more particularly, to the selective use of prediction and in-loop filtering mechanisms at picture segment boundaries of video pictures.
Background Art
Digital video capabilities may be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, video cameras, digital recording devices, video gaming devices, video game consoles, cellular or satellite radio telephones, and the like. Digital video devices may implement video compression techniques, such as those described in standards like MPEG-2 and MPEG-4, both available from the International Organization for Standardization ("ISO"), 1, ch. de la Voie-Creuse, Case postale 56, CH-1211 Geneva 20, Switzerland, or www.iso.org, or ITU-T H.264/MPEG-4 Part 10, Advanced Video Coding ("AVC"), available from the International Telecommunication Union ("ITU"), Place des Nations, CH-1211 Geneva 20, Switzerland, or www.itu.int, each of which is incorporated herein by reference in its entirety, or according to other standard or non-standard specifications, to encode and/or decode digital video information efficiently. Still other compression techniques may be developed in the future or are presently under development. For example, a new video compression standard known as HEVC/H.265 is under development in the JCT-VC committee.
A working draft of HEVC/H.265 was proposed in "WD3: Working Draft 3 of High-Efficiency Video Coding, JCTVC-E603" by Wiegand et al., March 2011, which is hereinafter referred to as "WD3" and is incorporated herein by reference in its entirety.
A video encoder may receive uncoded video information for processing in any suitable format, which may be a digital format conforming to ITU-R BT.601 (available from the International Telecommunication Union ("ITU"), Place des Nations, CH-1211 Geneva 20, Switzerland, or www.itu.int, and incorporated herein by reference in its entirety) or some other digital format. The uncoded video may be organized spatially into pixel values arranged in one or more two-dimensional matrices and temporally into a series of uncoded pictures, each uncoded picture comprising one or more of the aforementioned two-dimensional matrices of pixel values. Further, each pixel may comprise a number of separate components used to represent color in digital format. One common format for uncoded video input to a video encoder has, for each group of four pixels, four luminance samples (which carry information about the brightness, or lightness versus darkness, of the pixels) and two chrominance samples (which carry color information), e.g., YCrCb 4:2:0.
One function of a video encoder is to translate (more generally, "transform") uncoded pictures into a bitstream, packet stream, NAL unit stream, or other suitable transmission format (all referred to hereinafter as a "bitstream"), with goals such as reducing the amount of redundancy encoded into the bitstream and thereby increasing transmission rates, increasing the resilience of the bitstream so as to suppress bit errors or packet erasures that may occur during transmission (collectively known as "error resilience"), or other application-specific goals. Embodiments of the present invention provide for at least one of the removal or reduction of redundancy, the increase of error resilience, and the implementability of video encoders and/or associated decoders in parallel processing architectures.
One function of a video decoder is to receive as its input coded video in the form of a bitstream produced by a video encoder conforming to the same video compression standard. The video decoder then translates (more generally, "transforms") the received coded bitstream into uncoded video information that may be displayed, stored, or otherwise handled.
Video encoders and video decoders can be implemented in hardware and/or software configurations, including combinations of both. Implementations of either or both may use programmable hardware components such as general-purpose central processing units (CPUs) of the kind found in personal computers (PCs), embedded processors, graphics card processors, digital signal processors (DSPs), field programmable gate arrays (FPGAs), and others. Instructions may be needed to implement at least parts of the video encoding or decoding, and those instructions may be stored and distributed using one or more non-transitory computer-readable media. Computer-readable media choices include compact disc read-only memory (CD-ROM), digital video disc read-only memory (DVD-ROM), memory sticks, embedded ROM, and others.
Video compression and decompression refer to certain operations performed in a video encoder and/or decoder. A video decoder may perform all, or a subset of, the inverse operations of the encoding operations. Unless otherwise noted, the video encoding techniques described herein are also intended to encompass the inverse of those techniques (namely the associated video decoding techniques).
Uncompressed, digitally represented video can be viewed as a stream of samples, where the samples can be processed by a video display in scan order. One type of boundary that commonly occurs in this sample stream is the boundary between pictures. Many video compression standards recognize this boundary and typically divide the coded bitstream at these boundaries, for example by inserting a picture header or other metadata at the beginning of each uncoded picture. Other boundaries that may occur in the sample stream include slice boundaries and tile boundaries, which may occur within an uncoded picture, as described below.
Prediction in video coding can occur at several levels.
One level is referred to hereinafter as the "entropy coding level", and prediction at that level is referred to as "coding prediction". At this level, the decoding of an entropy coded symbol may require the successful decoding of previous entropy coded symbols. All or nearly all current video compression standards break coding prediction at both the picture and the slice level. That is, upon detection of a picture or slice header in the bitstream (or equivalent), the entropy coding related states used in entropy coding are reset to an initialization state. One example of breaking entropy coding prediction is the reset of CABAC states in ITU-T Rec. H.264.
Further, there may be coding mechanisms that do not fall within the common understanding of entropy coding related prediction as described above, but that still relate to reconstruction control information associated with the bitstream rather than to pixel values. As an example, even some older standards, such as ITU-T Rec. H.261, allow motion vectors to be coded relative to one or more previously coded motion vectors. The detection of a group-of-blocks (GOB), slice, or picture header resets this prediction vector to (0, 0).
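The predictor reset described above can be illustrated with a minimal Python sketch. It is a deliberate simplification, not the H.261 decoding process: the event stream, the `("header",)` and `("mvd", dx, dy)` tuples, and the rule that the predictor is always the previously decoded vector are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

MotionVector = Tuple[int, int]


def decode_motion_vectors(events: Iterable[tuple]) -> List[MotionVector]:
    """Reconstruct motion vectors from differentially coded values.

    `events` is a hypothetical, simplified stream:
      ("header",)       marks a GOB, slice, or picture header
      ("mvd", dx, dy)   carries a motion vector difference
    """
    predictor: MotionVector = (0, 0)
    decoded: List[MotionVector] = []
    for event in events:
        kind = event[0]
        if kind == "header":
            # A header breaks motion vector prediction: reset to (0, 0).
            predictor = (0, 0)
        elif kind == "mvd":
            _, dx, dy = event
            mv = (predictor[0] + dx, predictor[1] + dy)
            decoded.append(mv)
            predictor = mv  # the next vector is coded relative to this one
        else:
            raise ValueError(f"unknown event: {event!r}")
    return decoded


stream = [("header",), ("mvd", 3, 1), ("mvd", 1, 0), ("header",), ("mvd", 2, 2)]
vectors = decode_motion_vectors(stream)
```

Because the second header resets the predictor, the last vector in `stream` is decoded without reference to the vectors before it, which is what makes the segment after the header independently decodable in this respect.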
There are also prediction mechanisms that span multiple pictures. For example, motion compensation can use (possibly motion compensated) pixel values from one or more reference pictures for prediction. This type of prediction is broken through the macroblock type (or equivalent). For example, intra macroblocks generally do not use prediction from reference pictures, whereas inter macroblocks may. In this sense, intra and inter slices are simply accumulations of macroblocks belonging to those different macroblock types.
There are also prediction levels that include prediction based on pixel values that have already been reconstructed during the reconstruction of the picture being coded. One example is an intra prediction mechanism, such as the one described in Annex I of ITU-T Rec. H.263 (similar mechanisms are available in other video coding standards as well).
In addition to prediction mechanisms, several video coding standards specify filters for performing in-loop filtering. One example is the in-loop filter specified in Annex J of ITU-T Rec. H.263.
For some applications, it may be advantageous to segment the picture being coded into smaller units, where the segmenting may occur before or during encoding. Two use cases that may benefit from picture segmentation are described below.
The first such use case involves parallel processing. Previously, standard definition video (e.g., 720x480 or 720x576 pixels) was the largest format in widespread commercial use. More recently, HD formats (up to 1920x1080 pixels) as well as 4k (4096x2048 pixels), 8k (8192x4096 pixels), and still larger formats have emerged and are finding use in a variety of application spaces. Despite the increase in affordable computing power over recent years, the very large picture sizes associated with some of these newer and larger formats often make it advantageous to leverage the efficiency of parallel processing for the encoding and decoding processes. Parallel encoding and decoding may occur, for example, at the instruction level (e.g., using SIMD), in a pipeline where several video coding units may be processed at different stages simultaneously, or on a large-structure basis where collections of video coding sub-units are processed by separate computing engines as separate entities (e.g., a multi-core general-purpose processor). The last form of parallel processing may require picture segmentation.
The second such use case involves picture segmentation to create a bitstream suitable for efficient transport over packet networks. Codecs whose coded video is transported over IP or other packet network protocols can be subject to a maximum transmission unit ("MTU") size constraint. It is sometimes advantageous for the coded slice size to be such that the resulting packet containing the coded slice is as close to the MTU size as possible without exceeding it, so as to keep the payload-to-packetization-overhead ratio high while avoiding fragmentation by the network (and the resulting higher loss probability).
The MTU size differs widely from network to network. The MTU size of many Internet connections may, for example, be set by the smallest MTU size of the network infrastructure commonly used on the Internet, which often corresponds to limitations in Ethernet and may be roughly 1500 bytes.
The number of bits in a coded picture depends on many factors, such as the dimensions of the source picture, the desired quality, the complexity of the content in terms of its suitability for prediction, the coding efficiency of the video coding standard, and other factors. However, even at moderate quality settings and content complexity, for sequences of HD resolution and above, the size of an average coded picture easily exceeds the MTU size. A video conferencing encoder may, for example, require about 2 Mbit/s to encode a 720p60 video sequence. This results in an average coded picture size of roughly 33,000 bits or 4,125 bytes, which is considerably more than the roughly 1,500 bytes of the Internet's MTU size. At higher resolutions, the average picture size increases to values significantly above the Internet's MTU size. Assuming a compression ratio similar to the 720p60 example above, 4096x2048 (4k) video at 60 fps (4kp60) may require more than 300,000 bits, or 25 MTU-sized packets, for each coded video picture.
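The arithmetic behind these figures can be reproduced with a short Python sketch. It assumes that 720p means 1280x720 pixels, that the 4k bit budget scales linearly with pixel count (the "similar compression ratio" assumption), and that packet headers are ignored; the function names are illustrative only.

```python
MTU_BYTES = 1500  # approximate Internet MTU size used in the text


def average_picture_bits(bitrate_bps: float, fps: float) -> float:
    """Average number of bits available per coded picture."""
    return bitrate_bps / fps


def mtu_sized_packets(picture_bits: float, mtu_bytes: int = MTU_BYTES) -> float:
    """Number of MTU-sized payloads needed, ignoring header overhead."""
    return picture_bits / 8 / mtu_bytes


# 720p60 at about 2 Mbit/s
bits_720p60 = average_picture_bits(2_000_000, 60)
bytes_720p60 = bits_720p60 / 8

# Scale by pixel count to estimate 4kp60 at a similar compression ratio.
pixel_ratio = (4096 * 2048) / (1280 * 720)
bits_4kp60 = bits_720p60 * pixel_ratio
packets_4kp60 = mtu_sized_packets(bits_4kp60)

print(f"720p60: {bits_720p60:.0f} bits ({bytes_720p60:.0f} bytes) per picture")
print(f"4kp60:  {bits_4kp60:.0f} bits, about {packets_4kp60:.1f} MTU-sized packets")
```

Without rounding, the 720p60 picture comes to a little over 33,000 bits (somewhat more than 4,100 bytes), and the 4kp60 estimate lands slightly above 300,000 bits and 25 packets, in line with the rounded figures quoted above.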
In many previous video coding standards (e.g., up to and including WD3), picture segments (or at least one form of picture segment) are known as "slices". In the following description, any kind of picture segmentation (e.g., one based on video coding) that breaks at least one form of in-picture prediction, in-loop filtering, or other coding mechanism may be referred to generally as a "slice". As such, structures such as the group of blocks ("GOB") in ITU-T Rec. H.261 or ITU-T Rec. H.263 (available from the ITU; see above for H.264) and slices in H.264 or the MPEG family of standards may each constitute a "slice" as that term is used herein throughout. However, the fragmentation units of RFC 3984 and the data partitions of H.264 do not constitute a "slice" as used herein throughout, because they do not break in-picture prediction, in-loop filtering, or another coding mechanism.
Referring to FIG. 1, an example 100 of picture segmentation using slices is shown. A picture 101 is divided into two scan-order slices 102, 103. The slice boundary is shown as a bold line 104. The first macroblock 105 of the second slice 103 has address 11. When, for example, the H.264 standard is used to generate the corresponding bitstream 106 for transmitting the picture 101, the bitstream 106 may include one or more parameter sets 107, which contain no information about the slice boundaries, followed by the slice headers 108, 110 and slice data 109, 111 of the two slices 102, 103. The slice header 110 of the second slice 103 is shown enlarged. The extent of the slice 103 is determined by a decoder through a combination of at least two factors. First, the slice header 110 contains the address of the first macroblock 105 of the slice 103. Second, the end of the slice is determined, for example, by the detection of a new slice header in the bitstream or, in the depicted example, by the end of the coded picture in the bitstream 112 (i.e., after macroblock 24). All macroblocks between the first macroblock and the end of the slice make up the slice. It should be noted that scan order modifications, such as the Flexible Macroblock Ordering of H.264, can change the number of macroblocks in a slice by creating gaps.
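The way a decoder can infer slice extents from these two pieces of information can be sketched as follows. This is a minimal illustration that assumes plain scan order (no Flexible Macroblock Ordering), macroblock addresses running from 1 to 24 as suggested by FIG. 1, and a first slice that starts at address 1; the function name and its inputs are hypothetical.

```python
from typing import List, Tuple


def slice_extents(first_mb_addresses: List[int],
                  last_mb_in_picture: int) -> List[Tuple[int, int]]:
    """Return the inclusive (first, last) macroblock address of each slice.

    A slice ends where the next slice header begins, or at the end of
    the coded picture for the final slice.
    """
    if not first_mb_addresses:
        raise ValueError("at least one slice is required")
    if first_mb_addresses != sorted(set(first_mb_addresses)):
        raise ValueError("slice start addresses must be strictly increasing")
    if first_mb_addresses[-1] > last_mb_in_picture:
        raise ValueError("slice starts beyond the end of the picture")

    extents = []
    for i, first in enumerate(first_mb_addresses):
        if i + 1 < len(first_mb_addresses):
            last = first_mb_addresses[i + 1] - 1  # next slice header found
        else:
            last = last_mb_in_picture             # end of coded picture
        extents.append((first, last))
    return extents


# Two slices as in FIG. 1: the second slice starts at macroblock 11.
extents = slice_extents([1, 11], last_mb_in_picture=24)
```

Under these assumptions the first slice covers macroblocks 1 through 10 and the second covers 11 through 24, even though neither slice header states a length.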
One advantage of using slices over media-unaware segmentation mechanisms, such as those provided by IP at the routing layer, is that slices are, at least to a certain extent, independently decodable because certain types of prediction are broken at the boundaries between slices (as discussed in more detail below). The loss of one slice therefore does not necessarily render the other slices of a coded picture unusable or undecodable. Depending on the implementation of a fragmentation mechanism, the loss of a fragment, in contrast, may well render many other fragments unusable, because fragmentation, as the term is used herein throughout, does not break any form of prediction.
WD4 ("WD4: Working Draft 4 of High-Efficiency Video Coding" by B. Bross et al., available from http://wftp3.itu.int/av-arch/jctvc-sit/2011_07_f_Forino/) is a draft specification relating to a digital video coding standard under development, which may be referred to as High Efficiency Video Coding (HEVC) or H.265. In addition to slices, WD4 includes a picture segmentation mechanism known as "tiles". According to WD4, a source picture can be divided into rectangular units called tiles, such that each pixel of the source picture is part of a tile (other constraints may also apply). A tile is therefore a rectangular part of a picture. Tile boundaries are determined by coordinates available in high-level syntax structures, which are known in WD4 as parameter sets. Tiles are described in more detail below.
With the possible exception of inter picture prediction, each of the in-picture prediction mechanisms or coding mechanisms described above may be broken by the decoding of a picture header (or equivalent, such as the decoding of a slice with a frame number different from that of the previous slice). Whether those prediction mechanisms are broken across slice or tile boundaries depends on the video compression standard and the type of slice in use.
In H.264, slices can be decoded independently with respect to motion vector prediction, intra prediction, CA-VLC and CABAC states, and other aspects of the H.264 standard. Only inter picture prediction (including the import, through motion compensation, of pixel data from outside the slice boundaries) is allowed. While this decoding independence increases error resilience, disallowing the aforementioned prediction across slice boundaries reduces coding efficiency.
In H.263, a video encoder has more flexibility in selecting which prediction mechanisms are broken through the use of slices or GOBs with non-empty GOB headers. For example, there is a bit in the picture header, selectable when Annex R is in use, that signals to the decoder that no prediction at all occurs across slice/GOB (with non-empty header) boundaries. Certain prediction mechanisms, such as motion vector prediction, are broken across GOBs with non-empty headers and across slice boundaries regardless of the state of Annex R. Other prediction mechanisms are controlled by Annex R. For example, if the bit is not set, motion vectors may point outside the spatial area co-located with the current slice/GOB with non-empty header in the reference picture, thereby potentially "importing" into the current slice sample values for motion compensation from an area that does not lie inside the geometric area of the slice/GOB in the reference picture. Further, unless Annex R is active, loop filtering may incorporate sample values from outside the slice/GOB. Similarly, there is another bit in the picture header that enables or disables intra prediction.
In most standards, however, the decision to break in-picture prediction is made at picture granularity at least, and in some cases at sequence granularity. In other words, using H.263 as an example, it is not possible to mix, in a given picture, slices that have the deblocking filter enabled and slices that have it disabled, nor is it possible to enable or disable intra prediction at the slice level.
As already described, picture segmentation allows a picture to be broken into spatial areas smaller than the whole picture. While the most common applications of picture segmentation appear to be MTU size matching and parallelization, picture segmentation can also serve many other purposes, including those that adapt the size and shape of the segments to the content. Region-of-interest coding is one of several examples. In such cases, certain parts of a picture may be coded more efficiently than others (in the sense of spending fewer bits to yield a comparable visual experience) when different coding tools, including different prediction mechanisms, are applied. For example, some content may benefit from deblocking filtering and may not respond well to intra prediction, whereas other content in the same picture may be coded better without deblocking filtering but could benefit from intra prediction. A third kind of content may be coded best with both deblocking filtering and intra prediction enabled. When the picture is divided into tiles, all of this content may be located in the same picture, as occurs, for example, in interview situations or in video conferencing.
One shortcoming of existing mechanisms for breaking prediction at segment boundaries is that the enabling and/or disabling of prediction breaking is generally hard-coded into the existing video coding standards, making it difficult or impossible to break prediction mechanisms selectively at segment boundaries based, for example, on the characteristics of the content being coded.
Accordingly, there is a need for an improved method and system for enabling or disabling, on a per-slice basis, prediction and in-loop filtering mechanisms individually or as a group. A solution that at least partially addresses the above or other shortcomings is therefore desired.
Further, there is a need to enable or disable, on a per-picture (or group of pictures, sequence, etc.) basis, prediction mechanisms and/or in-loop filtering mechanisms, individually or as a group, across header-less (or equivalent) picture segment boundaries such as tile boundaries. A solution that at least partially addresses the above or other shortcomings is therefore desired.
Summary of the Invention
Embodiments of the present invention provide methods and systems for encoding and/or decoding video pictures in which a plurality of prediction and in-loop filtering tools can be selectively enabled or disabled for picture segments.
According to one aspect of the invention, an encoder may indicate, for one or more prediction tools, whether that tool may take information from outside the picture segment currently being processed as reference information for processing within that picture segment. The encoder may provide such an indication for a single prediction tool (for example, entropy prediction, intra prediction, motion compensated prediction, or motion vector prediction, hereinafter "prediction tools") and/or a single filtering tool (for example, adaptive interpolation filtering, adaptive loop filtering, deblocking filtering, or sample adaptive offset, hereinafter "loop filter tools"), among others. Alternatively, the encoder may provide such an indication for a plurality of predefined tools, or for predefined groups of tools that may include any of the above prediction and loop filter tools, among others. Doing so may be useful in support of the parallelization of encoders and decoders, as well as for certain application scenarios such as soft continuous presence (stitching together coded pictures in the compressed domain).
According to one aspect of the invention, when header-less picture segments (such as tiles) are in use, the encoder may indicate, for a prediction tool, a loop filter tool, or a plurality of predefined groups of tools, whether that tool may use information from across horizontal, vertical, or both horizontal and vertical tile boundaries as reference information.
As one example, in the particular case of H.264 or HEVC, the encoder may set the values of "coding interrupt indication" flags for prediction and in-loop filtering tools, covering for example: intra prediction referencing sample values outside the slice/tile boundary; motion vectors referencing sample values outside the slice/tile boundary (i.e., through motion compensation); use of CABAC states from outside the slice/tile boundary; use of CA-VLC states from outside the slice/tile boundary; use of PIPE or similar V2V entropy coding states from outside the slice/tile boundary (HEVC only); and use of states and sample values from outside the slice/tile boundary by in-loop filters such as the adaptive interpolation filter, adaptive loop filter, deblocking loop filter, or sample adaptive offset.
According to one aspect of the invention, the use or other enabling of coding tools is indicated not in the form of flags but through the manipulation of other data structures; for example, in some cases a "coding interrupt indication" integer may combine a plurality of the aforementioned flags, or preferred permutations of those flags, into a single symbol.
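The relationship between individual flags and a single combined integer can be sketched in Python. The flag names, their bit positions, and the convention that a set bit means "prediction is broken at the boundary for this tool" are all assumptions made for illustration; the text does not define a concrete syntax.

```python
from enum import IntFlag


class CodingInterrupt(IntFlag):
    """Hypothetical 'coding interrupt indication' flags (one bit per tool)."""
    INTRA_PREDICTION = 1 << 0     # intra prediction across the boundary
    MOTION_COMPENSATION = 1 << 1  # motion vectors referencing outside samples
    CABAC_STATE = 1 << 2          # CABAC state carried across the boundary
    CAVLC_STATE = 1 << 3          # CA-VLC state carried across the boundary
    IN_LOOP_FILTER = 1 << 4       # in-loop filters using outside samples


ALL_BITS = 0b11111  # every flag defined above


def pack(flags: CodingInterrupt) -> int:
    """Combine the individual flags into one integer symbol."""
    return int(flags)


def unpack(value: int) -> CodingInterrupt:
    """Recover the individual flags from the integer symbol."""
    if value < 0 or value > ALL_BITS:
        raise ValueError(f"value {value} is outside the defined flag range")
    return CodingInterrupt(value)


# Break intra prediction and CABAC state inheritance, keep everything else.
chosen = CodingInterrupt.INTRA_PREDICTION | CodingInterrupt.CABAC_STATE
symbol = pack(chosen)
restored = unpack(symbol)
```

A plain bitmask like this represents every combination of flags; a codec that only wants to signal a few preferred combinations could instead map a small integer to a table of permitted flag sets, which is closer to the "preferred permutations" wording above.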
According to one aspect of the invention, the maximum length of a motion vector pointing outside a slice boundary may be coded in an appropriate entropy coding representation of an integer, thereby indicating not only the non-use of motion compensation up to the distance allowed by the level in use, but also the maximum that is allowed, which can, for example, aid resource allocation in a decoder implementation.
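One possible reading of such a limit is sketched below in Python: the signalled integer bounds how far, in samples, a motion compensated reference block may reach beyond the current segment. The rectangular segment, the inclusive sample coordinates, and the names `Rect`, `overshoot`, and `mv_allowed` are assumptions for illustration; real slices are not necessarily rectangular, and the text does not fix the exact semantics of the limit.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    """Rectangle in inclusive sample coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def overshoot(region: Rect, block: Rect, mv: Tuple[int, int]) -> int:
    """Largest distance by which the referenced block leaves `region`."""
    dx, dy = mv
    ref = Rect(block.left + dx, block.top + dy,
               block.right + dx, block.bottom + dy)
    return max(region.left - ref.left,
               ref.right - region.right,
               region.top - ref.top,
               ref.bottom - region.bottom,
               0)


def mv_allowed(region: Rect, block: Rect, mv: Tuple[int, int],
               max_outside: int) -> bool:
    """True if the reference block stays within `max_outside` samples."""
    return overshoot(region, block, mv) <= max_outside


segment = Rect(0, 0, 63, 31)     # hypothetical 64x32 segment
block = Rect(16, 16, 31, 31)     # 16x16 block at the segment's bottom edge
reach = overshoot(segment, block, (0, 8))
```

With a bound like `max_outside` known in advance, a decoder working on one segment would only need to keep that many extra rows and columns of reference samples around the segment available, which is the kind of resource allocation benefit the paragraph above mentions.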
According to one aspect of the invention, at least one of the aforementioned coding interruption indication flags or other coding interruption indication data structures may be stored in a slice header, a picture header, a parameter set, or the equivalent.
According to one aspect of the invention, a decoder may react to the presence of the flags or other data structures by breaking the indicated prediction tools across slice/tile boundaries, as opposed to other potentially suitable boundaries.
In one broad aspect, a method for decoding a coded video picture comprising a plurality of segments is provided. The method may include, for at least one segment of the coded video picture that does not have an associated segment header, obtaining from the coded video picture at least one indication of at least one prediction or in-loop filtering operation to be applied to the coded video picture, and controlling the at least one prediction or in-loop filtering operation in response to the at least one indication. In some cases, the coded video picture may include at least two segments that do not have an associated segment header.
In another broad aspect, a method for encoding a video picture comprising a plurality of segments is provided. The method may include, for at least one segment of the video picture that does not have an associated segment header, obtaining at least one indication of at least one prediction or in-loop filtering operation to be applied to the at least one segment that does not have an associated segment header, and controlling the at least one prediction or in-loop filtering operation during encoding of the video picture in response to the at least one indication. In some cases, the video picture may include at least two segments that do not have an associated segment header.
In yet another broad aspect, a non-transitory computer-readable medium is provided having stored thereon computer-executable instructions for programming one or more processors to perform a method for decoding a coded video picture comprising a plurality of segments. The method may include, for at least one segment of the coded video picture that does not have an associated segment header, obtaining from the coded video picture at least one indication of at least one prediction or in-loop filtering operation to be applied to the coded video picture, and controlling the at least one prediction or in-loop filtering operation in response to the at least one indication. In some cases, the coded video picture may include at least two segments that do not have an associated segment header.
In yet another broad aspect, a non-transitory computer-readable medium is provided having stored thereon computer-executable instructions for programming one or more processors to perform a method for encoding a video picture comprising a plurality of segments. The method may include, for at least one segment of the video picture that does not have an associated segment header, obtaining at least one indication of at least one prediction or in-loop filtering operation to be applied to the at least one segment that does not have an associated segment header, and controlling the at least one prediction or in-loop filtering operation during encoding of the video picture in response to the at least one indication. In some cases, the video picture may include at least two segments that do not have an associated segment header.
In yet another broad aspect, a data processing system is provided that includes at least one of a processor and accelerator hardware configured to perform a method for decoding a coded video picture comprising a plurality of segments. The method may include, for at least one segment of the coded video picture that does not have an associated segment header, obtaining from the coded video picture at least one indication of at least one prediction or in-loop filtering operation to be applied to the coded video picture, and controlling the at least one prediction or in-loop filtering operation in response to the at least one indication. In some cases, the coded video picture may include at least two segments that do not have an associated segment header.
In yet another broad aspect, a data processing system is provided that includes at least one of a processor and accelerator hardware configured to perform a method for encoding a video picture comprising a plurality of segments. The method may include, for at least one segment of the video picture that does not have an associated segment header, obtaining at least one indication of at least one prediction or in-loop filtering operation to be applied to the at least one segment that does not have an associated segment header, and controlling the at least one prediction or in-loop filtering operation during encoding of the video picture in response to the at least one indication. In some cases, the video picture may include at least two segments that do not have an associated segment header.
In some embodiments, according to any of the above aspects, the at least one prediction or in-loop filtering operation may include at least one of entropy prediction, intra prediction, motion vector prediction, motion compensated prediction, adaptive loop filtering, adaptive interpolation filtering, deblocking filtering, or sample adaptive offset.
In some embodiments, according to any of the above aspects, at least one of a plurality of indications may be derived from at least one combined indication.
In some embodiments, according to any of the above aspects, the at least one indication may be coded as a vector indicating the maximum length of a motion vector.
In some embodiments, according to any of the above aspects, the at least one indication may be coded in a parameter set.
According to further aspects of the invention, there are provided an apparatus (e.g., a data processing system), a method for adapting the apparatus, and an article of manufacture (e.g., a non-transitory computer-readable medium or product having recorded and/or stored thereon program instructions for performing any of the methods described herein).
BRIEF DESCRIPTION OF THE DRAWINGS
Various features and advantages of embodiments of the present invention will become apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a diagram showing an exemplary picture with scan-order slices and a bitstream representing the coded picture, according to one embodiment of the present invention;
FIG. 2 is a diagram showing tiles and slices according to one embodiment of the present invention;
FIG. 3 is a block diagram showing a coded bitstream according to one embodiment of the present invention;
FIG. 4 is a block diagram showing a coded bitstream according to one embodiment of the present invention;
FIG. 5 is a block diagram showing a coded bitstream according to one embodiment of the present invention;
FIG. 6 is a flowchart showing the operation of an exemplary decoder according to one embodiment of the present invention;
FIG. 7 is a flowchart showing the operation of an exemplary decoder when decoding a slice according to one embodiment of the present invention; and
FIG. 8 is a block diagram showing an implementation based on a data processing system (e.g., a personal computer ("PC")) according to an embodiment of the present invention.
It should be noted that throughout the drawings, like features are identified by like reference numerals.
DETAILED DESCRIPTION
In the following, details are set forth to provide an understanding of the invention. In some instances, specific software, circuits, structures, and methods are not described or shown in detail so as not to obscure the invention. The term "data processing system" is used herein to refer to any machine for processing data, including the computer systems, wireless devices, and network configurations described herein. Embodiments of the invention may be implemented in any computer programming language, provided that the operating system of the data processing system offers facilities that support the requirements of those embodiments. Embodiments of the invention may also be implemented in hardware or in a combination of hardware and software.
At least some embodiments of the invention relate to selectively breaking prediction mechanisms and/or selectively disabling in-loop filtering mechanisms in conjunction with picture segmentation in video compression.
Terms such as "segment" or "picture segment" are used below to refer to any one or more macroblocks or equivalents (e.g., treeblocks in WD4) that are smaller than the entire picture and at whose boundaries at least one form of prediction is broken and/or at least one form of in-loop filtering is disabled. H.264-like slices and WD4-like tiles (with tile_boundary_independence_idc equal to 1), as described below, are non-limiting examples of segments.
FIG. 2 shows an example 200 in which a picture 201 is divided into two tiles 202, 203 by a vertical tile boundary 204, depicted as a bold solid line. Tiles can coexist with slices within picture 201. For example, picture 201 is divided into two slices by slice boundary 205 at the same time as it is divided into tiles 202, 203 by tile boundary 204. Tiles as described in WD4 (with tile_boundary_independence_idc equal to 1) may, in one or more respects, generalize another type of picture segment referred to as columns, which are further described in U.S. Patent Application No. 13/336,675, filed December 23, 2011, entitled "METHOD AND SYSTEM FOR SELECTIVELY BREAKING PREDICTION IN VIDEO CODING," the entire contents of which are incorporated herein by reference.
The bitstream 206 corresponding to the transmission of picture 201 may include, for example, a parameter set 207 or other high-level syntax element containing tile boundary information 208 that identifies tile boundary 204. The parts of the bitstream other than parameter set 207, however, include no information about tile boundaries. A decoder can identify the tile to which a coded macroblock (also known as a largest coding unit (LCU) or, in WD4, a treeblock) belongs by associating its internal state information about the macroblock currently being processed with the tile dimensions known from parameter set 207.
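The lookup described above can be sketched as follows, assuming the parameter set supplies the first LCU column and first LCU row of every tile and that the decoder tracks the (x, y) position of the current LCU. The function name `tile_index`, its parameters, and the raster numbering of tiles are illustrative assumptions, not syntax taken from WD4.

```python
from bisect import bisect_right


def tile_index(lcu_x, lcu_y, col_starts, row_starts):
    """Return the index of the tile containing the LCU at (lcu_x, lcu_y).

    col_starts and row_starts are sorted lists holding the first LCU column
    and first LCU row of each tile column and tile row (both start with 0).
    Tiles are numbered in raster order.
    """
    col = bisect_right(col_starts, lcu_x) - 1
    row = bisect_right(row_starts, lcu_y) - 1
    return row * len(col_starts) + col
```

For a picture six LCUs wide that is split by one vertical boundary after the third LCU column, `col_starts` would be `[0, 3]` and `row_starts` would be `[0]`, so LCU column 2 falls in tile 0 and LCU column 3 in tile 1.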
One difference between tiles and other rectangular picture segmentation mechanisms, such as rectangular slices (a sub-mode of Annex K of ITU-T Rec. H.263), is that a tile, unlike a rectangular slice, does not require a header. Because no header is included, the physical dimensions of a tile may instead be defined in a parameter set. In some cases (with tile_boundary_independence_idc equal to 1), tile boundaries according to WD4 interrupt all in-picture prediction mechanisms, but allow motion compensation to reference samples in reference pictures that are not co-located with the samples of the tile for which motion compensation is being performed. Furthermore, tile boundaries do not interrupt in-loop filtering, including the deblocking filter, sample adaptive offset filter, and adaptive loop filter.
However, it may also be convenient or desirable for an encoder or decoder to use tiles to break a different set of prediction mechanisms. For example, at very high resolutions it can be advantageous to split video pictures into tiles subject to the requirement that motion vectors are not allowed to point outside tile boundaries, and/or that the encoder and decoder treat a tile boundary like a picture boundary (similar to Annex R of H.263) or in a comparable way, thereby avoiding, for example, not only motion compensation across tile boundaries but also in-loop filtering.
In other cases, it may be convenient or desirable for an encoder or decoder to be able to handle full-resolution video coding except for the entropy coding of symbols. Such an encoder or decoder might, for example, include dedicated signal processing hardware for sample-based processing but use a general-purpose multi-core CPU for entropy encoding and/or decoding, where a single core would not be able to handle the load (in HEVC, CABAC entropy coding in particular is known to be computationally demanding). To support this use case, entropy coding may therefore need to be broken at tile boundaries, while other in-picture or inter-picture prediction mechanisms may be able to cross slice and/or tile boundaries.
In still other cases, it may be convenient or desirable for an encoder or decoder to allow limited inter-processor coordination across tile boundaries. In that case, references to pixel values are not possible, whereas references to control information (such as the information needed for motion vector prediction) are available over the communication channel between the processors. Intra prediction is then not possible, but motion vector prediction can be used.
There may be coding tools that are not directly related to prediction but that can still advantageously be interrupted across slice or tile boundaries. For example, co-pending U.S. Patent Application No. 13/286,828, filed November 1, 2011, entitled "ADAPTIVE INTERPOLATION IN DIGITAL VIDEO CODING," the entire contents of which are incorporated herein by reference, discloses an adaptive interpolation filter (AIF) whose properties and coefficients can be selected by the encoder. It can be advantageous to limit the use of samples outside a slice for interpolation filtering. Similarly, WD4 includes an adaptive interpolation filter whose control is derived, at least in part, from certain pixels. It can be advantageous to limit this derivation to pixels within the slice or tile boundaries. It can also be advantageous to limit the filtering itself (as opposed to the derivation of filter control information) to pixels within the slice or tile boundaries. WD4 also includes other loop filters, such as the adaptive loop filter (concerned with filtering all samples), the deblocking filter (concerned with filtering block boundaries), and a filtering mechanism known as sample adaptive offset. All of these filters may share properties similar to those of AIF. For example, in the case of the adaptive loop filter as specified in WD4, it can be advantageous to disable (possibly independently) access across tile boundaries to the information used to derive the filter taps, and to disable the filtering across tile boundaries itself.
Segment boundaries may be defined by picture-level (or higher) syntax structures (e.g., parameter sets when WD4 tiles are used), by segment header information (e.g., rectangular slices of H.263 Annex K), by a combination of the placement of segment headers in the bitstream and encoder/decoder state (e.g., H.264 slices when flexible macroblock ordering (FMO) is not in use), or by a combination of two or more of these mechanisms (e.g., FMO defines slice groups, and within a slice group a picture segment is defined by the placement of a slice header in the bitstream, which identifies the first macroblock of the slice by its address, together with the implicitly ascending macroblock addresses within the slice group, until the end of the slice is detected through bitstream parsing or other means).
Described first are mechanisms that allow prediction tools to be selected for tile boundaries, followed by mechanisms that allow prediction tools to be selected for slice boundaries. Finally, the interaction of the two mechanisms is described.
Referring to example 300 in FIG. 3, a coded bitstream 301 includes a parameter set 302 and two coded slices 304, 305. The coded slices 304, 305 may belong to one or two coded pictures. In WD4, a picture boundary can be identified by a slice header with an LCU address of 0. Parameter set 302 may include tile control information 303 (e.g., tile boundaries), and in this example the information in parameter set 302 is assumed to pertain to both coded slices (i.e., the parameter set references in their slice headers contain the same index). In many WD4-based and H.264-based systems, a parameter set pertains to tens, hundreds, or more slices.
According to one embodiment, parameter set 302 may include a plurality of prediction tool indication flags (PTIs). When a PTI is set (i.e., enabled), prediction across segment boundaries may be allowed for whichever encoding or decoding tool is associated with the flag; when the PTI is not set (i.e., disabled), such prediction may be prohibited. Flags may be defined, for example, for entropy coding prediction 306, intra prediction 307, motion vector prediction 308, motion compensated prediction 309, adaptive loop filtering 310, adaptive interpolation filtering 311, deblocking filtering 312, sample adaptive offset 313, and possibly other prediction and in-loop filtering tools defined in the video coding scheme.
Including PTIs for individual prediction and in-loop filtering mechanisms that pertain to all slices and pictures referencing the parameter set may help tailor the bitstream to the encoding and/or decoding environment, such as the hardware architecture of the encoder or decoder. Because the flags may be part of a parameter set that can apply to many slices or pictures, the overhead of the PTIs in the parameter set may be negligible compared with the benefits they provide.
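A minimal sketch of how a decoder might hold the parameter-set PTIs, assuming one bit per tool in the order in which the tools are listed above. The class `PredictionToolIndications`, its field names, and the bit order are hypothetical; the text does not define a concrete syntax.

```python
from dataclasses import dataclass, fields


@dataclass
class PredictionToolIndications:
    """One flag per tool; True means prediction across the boundary is allowed.

    The field order is an assumed bitstream order, not a standardized one.
    """
    entropy: bool
    intra: bool
    mv_prediction: bool
    motion_compensation: bool
    alf: bool
    aif: bool
    deblocking: bool
    sao: bool


def parse_ptis(bits: str) -> PredictionToolIndications:
    """Read one bit per tool from a string of '0' and '1' characters."""
    names = [f.name for f in fields(PredictionToolIndications)]
    if len(bits) < len(names):
        raise ValueError("not enough bits for all PTIs")
    return PredictionToolIndications(
        **{name: bit == "1" for name, bit in zip(names, bits)}
    )
```

Because a parameter set is typically shared by many slices, a structure like this would be parsed once and then consulted for every segment boundary.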
Referring to example 400 depicted in FIG. 4, a coded bitstream 401 includes a parameter set 402 and a coded picture comprising two slices 403, 404. Each slice starts with a slice header 405, 406. Slice header 405 is shown enlarged to reveal part of its information.
According to an embodiment, slice header 405 may include a plurality of prediction tool indication flags (PTIs). When one or more PTIs are set, prediction and/or in-loop filtering across segment boundaries may be allowed for whichever encoding or decoding tool is associated with each flag; when a PTI is not set (i.e., disabled), such prediction may be prohibited. Flags may be defined, for example, for entropy prediction 407, intra prediction 408, motion vector prediction 409, motion compensated prediction 410, adaptive loop filtering 411, adaptive interpolation filtering 412, deblocking filtering 413, sample adaptive offset 414, and possibly other prediction and in-loop filtering tools defined in the video coding scheme.
Including PTIs for individual prediction and in-loop filtering mechanisms that pertain to a given slice may help adapt the bitstream to the content, thereby improving coding efficiency.
Described now is how the two mechanisms above can interact.
Referring to example 500 shown in FIG. 5, a coded bitstream 501 includes a parameter set 502 and two slices 503, 504, each starting with a corresponding slice header 505, 506.
Parameter set 502, shown enlarged at 507, includes, for example, tile control information 508 or other information relating to headerless segment boundaries that may, for example, indicate a vertical tile boundary 204 as shown in FIG. 2. In addition, parameter set 502 may include one or more PTIs. Three PTIs are shown here: one associated with entropy prediction 509, one with intra prediction 510, and one with motion compensation 511. These flags may control decoder prediction at tile boundary 204. Tile boundary 204 may be set, for example, by tile control information 508 such that picture 201 is divided vertically into two tiles 202, 203. The mechanisms described here may also be used with other tile boundary configurations, including combinations of vertical and horizontal boundaries.
The coded picture may also include, for example, two coded slices 503, 504, each starting with a corresponding slice header 505, 506. As shown in FIG. 2, the (uncoded) slices corresponding to coded slices 503, 504 may, for example, cover the spatial regions of macroblock addresses 1 to 14 and 15 to 24, respectively. Slice header 506 is shown enlarged at 512 and may include a plurality of PTIs. Two PTIs are shown: one associated with intra prediction 513 and the other with adaptive loop filtering (ALF) 514. It should be noted, however, that there may be, but need not be, overlap between the PTIs of parameter set 502 and those of slice header 506.
According to an embodiment, the PTIs 509, 510, 511 of parameter set 502 control prediction and in-loop filtering across tile boundary 204 as defined by tile control information 508.
According to one embodiment, the PTIs 513, 514 of slice header 512 control prediction and in-loop filtering across the boundary between slices 503, 504. For example, slice 504 has one boundary other than the picture boundary, namely the one marked by the dashed bold slice boundary line 205.
Thus, in example 200, tile boundaries interrupt some prediction and in-loop filtering mechanisms (so as to allow picture coding to be distributed among several processors), while other prediction and in-loop filtering mechanisms are interrupted selectively at slice boundaries under the control of slice header 506. This gives the encoder full control over which prediction and in-loop filtering mechanisms are broken, so that it can select, for the content being coded, any particular combination of such mechanisms, including combinations that are desirable or convenient for a given application or use.
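The division of labor between the two kinds of boundary can be sketched as a neighbor-availability check. The function `may_use_neighbor`, the tool names, and the representation of PTIs as sets of allowed tools are illustrative assumptions; requiring both permissions when a neighbor lies across both a tile and a slice boundary is a conservative choice of this sketch, not something the text prescribes.

```python
def may_use_neighbor(tool, current, neighbor, tile_ptis, slice_ptis):
    """Decide whether `tool` may use data from a neighboring block.

    current and neighbor are (tile_id, slice_id) pairs.
    tile_ptis holds the tools allowed across tile boundaries (parameter set);
    slice_ptis holds the tools allowed across slice boundaries (slice header).
    """
    crosses_tile = current[0] != neighbor[0]
    crosses_slice = current[1] != neighbor[1]
    if crosses_tile and tool not in tile_ptis:
        return False
    if crosses_slice and tool not in slice_ptis:
        return False
    return True
```

With `tile_ptis = {"mv_prediction"}` this models the limited inter-processor coordination case: motion vector prediction may cross the tile boundary, while intra prediction may not.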
If PTIs pertaining to the same prediction or in-loop filtering mechanism are present in both parameter set 502 and slice header 506, and the corresponding tile and slice boundaries are aligned, at least two decoder reactions are possible. The choice between them may be specified statically, through profile/level selection, or dynamically, based on control information in a parameter set or other high-level syntax elements.
One option is for the PTIs in parameter set 502 to override conflicting information in slice header 506. This option may give the decoder certainty that it can distribute segments to different processors or cores without having to implement mechanisms for sharing information between those segments.
Another option is for the PTIs in slice header 506 to override conflicting information in parameter set 502. This option may allow the encoder greater flexibility in its choice of tools. Other reactions are also possible.
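The two reactions can be sketched as a single precedence rule with a switch. The function `resolve_pti`, the dictionary representation, and the default of treating an unsignalled tool as broken at the boundary are all assumptions made for illustration.

```python
def resolve_pti(tool, param_set_ptis, slice_ptis, parameter_set_wins=True):
    """Return the effective PTI for `tool` at an aligned tile/slice boundary.

    Each argument maps tool name -> bool; a missing key means the tool is
    not signalled at that level. If neither level signals the tool, the
    sketch assumes prediction is broken (False).
    """
    if parameter_set_wins:
        first, second = param_set_ptis, slice_ptis
    else:
        first, second = slice_ptis, param_set_ptis
    if tool in first:
        return first[tool]
    return second.get(tool, False)
```

Setting `parameter_set_wins=True` corresponds to the option that favors decoder parallelism, and `False` to the option that favors encoder flexibility.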
To optimize the coding of the PTIs (whether they are located in slice header 506 or parameter set 502), it may in some cases be advantageous to specify any of the following:
(1) If a particular profile and/or level is indicated, some PTIs may not be part of the parameter set or slice header, because the corresponding prediction or in-loop filtering tool is not available in that profile and/or level.
(2) Two or more PTIs may be "bundled" into a single combined PTI if, for example, it is determined for a particular profile that the flexibility of switching those PTIs on and off independently is unnecessary or even undesirable.
(3) In some cases, a PTI may best not be coded as a Boolean (i.e., binary) parameter. For example, the need for inter-processor coordination in the case of motion compensation may be determined, at least in part, by the length of the motion vectors that point outside the co-located spatial region covered by a slice or tile. Accordingly, in one embodiment, PTI information may also be coded as an integer or other non-Boolean parameter, thereby indicating a suitable range of values for prediction, such as the maximum length of a motion vector pointing outside a segment boundary.
(4) In some cases, PTI values may not need to be physically present in the bitstream, because their values can be derived from other properties of the bitstream. For example, an intra slice may not need to include a PTI related to motion compensation because, by design of the standard, motion compensation cannot occur in an intra slice.
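Item (3) above can be illustrated with a check that compares how far a motion-compensated reference block reaches outside the co-located segment against a signalled maximum. The sketch assumes integer-sample motion vectors and ignores the extra samples an interpolation filter would need; `reach_outside` and `mv_allowed` are hypothetical helper names.

```python
def reach_outside(x, y, w, h, mv, seg):
    """Largest distance, in samples, by which the referenced block leaves seg.

    (x, y) is the top-left corner of the current w-by-h block, mv is an
    integer-sample (dx, dy) motion vector, and seg = (x0, y0, x1, y1) is the
    co-located segment rectangle with exclusive right and bottom edges.
    """
    rx, ry = x + mv[0], y + mv[1]
    x0, y0, x1, y1 = seg
    dx = max(x0 - rx, rx + w - x1, 0)
    dy = max(y0 - ry, ry + h - y1, 0)
    return max(dx, dy)


def mv_allowed(x, y, w, h, mv, seg, max_outside):
    """True if the reference block stays within max_outside samples of seg."""
    return reach_outside(x, y, w, h, mv, seg) <= max_outside
```

A limit of 0 reproduces the Boolean "no motion compensation across the boundary" case, while a larger limit tells a decoder how wide a margin of neighboring-segment samples it has to make available.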
现在描述根据实施例可能适用于PTI信息的前述任何配置的编码器的操作。The operation of an encoder according to an embodiment, which may be applicable to any of the aforementioned configurations of PTI information, will now be described.
参考图6,在一个实施例中,编码器可能根据流程图600进行操作。在编码视频序列的第一片之前,编码器可能确定(601)用于PTI的与序列相关的设置以及图像序列中的视频图像的瓦片布局。该确定可以考虑编码器的硬件架构、解码器的硬件架构、由硬件架构建议或指示的可能的瓦片布局、关于传输网络(如果有的话)的知识,例如MTU的尺寸等等。在一些情况下,可能由编码器在该确定中可以考虑的系统等级标准批准PTI值。例如,未来的数字TV标准可设想需要将控制跨瓦片边界的预测和环内滤波的特定瓦片布局和特定PTI设置用于特定(更高)分辨率,以使得成本有效的多处理器/多内核实现成为可能。可能仅需要在序列等级上确定所有PTI的子集。6 , in one embodiment, an encoder may operate according to a flowchart 600. Before encoding the first slice of a video sequence, the encoder may determine (601) sequence-dependent settings for PTI and tile layout of video images in the image sequence. This determination may take into account the hardware architecture of the encoder, the hardware architecture of the decoder, possible tile layouts suggested or indicated by the hardware architecture, knowledge about the transmission network (if any), such as the size of the MTU, and the like. In some cases, the PTI value may be sanctioned by a system-level standard that the encoder may take into account in this determination. For example, future digital TV standards may envision the need for specific tile layouts and specific PTI settings that control prediction and in-loop filtering across tile boundaries for specific (higher) resolutions to enable cost-effective multi-processor/multi-core implementations. It may be necessary to determine only a subset of all PTIs at the sequence level.
Several options for those settings have been described previously.
After the determination, the encoder may encode (602) the sequence-related PTIs into an appropriate high-level syntax structure, such as a sequence or picture parameter set, or a sequence, GOP, or picture header. The encoder may also have the option (through the syntax structure of the video coding standard) to leave PTIs undefined during this encoding.
The sequence-related PTIs may stay constant for at least one full video picture (unless overridden by slice-header-based PTIs, as described later), but in many cases may stay constant for at least one "sequence" (all pictures between two IDR pictures, together with the leading IDR picture, in the video stream), and perhaps for an entire video encoding session. For example, the sequence-related PTIs may be driven, at least in part, by hardware constraints, which are unlikely to change during a session. This latter case is assumed henceforth for convenience.
The encoder continues by encoding slices. To do so, the encoder may determine (603) slice-level PTIs, which may interact with the sequence-related PTIs as already described. The slice-level PTIs may be encoded (604) as part of the encoding of the slice header.
The slice may then be encoded (605) according to whichever coding standard is being applied (e.g., WD4 or H.264), while taking into account the breaking of prediction and/or in-loop filtering mechanisms across both slice and tile boundaries as indicated by the PTIs.
The encoding continues (606) with the next slice.
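The encoder steps 601 to 606 can be outlined in a short Python sketch. This is only an illustration under stated assumptions: PTIs are modelled as dictionaries of named settings, slice-level PTIs are assumed to override sequence-level ones, and `encode_sequence`, `choose_slice_pti`, and `encode_slice` are hypothetical names rather than part of any codec API.

```python
def encode_sequence(slices, sequence_pti, choose_slice_pti, encode_slice):
    """Outline of flowchart 600; returns a list of (syntax element, payload) pairs.

    sequence_pti     -- dict of sequence-related PTIs, assumed already determined (601)
    choose_slice_pti -- callback returning a dict of slice-level PTIs (603)
    encode_slice     -- callback that codes one slice under the effective PTIs (605)
    """
    # 602: write the sequence-related PTIs into a high-level syntax structure.
    bitstream = [("parameter_set", dict(sequence_pti))]
    for picture_slice in slices:  # 606: continue with the next slice
        # 603: determine slice-level PTIs, which may interact with the sequence PTIs.
        slice_pti = choose_slice_pti(picture_slice, sequence_pti)
        # 604: code the slice-level PTIs as part of the slice header.
        bitstream.append(("slice_header", dict(slice_pti)))
        # 605: code the slice; here slice-level values override sequence-level ones.
        effective_pti = {**sequence_pti, **slice_pti}
        bitstream.append(("slice_data", encode_slice(picture_slice, effective_pti)))
    return bitstream
```

The callbacks keep the sketch independent of any particular coding standard; a real encoder would replace them with its own mode decision and entropy coding.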
The operation of a decoder that, according to embodiments, may be suitable for use with any of the previously described configurations of PTI information is now described.
FIG. 7 is a flowchart 700 of a decoder that may be used in an embodiment of the invention. The decoder may receive (701) a NAL unit from the bitstream and determine its type. If the NAL unit type indicates a parameter set (702), the decoder may perform parameter set parsing and storage (703) in accordance with the video coding standard employed. (Other high-level syntax structures, such as sequence, GOP, or picture headers, may also be used for this purpose.)
If the NAL unit type indicates slice data (704) (other cases are not described), the decoder may parse the slice header (705) and then act on the information coded therein, such as PTI information. For example, the slice header may contain a parameter set reference, and the referenced parameter set may be "activated" (706) as described in the video coding standard; that is, the values of the referenced parameter set become valid. Because PTIs may be part of the parameter set, their values may also become valid through this activation (706).
The slice header may further include its own PTIs which, as already described, may differ from the PTIs included in the parameter set. Options for arbitrating between PTI information coded in the slice header and PTI information coded in the parameter set have already been described. For example, by correlating the slice-header-based PTIs (if present) with the parameter-set PTIs (if present), and taking into account any restrictions that may be present in other parts of the video coding standard (such as restrictions on, and/or default settings of, PTIs imposed by profiles and levels), the decoder may determine (707) the final PTI settings to be used in decoding the subject slice. It should be noted that the PTIs may differ for different edges of the slice, depending on the PTI settings of the parameter set and of the slice header, including in the particular case where a slice boundary is aligned with a tile boundary.
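One possible arbitration for step 707 is sketched below in Python. The precedence order shown (standard defaults, then parameter-set PTIs, then slice-header PTIs, then values forced by a profile or level) is an assumption made for illustration; the description leaves the exact rule to the options discussed earlier. `resolve_pti` and its parameters are hypothetical names.

```python
def resolve_pti(param_set_pti, slice_header_pti, profile_forced=None, defaults=None):
    """Return the final PTI settings for one slice (step 707), as a dict.

    Assumed precedence, lowest to highest:
    defaults < parameter-set PTIs < slice-header PTIs < profile/level restrictions.
    Any argument may be None or empty when that source carries no PTIs.
    """
    final = dict(defaults or {})        # default settings of the standard, if any
    final.update(param_set_pti or {})   # values made valid by activation (706)
    final.update(slice_header_pti or {})  # slice header overrides the parameter set
    final.update(profile_forced or {})  # restrictions imposed by profile/level win
    return final
```

Handling different PTIs for different edges of a slice would require one such resolution per edge; the sketch resolves a single set of values for brevity.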
Taking the final PTI settings into account, the decoder may decode (708) the slice, using prediction and/or in-loop filtering techniques across slice or tile boundaries as indicated by the information coded in the PTIs.
The process continues (709) with the next NAL unit.
FIG. 7 does not show the processing of NAL units other than slice or parameter set NAL units.
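The decoder loop of flowchart 700 can likewise be outlined in Python. The sketch assumes NAL units have already been parsed into dictionaries with hypothetical keys (`type`, `id`, `pti`, `header`, `data`) and that slice-header PTIs simply override the activated parameter-set PTIs; neither assumption comes from a real bitstream syntax.

```python
def decode_stream(nal_units, decode_slice):
    """Outline of flowchart 700 over pre-parsed NAL units (dicts with assumed keys)."""
    parameter_sets = {}  # stored parameter sets, keyed by their identifier
    decoded = []
    for nal in nal_units:  # 701 / 709: receive the next NAL unit
        if nal["type"] == "parameter_set":  # 702
            parameter_sets[nal["id"]] = dict(nal["pti"])  # 703: parse and store
        elif nal["type"] == "slice":  # 704
            header = nal["header"]  # 705: parse the slice header
            active = parameter_sets[header["parameter_set_id"]]  # 706: activation
            # 707: final PTIs; slice-header values override the parameter set here.
            final_pti = {**active, **header.get("pti", {})}
            decoded.append(decode_slice(nal["data"], final_pti))  # 708
        # Other NAL unit types are skipped, as they are in FIG. 7.
    return decoded
```

A slice that references a parameter set which was never received raises `KeyError` in this sketch; a practical decoder would need an explicit error-handling policy at that point.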
FIG. 8 is a block diagram illustrating an implementation based on a data processing system (e.g., a personal computer ("PC")) 800 in accordance with an embodiment of the invention. Up to this point, for convenience, the description has not related explicitly to possible physical implementations of the encoder and/or decoder in detail. Many different physical implementations based on combinations of software and/or components are possible. For example, in some implementations, the video encoder and/or decoder may be implemented using custom or gate-array integrated circuits, in many cases for reasons of cost efficiency and/or power efficiency.
Additionally, software-based implementations are possible using general-purpose processing architectures, an example of which is the data processing system 800. For example, such an implementation strategy may be realized on a personal computer or similar device (e.g., a set-top box, laptop, or mobile device), as described in the following. As shown in FIG. 8, according to the described embodiments, the encoder and/or decoder for a PC or similar device may be provided in the form of a computer-readable medium 801 (e.g., a CD-ROM, semiconductor ROM, or memory stick) containing instructions configured to enable a processor 802, alone or in combination with accelerator hardware (e.g., a graphics processor) 803, and in conjunction with a memory 804 coupled to the processor 802 and/or the accelerator hardware 803, to perform the encoding or decoding. The processor 802, memory 804, and accelerator hardware 803 may be coupled to a bus 805 that can be used to deliver the bitstream and the uncompressed video to and from the aforementioned devices. Depending on the implementation, peripherals for the input/output of the bitstream and the uncompressed video may be coupled to the bus 805. For example, a camera 806 may be attached to the bus 805 through a suitable interface, such as a frame grabber 807 or a USB link 808, for real-time input of uncompressed video. A similar interface can be used for uncompressed video storage devices such as VTRs. Uncompressed video may be output through a display device such as a computer monitor or a TV screen 809. A DVD-RW drive or equivalent (e.g., CD-ROM, CD-RW, Blu-ray disc, or memory stick) 810 may be used for input and/or output of the bitstream. Finally, for real-time transmission over a network 812, a network interface 811 can be used to convey the bitstream and/or uncompressed video, depending on the capacity of the access link to the network 812 and of the network 812 itself.
According to various embodiments, the above-described methods may be implemented by respective software modules. According to other embodiments, the methods may be implemented by respective hardware modules. According to still other embodiments, the methods may be implemented by a combination of software modules and hardware modules.
While the embodiments have, for convenience, been described primarily with reference to an example method, the apparatus discussed above with reference to the data processing system 800 may, according to the described embodiments, be programmed to enable the practice of the described method. Moreover, an article of manufacture for use with the data processing system 800, such as a pre-recorded storage device or other similar computer-readable medium or product including program instructions recorded thereon, may direct the data processing system 800 so as to facilitate the practice of the described method. It is understood that such apparatus and articles of manufacture, in addition to the described methods, all fall within the scope of the described embodiments.
In particular, according to one embodiment of the invention, the sequences of instructions which, when executed, cause the method described herein to be performed by the data processing system 800 can be contained in a data carrier product. This data carrier product can be loaded into and run by the data processing system 800. In addition, according to one embodiment of the invention, these sequences of instructions can be contained in a computer program or software product, which can likewise be loaded into and run by the data processing system 800. Moreover, according to one embodiment of the invention, these sequences of instructions can be contained in an integrated circuit product (e.g., a hardware module or modules) that may include a coprocessor or memory. This integrated circuit product can be installed in the data processing system 800.
The above embodiments may contribute to an improved system and method for selectively breaking prediction and/or in-loop filtering in video coding, and may provide one or more advantages. For example, including PTIs for individual prediction and/or in-loop filtering mechanisms that pertain to all slices and pictures referencing the parameter set may help to tailor the bitstream to the encoding and/or decoding environment, such as the hardware architecture of the encoder or decoder. Further, including PTIs for individual prediction and/or in-loop filtering mechanisms that pertain to a given slice may help to adapt the bitstream to the content, thereby improving coding efficiency.
The embodiments of the invention described herein are intended to be examples only. Accordingly, various alterations and/or modifications of detail may be made to these embodiments, all of which fall within the scope of the invention.
Claims (6)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US61/427,569 | 2010-12-28 | ||
| US13/336,475 | 2011-12-23 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1234565A1 | 2018-02-15 |
| HK1234565B | 2020-09-04 |