
WO2018229327A1 - Method, apparatus and computer program product for video coding and decoding - Google Patents

Method, apparatus and computer program product for video coding and decoding

Info

Publication number
WO2018229327A1
WO2018229327A1 (PCT/FI2018/050346)
Authority
WO
WIPO (PCT)
Prior art keywords
intra prediction
base
prediction
block
directions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/FI2018/050346
Other languages
English (en)
Inventor
Jani Lainema
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of WO2018229327A1
Current legal status: Ceased

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/11Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/147Data rate or code amount at the encoder output according to rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167Position within a video image, e.g. region of interest [ROI]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Definitions

  • the present solution generally relates to video encoding and decoding.
  • Background: This section is intended to provide a background or context to the invention that is recited in the claims.
  • the description herein may include concepts that could be pursued, but are not necessarily the ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
  • a video coding system may comprise an encoder that transforms an input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
  • the encoder may discard some information in the original video sequence in order to represent the video in a more compact form, for example, to enable the storage/transmission of the video information at a lower bitrate than otherwise might be needed.
  • a method comprising determining a set of base intra prediction directions in an image; selecting a base intra prediction direction from the set of base intra prediction directions for an intra prediction block; determining if mode rotation is applied for said intra prediction block; if mode rotation is applied for said intra prediction block, determining an effective prediction direction, where the effective prediction direction is defined to be between said selected base prediction direction and next base prediction direction in the set of base intra prediction directions either in clockwise or counterclockwise direction; if mode rotation is not applied for said intra prediction block, using said selected intra prediction direction as an effective prediction direction; and generating sample prediction for said intra prediction block using the effective prediction direction.
  • an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to determine a set of base intra prediction directions in an image; select a base intra prediction direction from the set of base intra prediction directions for an intra prediction block; determine if mode rotation is applied for said intra prediction block; if mode rotation is applied for said intra prediction block, determine an effective prediction direction, where the effective prediction direction is defined to be between said selected base prediction direction and next base prediction direction in the set of base intra prediction directions either in clockwise or counterclockwise direction; if mode rotation is not applied for said intra prediction block, use said selected intra prediction direction as an effective prediction direction; and generate sample prediction for said intra prediction block using the effective prediction direction.
  • a computer program product embodied on a non-transitory computer readable medium, comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to determine a set of base intra prediction directions in an image; select a base intra prediction direction from the set of base intra prediction directions for an intra prediction block; determine if mode rotation is applied for said intra prediction block; if mode rotation is applied for said intra prediction block, determine an effective prediction direction, where the effective prediction direction is defined to be between said selected base prediction direction and next base prediction direction in the set of base intra prediction directions either in clockwise or counterclockwise direction; if mode rotation is not applied for said intra prediction block, use said selected intra prediction direction as an effective prediction direction; and generate sample prediction for said intra prediction block using the effective prediction direction.
  • the set of base intra prediction directions is determined by defining a set of static directions with displacement parameters.
  • a displacement parameter describes how much each sample line of the image is offset with respect to the previous sample line of the image.
  • determining if mode rotation is applied for said intra prediction block is based on the dimensions of said intra prediction block relative to the direction of the prediction.
  • determining if mode rotation is applied for said intra prediction block is based on base prediction direction of said intra prediction block.
  • determining if mode rotation is applied for said intra prediction block is based on signalling in a video bitstream.
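As an illustration of the embodiments summarized above, here is a minimal sketch in Python (the function name, the integer averaging and the rounding convention are illustrative assumptions, not definitions taken from the claims):

```python
def effective_displacement(base_mode, displacements, rotate, clockwise=True):
    # displacements maps directional mode indexes to displacement
    # parameters (e.g. in 1/32 or 1/64 fractional-sample units).
    if not rotate:
        # No mode rotation: use the selected base direction as-is.
        return displacements[base_mode]
    # Mode rotation: the effective direction lies between the selected
    # base direction and the next one, clockwise or counterclockwise.
    neighbor = base_mode + 1 if clockwise else base_mode - 1
    return (displacements[base_mode] + displacements[neighbor]) // 2
```

The sample prediction for the block is then generated with the resulting effective displacement, as described for the directional prediction process later in this text.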
  • Fig. 1 shows an apparatus according to an embodiment in a simplified block chart
  • Fig. 2 shows a layout of an apparatus according to an embodiment
  • Fig. 3 shows an example of a system
  • Fig. 4 shows an encoder according to an embodiment
  • Fig. 5 shows a decoder according to an embodiment
  • Fig. 6a illustrates a set of traditional directional prediction directions
  • Fig. 6b illustrates a set of rotated directions
  • Fig. 7 illustrates an example of block location based mode rotation
  • Fig. 8 is a flowchart of a method according to an embodiment.
  • the present embodiments aim to improve intra prediction methods.
  • Figure 1 shows a block diagram of a video coding system according to an example embodiment as a schematic block diagram of an electronic device 50, which may incorporate a codec.
  • the electronic device may comprise an encoder or a decoder.
  • Figure 2 shows a layout of an apparatus according to an embodiment.
  • the electronic device 50 may for example be a mobile terminal or a user equipment of a wireless communication system or a camera device.
  • the electronic device 50 may be also comprised at a local or a remote server or a graphics processing unit of a computer.
  • the device may be also comprised as part of a head-mounted display device.
  • the apparatus 50 may comprise a housing 30 for incorporating and protecting the device.
  • the apparatus 50 may further comprise a display 32 in the form of a liquid crystal display.
  • the display may be any display technology suitable to display an image or video.
  • the apparatus 50 may further comprise a keypad 34.
  • any suitable data or user interface mechanism may be employed.
  • the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.
  • the apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input.
  • the apparatus 50 may further comprise an audio output device which in embodiments of the invention may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection.
  • the apparatus 50 may also comprise a battery 40 (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator).
  • the apparatus may further comprise a camera 42 capable of recording or capturing images and/or video.
  • the camera 42 is a multi-lens camera system having at least two camera sensors.
  • the camera is capable of recording or detecting individual frames which are then passed to the codec 54 or the controller for processing.
  • the apparatus may receive the video and/or image data for processing from another device prior to transmission and/or storage.
  • the apparatus 50 may further comprise an infrared port for short range line of sight communication to other devices.
  • the apparatus may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB (Universal Serial Bus)/firewire wired connection.
  • the apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50.
  • the apparatus or the controller 56 may comprise one or more processors or processor circuitry and be connected to memory 58 which may store data in the form of image, video and/or audio data, and/or may also store instructions for implementation on the controller 56 or to be executed by the processors or the processor circuitry.
  • the controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of image, video and/or audio data or assisting in coding and decoding carried out by the controller.
  • the apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a UICC (Universal Integrated Circuit Card) and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
  • the apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network.
  • the apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and for receiving radio frequency signals from other apparatus(es).
  • the apparatus may comprise one or more wired interfaces configured to transmit and/or receive data over a wired connection, for example an electrical cable or an optical fiber connection.
  • the wired interface may be configured to operate according to one or more digital display interface standards, such as for example High-Definition Multimedia Interface (HDMI), Mobile High-definition Link (MHL), or Digital Visual Interface (DVI).
  • the system 10 comprises multiple communication devices which can communicate through one or more networks.
  • the system 10 may comprise any combination of wired or wireless networks including, but not limited to a wireless cellular telephone network (such as a GSM, UMTS, CDMA network etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
  • the system 10 may include both wired and wireless communication devices and/or apparatus 50 suitable for implementing embodiments of the invention.
  • the system shown in Figure 3 shows a mobile telephone network 11 and a representation of the internet 28.
  • Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
  • the example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22.
  • the apparatus 50 may be stationary or mobile when carried by an individual who is moving.
  • the apparatus 50 may also be located in a mode of transport including, but not limited to, a car, a truck, a taxi, a bus, a train, a boat, an airplane, a bicycle, a motorcycle or any similar suitable mode of transport.
  • the embodiments may also be implemented in a set-top box, i.e. a digital TV receiver which may or may not have a display or wireless capabilities; in tablets or (laptop) personal computers (PC), which have hardware or software or a combination of encoder/decoder implementations; in various operating systems; and in chipsets, processors, DSPs and/or embedded systems offering hardware/software based coding.
  • Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24.
  • the base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the internet 28.
  • the system may include additional communication devices and communication devices of various types.
  • the communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time divisional multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11 and any similar wireless communication technology.
  • a communications device involved in implementing various embodiments of the present invention may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable wired or wireless connection.
  • a video codec comprises an encoder that transforms an input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
  • the encoder may discard some information in the original video sequence in order to represent the video in a more compact form (that is, at lower bitrate).
  • An image codec or a picture codec is similar to a video codec, but it encodes each input picture independently from other input pictures and decodes each coded picture independently from other coded pictures. It needs to be understood that whenever a video codec, video encoding or encoder, or video decoder or decoding is referred below, the text similarly applies to an image codec, image encoding or encoder, or image decoder or decoding, respectively.
  • a picture given as an input to an encoder may also be referred to as a source picture, and a picture decoded by a decoder may be referred to as a decoded picture.
  • the source and decoded pictures are each comprised of one or more sample arrays, such as one of the following sets of sample arrays: luma (Y) only (monochrome); luma and two chroma (e.g. YCbCr or YCgCo); or Green, Blue and Red (GBR, also known as RGB).
  • The term pixel may refer to the set of spatially collocated samples of the sample arrays of the color components. Sometimes, depending on the context, the term pixel may refer to a sample of one sample array only.
  • these arrays may be referred to as luma (or L or Y) and chroma, where the two chroma arrays may be referred to as Cb and Cr; regardless of the actual color representation method in use.
  • the actual color representation method in use can be indicated e.g. in a coded video bitstream.
  • a component may be defined as an array or a single sample from one of the three sample arrays (luma and two chroma), or the array or a single sample of the array that composes a picture in monochrome format.
  • a picture may either be a frame or a field, while in some coding systems a picture may be constrained to be a frame.
  • a frame comprises a matrix of luma samples and possibly the corresponding chroma samples.
  • a field is a set of alternate sample rows of a frame and may be used as encoder input, when the source signal is interlaced.
  • Chroma sample arrays may be absent (and hence monochrome sampling may be in use) or chroma sample arrays may be subsampled when compared to luma sample arrays.
  • Chroma formats may be summarized as follows:
  • In 4:2:0 sampling, each of the two chroma arrays has half the height and half the width of the luma array.
  • In 4:2:2 sampling, each of the two chroma arrays has the same height and half the width of the luma array.
  • In 4:4:4 sampling, each of the two chroma arrays has the same height and width as the luma array.
  • Spatial resolution of a picture may be defined as the number of pixels or samples representing the picture in horizontal and vertical direction.
  • spatial resolution of a first picture may be defined to be the same as that of a second picture, when their sampling grids are the same, i.e. the same sampling interval is used both in the first picture and in the second picture.
  • the latter definition may be applied for example when the first picture and the second picture cover different parts of a picture.
  • a region of a picture may be defined to have a first resolution when the first region comprises a first number of pixels or samples.
  • the same region may be defined to have a second resolution when the region comprises a second number of pixels.
  • resolution can be defined as the number of pixels with respect to the area covered by the pixels, or, by pixels per degree.
  • luma and chroma sample arrays are coded in an interleaved manner, e.g. interleaved block-wise.
  • each one of them is separately processed (by the encoder and/or the decoder) as a picture with monochrome sampling.
  • Video encoders may encode the video information in two phases.
  • In the first phase, pixel values in a certain picture area (or "block") are predicted.
  • the prediction may be performed for example by motion compensation means (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded), which may be referred to as inter prediction or inter-picture prediction.
  • the prediction may be performed for example by spatial means (using the pixel values around the block to be coded in a specified manner), which may be referred to as intra prediction or spatial prediction.
  • prediction may be absent or the prediction signal may be pre-defined (e.g. a zero-valued block).
  • In the second phase, the prediction error, i.e. the difference between the predicted block of pixels and the original block of pixels, is coded. This may be done for example by transforming the difference in pixel values using a specified transform (e.g. Discrete Cosine Transform (DCT) or a variant of it), quantizing the coefficients and entropy coding the quantized coefficients.
  • By varying the fidelity of the quantization process, the encoder can control the balance between the accuracy of the pixel representation (picture quality) and the size of the resulting coded video representation (file size or transmission bitrate).
  • Alternatively, pixel values can be coded without transforming them, for example using differential pulse code modulation and entropy coding, such as Huffman coding or arithmetic coding.
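As a rough, self-contained sketch of the transform-quantize path described above (an orthonormal DCT-II and a uniform quantizer; the square block assumption, the qstep value and the names are illustrative, not taken from any particular standard):

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis: row k holds the k-th cosine basis vector.
    j = np.arange(n)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j[None, :] + 1) * j[:, None] / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def code_prediction_error(original, predicted, qstep=16.0):
    residual = original.astype(np.float64) - predicted   # prediction error
    t = dct_matrix(residual.shape[0])
    coeffs = t @ residual @ t.T                          # 2-D forward DCT
    levels = np.round(coeffs / qstep)                    # uniform quantization
    reconstructed = t.T @ (levels * qstep) @ t           # dequantize + inverse DCT
    return levels, reconstructed
```

The quantized levels would then be entropy coded; increasing qstep reduces the bitrate at the cost of picture quality, which is the balance mentioned above.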
  • Figure 4 illustrates an image to be encoded (I_n); a predicted representation of an image block (P'_n); a prediction error signal (D_n); a reconstructed prediction error signal (D'_n); a preliminary reconstructed image (I'_n); a final reconstructed image (R'_n); a transform (T) and inverse transform (T⁻¹); a quantization (Q) and inverse quantization (Q⁻¹); entropy encoding (E); a reference frame memory (RFM); inter prediction (Pinter); intra prediction (Pintra); mode selection (MS) and filtering (F).
  • video pictures are divided into coding units (CU) covering the area of the picture.
  • a CU consists of one or more prediction units (PU) defining the prediction process for the samples within the CU and one or more transform units (TU) defining the prediction error coding process for the samples in the said CU.
  • a CU consists of a square block of samples with a size selectable from a predefined set of possible CU sizes.
  • a CU with the maximum allowed size is typically named as LCU (largest coding unit) or CTU (coding tree unit) and the video picture is divided into non-overlapping CTUs.
  • a CTU can be further split into a combination of smaller CUs, e.g. by recursively splitting the CTU and the resulting CUs.
  • Each resulting CU typically has at least one PU and at least one TU associated with it.
  • Each PU and TU can be further split into smaller PUs and TUs in order to increase granularity of the prediction and prediction error coding processes, respectively.
  • Each PU has prediction information associated with it defining what kind of a prediction is to be applied for the pixels within that PU (e.g. motion vector information for inter predicted PUs and intra prediction directionality information for intra predicted PUs).
  • each TU is associated with information describing the prediction error decoding process for the samples within the said TU (including e.g. DCT coefficient information).
  • the decoder reconstructs the output video by applying prediction means similar to the encoder to form a predicted representation of the pixel blocks (using the motion or spatial information created by the encoder and stored in the compressed representation) and prediction error decoding (the inverse operation of the prediction error coding, recovering the quantized prediction error signal in the spatial pixel domain). After applying the prediction and prediction error decoding means, the decoder sums up the prediction and prediction error signals (pixel values) to form the output video frame.
  • the decoder (and encoder) can also apply additional filtering means to improve the quality of the output video before passing it for display and/or storing it as prediction reference for the forthcoming frames in the video sequence.
  • Figure 5 illustrates a predicted representation of an image block (P'_n); a reconstructed prediction error signal (D'_n); a preliminary reconstructed image (I'_n); a final reconstructed image (R'_n); an inverse transform (T⁻¹); an inverse quantization (Q⁻¹); an entropy decoding (E⁻¹); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).
  • a color palette based coding can be used.
  • Palette based coding refers to a family of approaches for which a palette, i.e. a set of colors and associated indexes, is defined and the value for each sample within a coding unit is expressed by indicating its index in the palette.
  • Palette based coding can achieve good coding efficiency in coding units with a relatively small number of colors (such as image areas which are representing computer screen content, like text or simple graphics).
  • different kinds of palette index prediction approaches can be utilized, or the palette indexes can be run-length coded to be able to represent larger homogenous image areas efficiently.
  • escape coding can be utilized. Escape coded samples are transmitted without referring to any of the palette indexes. Instead their values are indicated individually for each escape coded sample.
  • the motion information may be indicated with motion vectors associated with each motion compensated image block.
  • Each of these motion vectors represents the displacement of the image block in the picture to be coded (in the encoder side) or decoded (in the decoder side) and the prediction source block in one of the previously coded or decoded pictures.
  • the predicted motion vectors may be created in a predefined way, for example calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
  • Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and to signal the chosen candidate as the motion vector predictor.
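A small sketch of the median predictor mentioned above (the choice of the three neighbours, e.g. left, above and above-right, is illustrative):

```python
def median_mv_predictor(mv_a, mv_b, mv_c):
    # Component-wise median of the motion vectors of three adjacent blocks.
    def med(a, b, c):
        return sorted((a, b, c))[1]
    return (med(mv_a[0], mv_b[0], mv_c[0]),
            med(mv_a[1], mv_b[1], mv_c[1]))
```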
  • the reference index of previously coded/decoded picture can be predicted.
  • the reference index can be predicted from adjacent blocks and/or co-located blocks in the temporal reference picture.
  • high efficiency video codecs may employ an additional motion information coding/decoding mechanism, often called merging/merge mode, where all the motion field information, which includes motion vector and corresponding reference picture index for each available reference picture list, is predicted and used without any modification/correction.
  • predicting the motion field information is carried out using the motion field information of adjacent blocks and/or co-located blocks in temporal reference pictures and the used motion field information is signaled among a list of motion field candidate list filled with motion field information of available adjacent/co-located blocks.
  • Video codecs may support motion compensated prediction from one source image (uni-prediction) and two sources (bi-prediction).
  • In uni-prediction a single motion vector is applied, whereas in the case of bi-prediction two motion vectors are signaled and the motion compensated predictions from the two sources are averaged to create the final sample prediction.
  • In weighted prediction, the relative weights of the two predictions can be adjusted, or a signaled offset can be added to the prediction signal.
  • In intra block copy methods, a displacement vector indicates from where in the same picture a block of samples can be copied to form a prediction of the block to be coded or decoded.
  • This kind of intra block copying method can improve the coding efficiency substantially in the presence of repeating structures within the frame, such as text or other graphics.
  • Video encoders may utilize Lagrangian cost functions to find optimal coding modes, e.g. the desired macroblock mode and associated motion vectors. This kind of cost function uses a weighting factor λ to tie together the (exact or estimated) image distortion due to lossy coding methods and the (exact or estimated) amount of information that is required to represent the pixel values in an image area: C = D + λR, where C is the Lagrangian cost to be minimized, D is the image distortion (e.g. mean squared error) with the mode and motion vectors considered, and R is the number of bits needed to represent the required data to reconstruct the image block in the decoder.
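A minimal sketch of such a rate-distortion optimized mode decision (the names and the candidate representation are illustrative):

```python
def rd_cost(distortion, rate_bits, lmbda):
    # Lagrangian cost C = D + lambda * R
    return distortion + lmbda * rate_bits

def select_mode(candidates, lmbda):
    # candidates: iterable of (mode, distortion, rate_bits) tuples,
    # e.g. produced by trial-encoding each coding mode.
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lmbda))[0]
```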
  • Scalable video coding may refer to coding structure where one bitstream can contain multiple representations of the content, for example, at different bitrates, resolutions or frame rates. In these cases the receiver can extract the desired representation depending on its characteristics (e.g. resolution that matches best the display device). Alternatively, a server or a network element can extract the portions of the bitstream to be transmitted to the receiver depending on e.g. the network characteristics or processing capabilities of the receiver.
  • a meaningful decoded representation can be produced by decoding only certain parts of a scalable bit stream.
  • a scalable bitstream typically consists of a "base layer" providing the lowest quality video available and one or more enhancement layers that enhance the video quality when received and decoded together with the lower layers.
  • the coded representation of that layer typically depends on the lower layers.
  • the motion and mode information of the enhancement layer can be predicted from lower layers.
  • the pixel data of the lower layers can be used to create prediction for the enhancement layer.
  • a video signal can be encoded into a base layer and one or more enhancement layers.
  • An enhancement layer may enhance, for example, the temporal resolution (i.e., the frame rate), the spatial resolution, or simply the quality of the video content represented by another layer or part thereof.
  • Each layer together with all its dependent layers is one representation of the video signal, for example, at a certain spatial resolution, temporal resolution and quality level.
  • A scalable video codec for quality scalability (also known as Signal-to-Noise or SNR scalability) and/or spatial scalability may be implemented as follows.
  • For a base layer, a conventional non-scalable video encoder and decoder are used.
  • the reconstructed/decoded pictures of the base layer are included in the reference picture buffer for an enhancement layer.
  • the base layer decoded pictures may be inserted into a reference picture list(s) for coding/decoding of an enhancement layer picture similarly to the decoded reference pictures of the enhancement layer.
  • the encoder may choose a base-layer reference picture as inter prediction reference and indicate its use typically with a reference picture index in the coded bitstream.
  • the decoder decodes from the bitstream, for example from a reference picture index, that a base-layer picture is used as inter prediction reference for the enhancement layer.
  • When a decoded base-layer picture is used as a prediction reference for an enhancement layer, it is referred to as an inter-layer reference picture.
  • In addition to quality scalability, the following scalability modes exist:
  • Spatial scalability: base layer pictures are coded at a lower resolution than enhancement layer pictures.
  • Bit-depth scalability: base layer pictures are coded at a lower bit-depth (e.g. 8 bits) than enhancement layer pictures (e.g. 10 or 12 bits).
  • Chroma format scalability: enhancement layer pictures provide higher fidelity in chroma (e.g. coded in 4:4:4 chroma format) than base layer pictures (e.g. 4:2:0 format).
  • Color gamut scalability: enhancement layer pictures have a richer/broader color representation range than that of the base layer pictures; for example, the enhancement layer may have the UHDTV (ITU-R BT.2020) color gamut and the base layer may have the ITU-R BT.709 color gamut.
  • View scalability: the base layer represents a first view, while an enhancement layer represents a second view.
  • Depth scalability: a layer or some layers of a bitstream may represent texture view(s), while another layer or layers may represent depth view(s).
  • ROI scalability may be defined as a type of scalability wherein an enhancement layer enhances only part of a reference- layer picture e.g. spatially, quality-wise, in bit-depth, and/or along other scalability dimensions.
  • As ROI scalability may be used together with other types of scalabilities, it may be considered to form a different categorization of scalability types.
  • an enhancement layer can be transmitted to enhance the quality and/or a resolution of a region in the base layer.
  • a decoder receiving both enhancement and base layer bitstream might decode both layers and overlay the decoded pictures on top of each other and display the final picture.
  • Interlaced-to-progressive scalability (also known as field-to-frame scalability): coded interlaced source content material of the base layer is enhanced with an enhancement layer to represent progressive source content.
  • Hybrid codec scalability (also known as coding standard scalability): in hybrid codec scalability, the bitstream syntax, semantics and decoding process of the base layer and the enhancement layer are specified in different video coding standards.
  • base layer pictures are coded according to a different coding standard or format than enhancement layer pictures. It should be understood that many of the scalability types may be combined and applied together.
  • In many cases, base layer information can be used to code the enhancement layer to minimize the additional bitrate overhead.
  • Scalability can be enabled in two basic ways: either by introducing new coding modes for performing prediction of pixel values or syntax from lower layers of the scalable representation, or by placing the lower layer pictures into the reference picture buffer (decoded picture buffer, DPB) of the higher layer.
  • the first approach is more flexible and thus can provide better coding efficiency in most cases.
  • The second approach, reference frame based scalability, can be implemented very efficiently with minimal changes to single layer codecs while still achieving the majority of the available coding efficiency gains.
  • a reference frame based scalability codec can be implemented by utilizing the same hardware or software implementation for all the layers, just taking care of the DPB management by external means.
  • images can be split into independently codable and decodable image segments (slices or tiles).
  • Slices typically refer to image segments constructed of a certain number of basic coding units that are processed in default coding or decoding order, while tiles typically refer to image segments that have been defined as rectangular image regions that are processed at least to some extent as individual frames.
  • Known intra prediction methods rely on a fixed set of available prediction directions. This limits the accuracy of the sample prediction process, or means that extra bits need to be spent if one decides to increase the number and accuracy of intra prediction directions.
  • H.264/AVC and H.265/HEVC video coding standards use traditional spatial intra prediction with square prediction blocks and intra prediction directions ranging from -45 to +45 degrees from above the block and -45 to +45 degrees from left of the block.
  • the Joint Exploration Test model 6 (JEM6) supports rectangular (square and non-square) prediction blocks with intra prediction directions with the same range of angles as H.265/HEVC, but introduces more prediction directions in between the H.265/HEVC directions.
  • a method comprises selecting available intra prediction directions adaptively from a set of base prediction directions, or from a set of prediction directions that is generated by rotating the base prediction directions by a fraction of the difference between two base prediction directions.
  • Figure 6a illustrates traditional directional prediction directions from the top-right direction (dashed arrows), and Figure 6b a set of refinement directions generated by rotating the original directions counterclockwise (solid arrows).
  • the decision between the two prediction direction sets can be done based on characteristics or location of the prediction blocks, bitstream signalling or a combination of those.
  • a codec operating according to the present embodiments can increase the internal accuracy of directional intra prediction with minimal or no impact on bitrate requirements.
  • Determination of the set of base intra prediction directions can be done for example by defining a set of static directions with displacement parameters describing how much each line of samples is offset with respect to a previous line of samples.
  • the displacement can be described for example in the units of 1/32 fractions of a sample.
  • An example set using the directionalities defined in the H.265 video coding standard could then be as shown in Table 1, which maps mode indexes to displacement parameters defining the intra prediction directions in H.265, given in 1/32 fractional sample accuracy; a code sketch of this mapping follows below.
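A sketch of the Table 1 mapping, using the intraPredAngle values specified for the H.265/HEVC angular modes (the values are reproduced from the HEVC specification, Table 8-5, and should be checked against it):

```python
# intraPredAngle values for the H.265/HEVC angular modes 2..34, in 1/32
# fractional-sample units; mode 10 is directly horizontal and mode 26
# directly vertical.
H265_DISPLACEMENTS = dict(zip(range(2, 35), [
    32, 26, 21, 17, 13, 9, 5, 2, 0, -2, -5, -9, -13, -17, -21, -26, -32,
    -26, -21, -17, -13, -9, -5, -2, 0, 2, 5, 9, 13, 17, 21, 26, 32,
]))
```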
  • In another example, the displacement parameters can be defined in 1/64 sample fractions, and there can be 65 directional intra prediction modes, for example with mode indexes 2 to 66 as defined in the JEM 6.0 software.
  • This other example is illustrated in Table 2, showing the mapping of mode indexes to displacement parameters defining the intra prediction directions in JEM, given in 1/64 fractional sample accuracy.
  • The first two modes (Planar mode with index 0 and DC prediction mode with index 1) are non-directional modes and can be excluded from the mode rotation.
  • the selected base intra prediction direction can be determined in different ways.
  • a rate-distortion optimized mode decision can be performed to select the best mode for the prediction block.
  • Information relating to the mode selection can then be encoded in the bitstream, and the decoder can determine the selected base prediction direction by decoding syntax elements associated with the selection. Determining if mode rotation is applied can be done in different ways. It can, for example, depend on the dimensions of the block or the location of the block, and there can be additional signalling for blocks that qualify for mode rotation to indicate if a rotated mode or a base mode is used for the block. The dimensions of the block relative to the direction of the prediction can be used to determine if a mode rotation is applicable for a block.
  • In the case of vertical prediction directions, the height of the block can be compared to a threshold value, and in the case of horizontal prediction directions, the width of the block can be compared to a threshold value; based on the outcome of the comparison, a block can be qualified as a candidate for mode rotation.
  • the minimum or the maximum of the width and height of the block can be compared to a threshold when determining if mode rotation is applicable to a block.
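A sketch of such a dimension-based qualification test (the threshold value and the use of a greater-or-equal comparison are assumptions for illustration):

```python
def qualifies_for_rotation(width, height, vertical_mode, threshold=8):
    # Compare the block dimension along the prediction direction
    # against a threshold to decide rotation candidacy.
    return (height if vertical_mode else width) >= threshold
```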
  • Location of the block can be used to determine if a mode rotation is applicable to the block.
  • the parity of the block coordinates can be used to enable mode rotation.
  • Figure 7 illustrates an example of block location based mode rotation.
  • Base directions (B) are used for blocks whose top-left corner has an even sum of horizontal and vertical grid coordinates, and rotated directions (R) are used for blocks at odd locations.
  • Figure 7 illustrates the selection process in a case where blocks with an even sum of block coordinates use base intra prediction modes and blocks with an odd sum of block coordinates use a rotated set of intra prediction modes.
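The Figure 7 selection rule can be sketched as:

```python
def uses_rotated_directions(x0, y0):
    # Blocks whose top-left grid coordinates sum to an even number use
    # the base directions (B); blocks with an odd sum use the rotated
    # set (R), as in the Figure 7 example.
    return (x0 + y0) % 2 == 1
```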
  • In the case of 360 degree or wide angle video, the selected projection format can be used in determining blocks for which the mode rotation is applicable, and the selection can be based e.g. on the location of the block within the picture.
  • the effective prediction direction and associated displacement parameter can be calculated using the displacement parameter of the indicated base prediction direction.
  • the displacement parameter of the rotated angle can be calculated as an average of the displacement parameters of the indicated base intra prediction mode and an intra prediction mode with an index one larger or one smaller than that of the base intra prediction mode. Whether to use the one larger or the one smaller index for the reference mode can depend e.g. on the mode index or on the absolute value of the displacement parameter of the base intra prediction mode.
  • the rotation can also be done in a predefined direction, e.g. always clockwise, or its direction can be selected adaptively, e.g. based on the base mode as described above.
  • The calculation of the effective prediction directions for rotated modes can be done either during the intra prediction process, or the directions can be pre-calculated and stored in a memory to be used at the time of the intra prediction process. If pre-calculated displacement parameters are used, they could be defined, for example, as shown in Table 3, illustrating example displacements for base prediction directions and associated rotated prediction directions in 1/64 fractional sample accuracy; a sketch of such pre-calculation follows.
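A sketch of such a pre-calculation, building a Table-3-like mapping by averaging neighbouring base displacements (the clockwise choice and the integer rounding are assumptions):

```python
def build_rotated_table(displacements, clockwise=True):
    rotated = {}
    for mode, disp in displacements.items():
        neighbor = mode + 1 if clockwise else mode - 1
        if neighbor in displacements:
            # Rotated displacement: average of the base mode's displacement
            # and that of the neighbouring base mode.
            rotated[mode] = (disp + displacements[neighbor]) // 2
    return rotated
```

For example, build_rotated_table(H265_DISPLACEMENTS), with the mapping sketched earlier, would pair each base direction with a direction roughly halfway towards its clockwise neighbour.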
  • the block can be predicted using traditional spatial intra prediction means. That is, samples of the prediction block can be calculated by extrapolating reference samples on the borders of the block using the selected effective displacement parameter.
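A simplified sketch of that extrapolation for a vertical-ish direction with a non-negative displacement (reference sample availability, negative angles and any smoothing filters are omitted; ref_top must hold enough samples above and to the right of the block):

```python
import numpy as np

def predict_block_vertical(ref_top, size, displacement, frac_bits=5):
    # displacement is the effective displacement parameter in
    # 1/(2**frac_bits) fractional-sample units (1/32 when frac_bits=5).
    pred = np.empty((size, size), dtype=np.int32)
    half = 1 << (frac_bits - 1)
    mask = (1 << frac_bits) - 1
    for y in range(size):
        offset = (y + 1) * displacement
        idx, frac = offset >> frac_bits, offset & mask
        for x in range(size):
            a, b = int(ref_top[x + idx]), int(ref_top[x + idx + 1])
            # Linear interpolation between the two nearest reference samples.
            pred[y, x] = (((1 << frac_bits) - frac) * a + frac * b + half) >> frac_bits
    return pred
```

With displacement taken from the base table or from a pre-calculated rotated table, this realizes the "extrapolate reference samples with the effective displacement parameter" step described above.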
  • a method for encoding according to an embodiment is illustrated in Figure 8 as a flowchart.
  • The method according to the example of Figure 8 comprises determining a set of base intra prediction directions 810; determining the selected base intra prediction direction from the set of base intra prediction directions for an intra prediction block 820; determining if mode rotation is applied for said intra prediction block 830; if mode rotation is applied for said intra prediction block, determining an effective prediction direction 840, where the effective prediction direction is defined to be between said selected base prediction direction and the next base prediction direction either in the clockwise or counterclockwise direction; if mode rotation is not applied for said intra prediction block, using said selected intra prediction direction as the effective prediction direction 850; and generating sample prediction for said intra prediction block using said effective prediction direction 860.
  • the method of Figure 8 can be complemented with any one or more features that are derivable from the description above.
  • the method of Figure 8 can be complemented with one or more of the following embodiments.
  • the base prediction direction can be used to determine if a mode rotation is applicable for a block.
  • mode rotation can be allowed for a block if the base prediction direction is close enough to either the directly horizontal or the directly vertical direction mode.
  • the video layer or color channel may determine if a mode rotation is applicable for a block.
  • mode rotation can be switched off for the chrominance channels of YCbCr/YUV video, as the content of those channels is typically of a smoother nature than that of the luminance channel and thus typically does not require as accurate sample prediction. It can also be signalled, e.g. in a video parameter set or a picture parameter set, for which channels mode rotation is applicable.
  • a mode rotation may be applied based on analysis of intra prediction reference samples. For example, if such analysis indicates high frequency content in the reference samples, a mode rotation indicator can be signalled in the bitstream in order to take advantage of the fine accuracy intra prediction in the presence of strong edges or texture in the content.
  • a mode rotation may be applied based on analysis of residual signal or residual transform coefficients. For example, if such analysis indicates high frequency content in the residual signal, a mode rotation indicator can be signalled in the bitstream in order to take advantage of the fine accuracy intra prediction in the presence of strong edges or texture in the content.
  • a mode rotation indicator may be signalled in the bitstream for the blocks for which the mode rotation is applicable.
  • the mode rotation indicator can be hidden in the bitstream, for example in the parity of the transform coefficients of the residual signal.
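One way such hiding could look, in the spirit of sign data hiding (this parity rule is an assumption for illustration, not the patent's normative scheme):

```python
def hidden_rotation_indicator(quantized_coeffs):
    # Infer the rotation indicator from the parity of the sum of quantized
    # transform coefficient magnitudes, so no explicit bit is transmitted;
    # the encoder adjusts one coefficient when the parity must be changed.
    return sum(abs(int(c)) for c in quantized_coeffs) % 2 == 1
```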
  • a mode rotation may be applied based on the projection format of 360 degree or wide angle video.
  • mode rotation can be enabled for certain areas of the picture where geometric distortions due to the projection benefit from increased number of effective prediction directions.
  • An apparatus comprises means for implementing the method.
  • the apparatus comprises means for determining a set of base intra prediction directions; means for determining the selected base intra prediction direction from the set of base intra prediction directions for an intra prediction block; means for determining if mode rotation is applied for said intra prediction block; means for determining if mode rotation is applied for said intra prediction block, and then for determining the effective prediction direction, where the effective prediction direction is defined to be between said selected base prediction direction and next base prediction direction either in clockwise or counterclockwise direction; means for determining if mode rotation is not applied for said intra prediction block, and then for using said selected intra prediction direction as the effective prediction direction; means for generating sample prediction for said intra prediction block using said effective prediction direction.
  • the means of the apparatus can be implemented as at least one processor and a memory including computer program code.
  • a device may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment.
  • a network device like a server may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of an embodiment.
  • the different functions discussed herein may be performed in a different order and/or concurrently with each other.
  • one or more of the above-described functions and embodiments may be optional or may be combined.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention concerns a method and technical equipment, the method comprising: determining a set of base intra prediction directions in an image; selecting a base intra prediction direction from the set of base intra prediction directions for an intra prediction block; determining whether mode rotation is applied for said intra prediction block; if mode rotation is applied for said intra prediction block, determining an effective prediction direction, the effective prediction direction being defined to be between said selected base prediction direction and the next base prediction direction in the set of base intra prediction directions, either in the clockwise or the counterclockwise direction; if mode rotation is not applied for said intra prediction block, using said selected intra prediction direction as the effective prediction direction; and generating a sample prediction for said intra prediction block using the effective prediction direction.
PCT/FI2018/050346 2017-06-16 2018-05-09 Method, apparatus and computer program product for video coding and decoding Ceased WO2018229327A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20175564 2017-06-16
FI20175564 2017-06-16

Publications (1)

Publication Number Publication Date
WO2018229327A1 (fr) 2018-12-20

Family

Family ID: 64660040

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2018/050346 Ceased WO2018229327A1 (fr) 2017-06-16 2018-05-09 Procédé, appareil et produit-programme informatique destinés au codage et au décodage vidéo

Country Status (1)

Country Link
WO (1) WO2018229327A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011188130A * (ja) 2010-03-05 2011-09-22 Nippon Hoso Kyokai <NHK> Intra prediction device, encoder, decoder and program
US20110317757A1 (en) * 2010-06-25 2011-12-29 Qualcomm Incorporated Intra prediction mode signaling for finer spatial prediction directions
US20140064368A1 (en) * 2011-06-24 2014-03-06 Mitsubishi Electric Corporation Image encoding device, image decoding device, image encoding method, image decoding method, and image prediction device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113678446A (zh) * 2019-03-12 2021-11-19 鸿颖创新有限公司 Device and method for coding video data
CN113678446B (zh) 2019-03-12 2024-01-30 鸿颖创新有限公司 Electronic device and method for decoding a bitstream
US11943437B2 (en) 2019-03-12 2024-03-26 FG Innovation Company Limited Device and method for coding video data
CN113192148A (zh) * 2021-04-12 2021-07-30 中山大学 Palette-based attribute prediction method, apparatus, device and medium

Similar Documents

Publication Publication Date Title
EP3120548B1 (fr) Video decoding using a long-term palette
US10368097B2 (en) Apparatus, a method and a computer program product for coding and decoding chroma components of texture pictures for sample prediction of depth pictures
US11924457B2 (en) Method and apparatus for affine based inter prediction of chroma subblocks
US20140092977A1 (en) Apparatus, a Method and a Computer Program for Video Coding and Decoding
US20150312568A1 (en) Method and technical equipment for video encoding and decoding
US20150326864A1 (en) Method and technical equipment for video encoding and decoding
WO2018229327A1 (fr) Method, apparatus and computer program product for video coding and decoding
WO2017093604A1 (fr) Method, apparatus and computer program product for video coding and decoding
WO2016051362A1 (fr) Method and equipment for encoding and decoding an intra block copy vector
WO2025256837A1 (fr) Method, apparatus and computer program product for encoding and decoding of video material
WO2025256838A1 (fr) Method, apparatus and computer program product for encoding and decoding of video material
US20250008159A1 (en) Arithmetic coding with spatial tuning
WO2025153216A1 (fr) Intra prediction with bias-based extrapolation
WO2025119565A1 (fr) Method, apparatus and computer program product for video encoding and decoding
WO2026012672A1 (fr) Apparatus, method and computer program for video coding and decoding
WO2026012671A1 (fr) Apparatus, method and computer program for video coding and decoding
WO2023242466A1 (fr) Method, apparatus and computer program product for video coding
WO2025195685A1 (fr) Method, apparatus and computer program product for video coding and decoding
WO2025261661A1 (fr) Apparatus, method and computer program for video coding and decoding
WO2025261660A1 (fr) Apparatus, method and computer program for video coding and decoding
WO2025261659A1 (fr) Apparatus, method and computer program for video coding and decoding
EP4420350A1 (fr) Video coding using parallel units
WO2025201742A1 (fr) Apparatus, method and computer program for video coding and decoding
WO2025201741A1 (fr) Method and apparatus for video decoding
WO2025003557A1 (fr) Method, apparatus and computer program product for video encoding and decoding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18818896

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18818896

Country of ref document: EP

Kind code of ref document: A1