
WO2010021664A1 - Depth coding - Google Patents

Depth coding

Info

Publication number
WO2010021664A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
target
prediction mode
neighbor
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2009/004540
Other languages
French (fr)
Inventor
Dong Tian
Purvin Bibhas Pandit
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Thomson Licensing SAS
Original Assignee
Thomson Licensing SAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing SAS
Publication of WO2010021664A1

Classifications

    • H: ELECTRICITY
        • H04: ELECTRIC COMMUNICATION TECHNIQUE
            • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
                    • H04N19/10: using adaptive coding
                        • H04N19/102: characterised by the element, parameter or selection affected or controlled by the adaptive coding
                            • H04N19/103: selection of coding mode or of prediction mode
                                • H04N19/11: among a plurality of spatial predictive coding modes
                        • H04N19/134: characterised by the element, parameter or criterion affecting or controlling the adaptive coding
                            • H04N19/146: data rate or code amount at the encoder output
                                • H04N19/147: according to rate distortion criteria
                        • H04N19/169: characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
                            • H04N19/17: the unit being an image region, e.g. an object
                                • H04N19/176: the region being a block, e.g. a macroblock
                        • H04N19/189: characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
                            • H04N19/192: the adaptation method, adaptation tool or adaptation type being iterative or recursive
                            • H04N19/196: specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
                                • H04N19/197: including determination of the initial value of an encoding parameter
                    • H04N19/50: using predictive coding
                        • H04N19/593: involving spatial prediction techniques
                        • H04N19/597: specially adapted for multi-view video sequence encoding
                    • H04N19/60: using transform coding
                        • H04N19/61: in combination with predictive coding

Definitions

  • Any video coding method and/or standard can be used to encode the depth map.
  • Depth maps typically include large flat areas due to uniform depth for objects, and clear edges due to depth discontinuities.
  • Edges in a corresponding depth map should be well preserved while maintaining good compression efficiency.
  • A depth map is typically sent to the decoder so as to facilitate depth image based rendering (DIBR) and/or to allow synthesizing additional views used as references in encoding.
  • Various implementations described in this application can be used for depth information, which generally includes information about the depth of all or part of a video picture regardless of the manner in which that depth information is presented or organized.
  • An example of depth information is a depth map, which generally provides information about the depth of an entire video picture.
  • Implementations also can use different frameworks to handle the depth signal for compression purposes, including in-band/out-of-band transmission.
  • Depth data can also be dealt with as a new video data component.
  • Piecewise constant and piecewise linear functions can also be used to model depth images.
  • Implementations also can jointly code multiple depth maps, and/or jointly code depth information with video information. The latter is based, at least in part, on the observation that a depth map is normally provided along with the color video, and that one might reuse the motion information from the corresponding color video because the depth sequence may share the same temporal motion.
  • Implementations can use the same motion vector and coding mode from the color/texture video, or produce a set of candidate modes and motion vectors (MVs) and perform a selection from the set based on rate distortion optimization.
  • Depth information is a general term referring to various kinds of information about depth.
  • A depth map generally refers to a per-pixel depth image.
  • Other types of depth information include, for example, using a single depth value for each coded block rather than for each coded pixel.
  • FIG. 1 shows an exemplary video encoder 100 to which the present principles may be applied, in accordance with an embodiment of the present principles.
  • the video encoder 100 includes a frame ordering buffer 110 having an output in signal communication with a non-inverting input of a combiner 185.
  • An output of the combiner 185 is connected in signal communication with a first input of a transformer and quantizer 125.
  • An output of the transformer and quantizer 125 is connected in signal communication with a first input of an entropy coder 145 and a first input of an inverse transformer and inverse quantizer 150.
  • An output of the entropy coder 145 is connected in signal communication with a first non-inverting input of a combiner 190.
  • An output of the combiner 190 is connected in signal communication with a first input of an output buffer 135.
  • A first output of an encoder controller 105 is connected in signal communication with a second input of the frame ordering buffer 110, a second input of the inverse transformer and inverse quantizer 150, an input of a picture-type decision module 115, a first input of a macroblock-type (MB-type) decision module 120, a second input of an intra prediction module 160, a second input of a deblocking filter 165, a first input of a motion compensator 170, a first input of a motion estimator 175, and a second input of a reference picture buffer 180.
  • A second output of the encoder controller 105 is connected in signal communication with a first input of a Supplemental Enhancement Information (SEI) inserter 130, a second input of the transformer and quantizer 125, a second input of the entropy coder 145, a second input of the output buffer 135, and an input of the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) inserter 140.
  • An output of the SEI inserter 130 is connected in signal communication with a second non-inverting input of the combiner 190.
  • A first output of the picture-type decision module 115 is connected in signal communication with a third input of the frame ordering buffer 110.
  • A second output of the picture-type decision module 115 is connected in signal communication with a second input of the macroblock-type decision module 120.
  • An output of the SPS and PPS inserter 140 is connected in signal communication with a third non-inverting input of the combiner 190.
  • An output of the inverse transformer and inverse quantizer 150 is connected in signal communication with a first non-inverting input of a combiner 119.
  • An output of the combiner 119 is connected in signal communication with a first input of the intra prediction module 160 and a first input of the deblocking filter 165.
  • An output of the deblocking filter 165 is connected in signal communication with a first input of a reference picture buffer 180.
  • An output of the reference picture buffer 180 is connected in signal communication with a second input of the motion estimator 175 and a third input of the motion compensator 170.
  • a first output of the motion estimator 175 is connected in signal communication with a second input of the motion compensator 170.
  • a second output of the motion estimator 175 is connected in signal communication with a third input of the entropy coder 145.
  • An output of the motion compensator 170 is connected in signal communication with a first input of a switch 197.
  • An output of the intra prediction module 160 is connected in signal communication with a second input of the switch 197.
  • An output of the macroblock-type decision module 120 is connected in signal communication with a third input of the switch 197.
  • the third input of the switch 197 determines whether or not the "data" input of the switch (as compared to the control input, i.e., the third input) is to be provided by the motion compensator 170 or the intra prediction module 160.
  • the output of the switch 197 is connected in signal communication with a second non-inverting input of the combiner 119 and an inverting input of the combiner 185.
  • a first input of the frame ordering buffer 110 and an input of the encoder controller 105 are available as inputs of the encoder 100, for receiving an input picture.
  • a second input of the Supplemental Enhancement Information (SEI) inserter 130 is available as an input of the encoder 100, for receiving metadata.
  • An output of the output buffer 135 is available as an output of the encoder 100, for outputting a bitstream.
  • FIG. 2 shows an exemplary decoder 200 to which the present principles may be applied, in accordance with an embodiment of the present principles.
  • the video decoder 200 includes an input buffer 210 having an output connected in signal communication with a first input of the entropy decoder 245.
  • a first output of the entropy decoder 245 is connected in signal communication with a first input of an inverse transformer and inverse quantizer 250.
  • An output of the inverse transformer and inverse quantizer 250 is connected in signal communication with a second non-inverting input of a combiner 225.
  • An output of the combiner 225 is connected in signal communication with a second input of a deblocking filter 265 and a first input of an intra prediction module 260.
  • a second output of the deblocking filter 265 is connected in signal communication with a first input of a reference picture buffer 280.
  • An output of the reference picture buffer 280 is connected in signal communication with a second input of a motion compensator 270.
  • A second output of the entropy decoder 245 is connected in signal communication with a third input of the motion compensator 270, a first input of the deblocking filter 265, and a third input of the intra prediction module 260.
  • a third output of the entropy decoder 245 is connected in signal communication with an input of a decoder controller 205.
  • a first output of the decoder controller 205 is connected in signal communication with a second input of the entropy decoder 245.
  • a second output of the decoder controller 205 is connected in signal communication with a second input of the inverse transformer and inverse quantizer 250.
  • a third output of the decoder controller 205 is connected in signal communication with a third input of the deblocking filter 265.
  • a fourth output of the decoder controller 205 is connected in signal communication with a second input of the intra prediction module 260, a first input of the motion compensator 270, and a second input of the reference picture buffer 280.
  • An output of the motion compensator 270 is connected in signal communication with a first input of a switch 297.
  • An output of the intra prediction module 260 is connected in signal communication with a second input of the switch 297.
  • An output of the switch 297 is connected in signal communication with a first non-inverting input of the combiner 225.
  • An input of the input buffer 210 is available as an input of the decoder 200, for receiving an input bitstream.
  • a first output of the deblocking filter 265 is available as an output of the decoder 200, for outputting an output picture.
  • FIG. 3 shows an exemplary video transmission system 300 to which the present principles may be applied, in accordance with an implementation of the present principles.
  • the video transmission system 300 may be, for example, a head-end or transmission system for transmitting a signal using any of a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast.
  • the transmission may be provided over the Internet or some other network.
  • the video transmission system 300 is capable of generating and delivering video content encoded using inter-view skip mode with depth. This is achieved by generating an encoded signal(s) including depth information or information capable of being used to synthesize the depth information at a receiver end that may, for example, have a decoder.
  • the video transmission system 300 includes an encoder 310 and a transmitter 320 capable of transmitting the encoded signal.
  • The encoder 310 receives video information and generates an encoded signal(s) therefrom using inter-view skip mode with depth.
  • The encoder 310 may be, for example, the encoder 100 described in detail above.
  • the encoder 310 may include sub-modules, including for example an assembly unit for receiving and assembling various pieces of information into a structured format for storage or transmission.
  • the various pieces of information may include, for example, coded or uncoded video, coded or uncoded depth information, and coded or uncoded elements such as, for example, motion vectors, coding mode indicators, and syntax elements.
  • the transmitter 320 may be, for example, adapted to transmit a program signal having one or more bitstreams representing encoded pictures and/or information related thereto. Typical transmitters perform functions such as, for example, one or more of providing error-correction coding, interleaving the data in the signal, randomizing the energy in the signal, and modulating the signal onto one or more carriers.
  • the transmitter may include, or interface with, an antenna (not shown). Accordingly, implementations of the transmitter 320 may include, or be limited to, a modulator.
  • FIG. 4 shows an exemplary video receiving system 400 to which the present principles may be applied, in accordance with an embodiment of the present principles.
  • the video receiving system 400 may be configured to receive signals over a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast.
  • the signals may be received over the Internet or some other network.
  • the video receiving system 400 may be, for example, a cell-phone, a computer, a set-top box, a television, or other device that receives encoded video and provides, for example, decoded video for display to a user or for storage.
  • the video receiving system 400 may provide its output to, for example, a screen of a television, a computer monitor, a computer (for storage, processing, or display), or some other storage, processing, or display device.
  • the video receiving system 400 is capable of receiving and processing video content including video information.
  • the video receiving system 400 includes a receiver 410 capable of receiving an encoded signal, such as for example the signals described in the implementations of this application, and a decoder 420 capable of decoding the received signal.
  • the receiver 410 may be, for example, adapted to receive a program signal having a plurality of bitstreams representing encoded pictures. Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal from one or more carriers, de-randomizing the energy in the signal, de-interleaving the data in the signal, and error-correction decoding the signal.
  • the receiver 410 may include, or interface with, an antenna (not shown). Implementations of the receiver 410 may include, or be limited to, a demodulator.
  • the decoder 420 outputs video signals including video information and depth information.
  • The decoder 420 may be, for example, the decoder 200 described in detail above.
  • FIG. 5 shows an exemplary video processing device 500 to which the present principles may be applied, in accordance with an embodiment of the present principles.
  • the video processing device 500 may be, for example, a set top box or other device that receives encoded video and provides, for example, decoded video for display to a user or for storage.
  • the video processing device 500 may provide its output to a television, computer monitor, or a computer or other processing device.
  • the video processing device 500 includes a front-end (FE) device 505 and a decoder 510.
  • the front-end device 505 may be, for example, a receiver adapted to receive a program signal having a plurality of bitstreams representing encoded pictures, and to select one or more bitstreams for decoding from the plurality of bitstreams. Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal, decoding one or more encodings (for example, channel coding and/or source coding) of the data signal, and/or error-correcting the data signal.
  • the front-end device 505 may receive the program signal from, for example, an antenna (not shown). The front-end device 505 provides a received data signal to the decoder 510.
  • the decoder 510 receives a data signal 520.
  • the data signal 520 may include, for example, one or more Advanced Video Coding (AVC), Scalable Video Coding (SVC), or Multi-view Video Coding (MVC) compatible streams.
  • AVC refers more specifically to the existing International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 Recommendation (hereinafter the "H.264/MPEG-4 AVC Standard” or variations thereof, such as the "AVC standard” or simply "AVC”).
  • MVC refers more specifically to a multi-view video coding ("MVC") extension (Annex H) of the AVC standard, referred to as H.264/MPEG-4 AVC, MVC extension (the "MVC extension” or simply “MVC”).
  • SVC refers more specifically to a scalable video coding ("SVC") extension (Annex G) of the AVC standard.
  • the decoder 510 decodes all or part of the received signal 520 and provides as output a decoded video signal 530.
  • the decoded video 530 is provided to a selector 550.
  • the device 500 also includes a user interface 560 that receives a user input 570.
  • the user interface 560 provides a picture selection signal 580, based on the user input 570, to the selector 550.
  • the picture selection signal 580 and the user input 570 indicate which of multiple pictures, sequences, scalable versions, views, or other selections of the available decoded data a user desires to have displayed.
  • the selector 550 provides the selected picture(s) as an output 590.
  • the selector 550 uses the picture selection information 580 to select which of the pictures in the decoded video 530 to provide as the output 590.
  • the selector 550 includes the user interface 560, and in other implementations no user interface 560 is provided because the selector 550 receives the user input 570 directly without a separate interface function being performed.
  • the selector 550 may be implemented in software or as an integrated circuit, for example.
  • the selector 550 is incorporated with the decoder 510, and in another implementation, the decoder 510, the selector 550, and the user interface 560 are all integrated.
  • front-end 505 receives a broadcast of various television shows and selects one for processing. The selection of one show is based on user input of a desired channel to watch.
  • front-end device 505 receives the user input 570.
  • The front-end 505 receives the broadcast and processes the desired show by demodulating the relevant part of the broadcast spectrum, and decoding any outer encoding of the demodulated show.
  • the front-end 505 provides the decoded show to the decoder 510.
  • the decoder 510 is an integrated unit that includes devices 560 and 550.
  • the decoder 510 thus receives the user input, which is a user-supplied indication of a desired view to watch in the show.
  • the decoder 510 decodes the selected view, as well as any required reference pictures from other views, and provides the decoded view 590 for display on a television (not shown).
  • the user may desire to switch the view that is displayed and may then provide a new input to the decoder 510.
  • the decoder 510 decodes both the old view and the new view, as well as any views that are in between the old view and the new view. That is, the decoder 510 decodes any views that are taken from cameras that are physically located in between the camera taking the old view and the camera taking the new view.
  • the front-end device 505 also receives the information identifying the old view, the new view, and the views in between. Such information may be provided, for example, by a controller (not shown in Figure 5) having information about the locations of the views, or the decoder 510.
  • Other implementations may use a front-end device that has a controller integrated with the front-end device.
  • the decoder 510 provides all of these decoded views as output 590.
  • a post-processor (not shown in Figure 5) interpolates between the views to provide a smooth transition from the old view to the new view, and displays this transition to the user. After transitioning to the new view, the post-processor informs (through one or more communication links not shown) the decoder 510 and the front-end device 505 that only the new view is needed. Thereafter, the decoder 510 only provides as output 590 the new view.
  • the system 500 may be used to receive multiple views of a sequence of images, and to present a single view for display, and to switch between the various views in a smooth manner.
  • the smooth manner may involve interpolating between views to move to another view.
  • the system 500 may allow a user to rotate an object or scene, or otherwise to see a three-dimensional representation of an object or a scene.
  • the rotation of the object for example, may correspond to moving from view to view, and interpolating between the views to obtain a smooth transition between the views or simply to obtain a three-dimensional representation. That is, the user may "select" an interpolated view as the "view" that is to be displayed.
  • 3D Video is a new framework that includes a coded representation for multiple view video and depth information and targets the generation of high-quality 3D rendering at the receiver. This enables 3D visual experiences with auto-multiscopic displays.
  • Figure 6 shows an exemplary system 600 for transmitting and receiving multi-view video with depth information, to which the present principles may be applied, according to an embodiment of the present principles.
  • video data is indicated by a solid line
  • depth data is indicated by a dashed line
  • meta data is indicated by a dotted line.
  • the system 600 may be, for example, but is not limited to, a free-viewpoint television system.
  • the system 600 includes a three-dimensional (3D) content producer 620, having a plurality of inputs for receiving one or more of video, depth, and meta data from a respective plurality of sources.
  • 3D three-dimensional
  • Such sources may include, but are not limited to, a stereo camera 611, a depth camera 612, a multi-camera setup 613, and 2-dimensional/3-dimensional (2D/3D) conversion processes 614.
  • One or more networks 630 may be used to transmit one or more of video, depth, and meta data relating to multi-view video coding (MVC) and digital video broadcasting (DVB).
  • a depth image-based renderer 650 performs depth image-based rendering to project the signal to various types of displays.
  • the depth image-based renderer 650 is capable of receiving display configuration information and user preferences.
  • An output of the depth image-based renderer 650 may be provided to one or more of a 2D display 661, an M-view 3D display 662, and/or a head-tracked stereo display 663.
  • A first embodiment, and variations of this embodiment, will now be discussed. This first embodiment may generally be referred to as Intra skip.
  • The Inter skip mode in H.264/AVC is well known. Currently this mode is only supported in inter slices.
  • CAVLC refers to context adaptive variable length coding, and CABAC refers to context adaptive binary arithmetic coding.
  • Figure 7 shows an exemplary method 700 for intra slice encoding with Intra skip mode, in accordance with an embodiment of the present principles.
  • the picture to be encoded is read.
  • the encoding of a current macroblock is initialized.
  • the best rate distortion (RD) cost for INTRA16x16 is calculated.
  • the best RD cost for INTRA8x8 is calculated.
  • the best RD cost for INTRA4x4 is calculated.
  • the RD cost for PCM is calculated.
  • the RD cost for Intra skip is calculated.
  • the best Intra mode is selected from among 16x16, 8x8, 4x4, PCM, and Intra skip.
  • the current macroblock (MB) is encoded with the best Intra mode.
  • the encoded picture is written into a bitstream, and the bitstream is sent over a network(s).
  • step 740 may involve, for example, forming a residue between the current block and the prediction for the current block, and encoding the residue.
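  • As a rough, non-normative illustration of the mode-selection loop above, the following Python sketch simply picks the cheapest of the five candidate modes; the function name and the cost values are hypothetical placeholders, not the reference encoder's API.

```python
# A minimal sketch of the exhaustive mode decision of method 700 (steps
# 710-735), under the assumption that each candidate mode has already been
# assigned a rate-distortion cost J = D + lambda * R. The cost values below
# are placeholders, not measured data.

def select_best_mode(rd_costs):
    """rd_costs: dict mapping mode name -> RD cost; returns the cheapest mode."""
    return min(rd_costs, key=rd_costs.get)

# Intra skip carries no residue and (in this embodiment) no per-MB mode
# bits, so its cost is essentially pure distortion.
costs = {
    "INTRA16x16": 1540.0,
    "INTRA8x8": 1495.0,
    "INTRA4x4": 1610.0,
    "PCM": 9800.0,
    "INTRA_SKIP": 1320.0,
}
print(select_best_mode(costs))  # -> INTRA_SKIP
```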
  • FIG. 8 shows an exemplary method 800 for intra slice decoding with Intra skip mode, in accordance with an embodiment of the present principles.
  • a bitstream for a slice is read or received.
  • mb_skip_run is parsed.
  • decoding of the Intra MB is continued as in the AVC Standard.
  • At step 820, it is determined whether or not there are any more macroblocks in this (the current) slice. If so, then control is returned to step 805. Otherwise, the method is terminated.
  • At step 825, it is determined whether or not mb_skip_run is greater than zero. If so, then control is passed to a step 830. Otherwise, control is passed to the step 820.
  • At step 830, it is determined whether or not a left macroblock exists (in the current slice, with respect to the current macroblock). If so, then control is passed to a step 835. Otherwise, control is passed to a step 845.
  • the Intra prediction mode from the left macroblock is copied.
  • the Intra MB is decoded using the derived Intra prediction modes.
  • a value of the syntax element mb_skip_run is decremented.
  • At step 845, it is determined whether or not an above macroblock exists (in the current slice, with respect to the current macroblock). If so, then control is passed to a step 850. Otherwise, control is passed to a step 855. At step 850, the Intra prediction mode from the above macroblock is copied.
  • At step 855, the bitstream is indicated as being invalid.
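  • The neighbor-preference logic of steps 830 through 855 can be sketched as below; the tuple representation of an intra mode is an assumption made for illustration only.

```python
# A hedged sketch of the mode derivation performed at steps 830-855: prefer
# the left macroblock's intra prediction mode, fall back to the above one,
# and treat the bitstream as invalid if neither neighbor exists.

def derive_intra_skip_mode(left_mode, above_mode):
    if left_mode is not None:
        return left_mode    # step 835: copy from the left macroblock
    if above_mode is not None:
        return above_mode   # step 850: copy from the above macroblock
    raise ValueError("invalid bitstream: Intra skip MB with no neighbor")  # step 855

# Example: the left neighbor was coded INTRA16x16 with vertical prediction.
print(derive_intra_skip_mode(("INTRA16x16", "vertical"), None))
```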
  • Figure 9 shows an exemplary method for calculating rate distortion (RD) with respect to steps 710, 715, 720, 725, and 730 of method 700 of Figure 7.
  • the Intra prediction mode from the left macroblock is copied.
  • the prediction block is found using the derived prediction mode.
  • the distortion is calculated using the prediction block as a reconstruction.
  • the RD cost is calculated using the distortion and zero bitrate.
  • At step 930, it is determined whether or not an above macroblock exists (in the current slice, with respect to the current macroblock). If so, then control is passed to a step 935. Otherwise, control is passed to a step 940.
  • the Intra prediction mode from the above macroblock is copied.
  • At step 940, the rate distortion (RD) cost is set to be infinite.
  • Intra skip mode is added as another mode in the intra slices. Intra skip mode is in addition to the existing intra prediction modes: 16x16; 8x8; and 4x4. As shown in Figure 7, in a typical exhaustive search encoder, the Intra skip mode is added as a new module to calculate the rate distortion cost. Eventually the mode with the lowest RD cost is selected as the coding mode.
  • Figure 9 shows one embodiment of how the encoder calculates the prediction block for the Intra skip mode and the corresponding RD cost. This can be described as follows:
  • Step 1. Check if the left macroblock exists.
  • Step 2. If the left macroblock exists, copy the macroblock type and intra prediction modes to the current macroblock and go to Step 5; else go to Step 3.
  • Step 3. Check if the above macroblock exists.
  • Step 4. If the above macroblock exists, copy the macroblock type and intra prediction modes to the current macroblock and go to Step 5; else go to Step 7.
  • Step 5. Obtain the prediction samples using the macroblock type and intra prediction modes that were copied.
  • Step 6. Calculate the RD cost of coding using Intra skip mode and go to Step 8.
  • Step 7. Set the RD cost to a very large value.
  • Step 8. Return the RD cost.
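  • The steps above can be sketched in Python as follows, under the stated assumptions that distortion is a sum of squared differences over a flattened block and that the rate term is zero (no residue, no per-macroblock mode bits); the predict callable is an illustrative stand-in.

```python
# A sketch of Steps 1-8 under stated assumptions: distortion is a sum of
# squared differences between the source block and the prediction formed
# from the borrowed mode, and the rate term is zero because Intra skip
# transmits no residue. The flat pixel lists and the predict callable are
# illustrative stand-ins for real macroblock data.

def intra_skip_rd_cost(src, left_mode, above_mode, predict, lam=1.0):
    mode = left_mode if left_mode is not None else above_mode  # Steps 1-4
    if mode is None:
        return float("inf")          # Step 7: no neighbor, "very large" cost
    pred = predict(mode)             # Step 5: prediction from the copied mode
    distortion = sum((s - p) ** 2 for s, p in zip(src, pred))
    return distortion + lam * 0.0    # Step 6: zero bitrate, cost is distortion

# Toy usage: a flat 4-sample "block" that a DC-style borrowed mode predicts exactly.
print(intra_skip_rd_cost([100] * 4, "DC", None, predict=lambda m: [100] * 4))  # -> 0.0
```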
  • the first change, for enabling Intra skip mode at the decoder in an Intra slice, is to remove the Intra slice detection condition in slice_data() as shown in Table 1.
  • Table 1 shows syntax changes for slice_data(), in accordance with an embodiment of the present principles.
  • The syntax that is deleted is shown with strikethrough, italics, and a larger font. That is, Intra slices can have skip MBs just as Inter slices do.
  • the value of mb_skip_run indicates how many macroblocks are to be skipped for coding under CAVLC mode. Under CABAC mode, mb_skip_flag is set to 1 when the current macroblock is skipped for coding. Otherwise, mb_skip_flag is set to 0.
  • To encode mb_skip_flag efficiently under CABAC mode, a new CABAC context table should be designed carefully. Herein, we do not specify any particular CABAC context table. However, it is to be appreciated that, given the teachings of the present principles provided herein, one of ordinary skill in this and related arts will readily contemplate CABAC context tables to which the present principles may be applied, while maintaining the spirit of the present principles.
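  • Since the text leaves the context table unspecified, the following is only one plausible design, offered as an assumption: condition the context increment for mb_skip_flag on whether the left and above macroblocks were themselves skipped, in the spirit of AVC's existing Inter skip contexts.

```python
# One plausible (assumed, not the patent's) context design: the increment
# for coding mb_skip_flag depends on whether the left and above macroblocks
# were skipped; an unavailable neighbor contributes 0, as in AVC.

def mb_skip_flag_ctx_inc(left_skipped, above_skipped):
    """left_skipped/above_skipped: True, False, or None (unavailable)."""
    inc = 0
    if left_skipped is False:     # left exists and was not skipped
        inc += 1
    if above_skipped is False:    # above exists and was not skipped
        inc += 1
    return inc                    # selects one of three contexts: 0, 1, or 2

print(mb_skip_flag_ctx_inc(False, None))  # -> 1
```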
  • The decoding process for an Intra slice is proposed to be modified as follows. If a macroblock is detected to be an Intra skip MB, then the decoder first determines which neighbor macroblock is to be used to derive the intra prediction mode. In one example, the macroblock on the left is always selected if it is available. Otherwise, the above macroblock is selected. If neither the left macroblock nor the above macroblock is available, then the bitstream is claimed to be invalid. After the deriving macroblock is determined, the macroblock type and intra prediction mode are copied from this macroblock. The decoder can set the prediction block using this derived prediction mode. Then the prediction block is assumed to be the reconstruction of the skipped macroblock.
  • The decoding procedure is depicted in Figure 8.
  • In one implementation relating to, for example, a joint scalable video model (JSVM) decoder, the decoder first parses the syntax elements of all macroblocks within the slice, then decodes the macroblocks. When the decoder parses the bitstream, the macroblocks with Intra skip mode are marked. For INTRA16x16, the intra prediction mode is signaled by the mb_type syntax. Thus, if the neighboring macroblock is in INTRA16x16 mode, the intra prediction mode of the current (Intra skip) macroblock can be derived during the parsing process.
  • If the neighbor macroblock is in INTRA4x4 or INTRA8x8 mode, then the real intra prediction mode is unknown during the parsing process. Instead, the decoder waits until the decoding process to copy the intra prediction mode to the current (Intra skip) macroblock.
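  • This parse-then-decode split can be sketched as follows, with an assumed (mb_type, mode) neighbor representation that is illustrative rather than the JSVM data model.

```python
# A sketch of the two-pass behavior described above: an Intra skip
# macroblock can resolve its mode during parsing only when the neighbor is
# INTRA16x16, whose mode is carried by mb_type; otherwise the copy is
# deferred to the decode pass.

def resolve_mode_at_parse(neighbor):
    if neighbor is None:
        return None
    mb_type, mode = neighbor
    if mb_type == "INTRA16x16":
        return mode           # mode known from mb_type while parsing
    return "DEFERRED"         # INTRA4x4 / INTRA8x8: copy during decoding

print(resolve_mode_at_parse(("INTRA16x16", "plane")))  # -> plane
print(resolve_mode_at_parse(("INTRA4x4", None)))       # -> DEFERRED
```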
  • Figure 10 shows an exemplary depth map 1000 to which the present principles may be applied, in accordance with an embodiment of the present principles.
  • The exemplary depth map 1000 relates to a scene of "breakdancers". Simulation results show a 0.497 dB improvement for the Breakdancers depth.
  • the second embodiment, and variations, may generally be referred to as Extended P skip mode and B skip mode.
  • In the first embodiment, the Intra skip mode works only for I (Intra) slices.
  • In this second embodiment, the derivation of mb_type of a skipped macroblock in a P/B slice is modified.
  • If all neighboring (e.g., any combination of left, above, above right, and above left) macroblocks are coded in Intra mode, then the current skipped macroblock is derived as an Intra macroblock, instead of an Inter macroblock. All of the decoding procedure defined for the first embodiment can be reproduced for P slices in this embodiment.
  • A new syntax element, intra_skip_flag, is introduced, which indicates whether the skipped macroblock is an Intra skip or an Inter skip.
  • Table 2 shows syntax changes in slice_data(), in accordance with an embodiment of the present principles. If intra_skip_flag is set to 1, then the syntax mb_skip_run indicates the run length of Intra skip MBs under CAVLC and mb_skip_flag indicates that the current macroblock is an Intra skip macroblock under CABAC.
  • If intra_skip_flag is set to 0, then the syntax mb_skip_run indicates the run length of Inter skip macroblocks under CAVLC, and the syntax mb_skip_flag indicates that the current macroblock is an Inter skip macroblock under CABAC.
  • The condition for the presence of intra_skip_flag is explained here. If the slice type is Intra, the flag need not be present. If the previous macroblock is signaled as skip, even if it is an Inter slice, then the flag need not be present either.
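  • One hypothetical reading of this presence rule is sketched below; the function and argument names are invented for illustration.

```python
# Assumed reading of the presence rule just described: intra_skip_flag is
# sent only in non-Intra slices, and only when the previous macroblock was
# not already signaled as skipped.

def intra_skip_flag_present(slice_type, prev_mb_was_skip):
    if slice_type == "I":
        return False      # Intra slices: every skip is an Intra skip
    if prev_mb_was_skip:
        return False      # continuing a run whose type is already known
    return True

print(intra_skip_flag_present("P", prev_mb_was_skip=False))  # -> True
```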
  • Figure 11 shows an exemplary method for encoding a P or B slice, in accordance with an embodiment of the present principles.
  • the encoding of the current macroblock (MB) is initialized.
  • the best rate distortion (RD) cost for INTRA16x16 is calculated.
  • the best RD cost for INTRA8x8 is calculated.
  • the best RD cost for INTRA4x4 is calculated.
  • the RD cost for PCM is calculated.
  • the RD cost for Intra skip is calculated.
  • the best Intra mode is selected from among 16x16, 8x8, 4x4, PCM, and Intra skip.
  • the best RD cost is calculated for Inter modes.
  • At step 1145, it is determined whether or not the best Inter mode is better than the best Intra mode. If so, then control is passed to a step 1150. Otherwise, control is passed to a step 1160. At step 1150, the current macroblock is encoded with the best Inter mode. At step 1160, the current macroblock is encoded with the best Intra mode.
  • At step 1155, it is determined whether or not there are any more macroblocks (to be encoded). If so, then control is returned to step 1105. Otherwise, the method is terminated.
  • FIG. 12 shows an exemplary method for decoding a P or B slice, in accordance with an embodiment of the present principles.
  • a bitstream for a slice is read or received.
  • mb_skip_run is parsed.
  • intra_skip_flag is derived/parsed.
  • At step 1220, decoding of the Intra MB is continued as in the AVC Standard.
  • At step 1225, it is determined whether or not there are any more macroblocks in this (the current) slice. If so, then control is returned to step 1205. Otherwise, the method is terminated.
  • At step 1230, it is determined whether or not mb_skip_run is greater than zero. If so, then control is passed to a step 1235. Otherwise, control is passed to step 1225.
  • At step 1235, it is determined whether or not intra_skip_flag is equal to zero. If so, then control is passed to a step 1245. Otherwise, control is passed to a step 1240.
  • At step 1240, the macroblock is decoded using the derived Intra prediction modes.
  • At step 1245, the macroblock is decoded using the derived Inter prediction modes.
  • At step 1250, a value of the syntax element mb_skip_run is decremented.
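  • The skip handling of method 1200 (steps 1230 through 1250) can be sketched as the following control flow, with stub decode callables standing in for real macroblock reconstruction.

```python
# Control-flow sketch of the skip handling in method 1200: each macroblock
# in the run is decoded with Intra-derived modes when intra_skip_flag is 1,
# and with Inter-derived modes (classic P/B skip) otherwise. The decode
# callables are placeholders.

def decode_skip_run(mb_skip_run, intra_skip_flag, decode_intra, decode_inter):
    while mb_skip_run > 0:                       # step 1230
        if intra_skip_flag == 1:
            decode_intra()                       # step 1240
        else:
            decode_inter()                       # step 1245
        mb_skip_run -= 1                         # step 1250

# Toy usage with counting stubs:
counts = {"intra": 0, "inter": 0}
decode_skip_run(3, 1,
                decode_intra=lambda: counts.__setitem__("intra", counts["intra"] + 1),
                decode_inter=lambda: counts.__setitem__("inter", counts["inter"] + 1))
print(counts)  # -> {'intra': 3, 'inter': 0}
```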
  • In cases such as "A, B, and/or C" and "at least one of A, B, and C", such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C).
  • This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.
  • Implementations may signal information using a variety of techniques including, but not limited to, in-band information, out-of-band information, datastream data, implicit signaling, and explicit signaling.
  • In-band information and explicit signaling may include, for various implementations and/or standards, slice headers, SEI messages, other high level syntax, and non-high-level syntax. Accordingly, although implementations described herein may be described in a particular context, such descriptions should in no way be taken as limiting the features and concepts to such implementations or contexts.
  • implementations and features described herein may be used in the context of the MPEG-4 AVC Standard, or the MPEG-4 AVC Standard with the MVC extension, or the MPEG-4 AVC Standard with the SVC extension. However, these implementations and features may be used in the context of another standard and/or recommendation (existing or future), or in a context that does not involve a standard and/or recommendation.
  • implementations may borrow various pieces of information from a neighboring portion.
  • the information may include, for example, one or more pixel values, a motion vector, and/or a coding mode.
  • the neighbor may be, for example, all or part of (i) a temporally neighboring picture, (ii) a spatially neighboring portion, and (iii) a neighboring view from the same or a different instant in time.
  • the portion may be, for example, (i) a full picture, such as, for example, a full frame or field, (ii) a slice, (iii) a macroblock, and (iv) a partition or other sub-block portion.
  • Various modes can be used. For example, if the coding mode borrowed from a spatially neighboring macroblock indicates that pixels are to be copied from the macroblock that is immediately above the macroblock being coded, then the prediction is formed by copying (in columns) the pixels in the row that is immediately above the macroblock being coded.
  • Various modes may be used (and therefore borrowed), such as, for example, horizontal copy, vertical copy, DC coefficient copy, and other known modes.
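  • As a toy illustration of such modes, the sketch below forms vertical-copy, horizontal-copy, and DC predictions for a small 4x4 block; real codecs operate on 16x16 macroblocks, and the helper is invented for illustration.

```python
# Toy illustration of the borrowed prediction modes named above: vertical
# copy repeats the row above, horizontal copy repeats the left column, and
# DC fills the block with the mean of both borders. 4x4 keeps the example
# small; the function is not a standard's reference implementation.

def predict_block(mode, top_row, left_col):
    n = len(top_row)
    if mode == "vertical":
        return [list(top_row) for _ in range(n)]
    if mode == "horizontal":
        return [[left_col[r]] * n for r in range(n)]
    if mode == "dc":
        dc = sum(top_row + left_col) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unknown mode: " + mode)

print(predict_block("vertical", [10, 20, 30, 40], [10, 10, 10, 10]))
```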
  • the implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal.
  • An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
  • the methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device.
  • processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
  • PDAs portable/personal digital assistants
  • Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding and decoding.
  • equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices.
  • the equipment may be mobile and even installed in a mobile vehicle.
  • the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette, a random access memory ("RAM"), or a read-only memory (“ROM").
  • the instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two.
  • a processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process.
  • a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
  • implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
  • a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment.
  • Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
  • the formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
  • the information that the signal carries may be, for example, analog or digital information.
  • the signal may be transmitted over a variety of different wired or wireless links, as is known.
  • the signal may be stored on a processor-readable medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Various implementations are described. Several implementations relate to depth coding. According to a general aspect, prediction mode information is accessed for reconstructing a target block of data from a picture. The prediction mode information indicates that the target block of data was predicted using a target-block prediction mode that was determined based on a neighbor-block prediction mode of a neighbor block. The neighbor block is a spatial neighbor of the target block in the picture. The neighbor-block prediction mode of the neighbor block is determined. The target-block prediction mode of the target block is determined based on the neighbor-block prediction mode. The target block is reconstructed by predicting the target block using the determined target-block prediction mode.

Description

DEPTH CODING
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application Serial No. 61/189,585, filed on August 20, 2008, titled "Depth Coding", the contents of which are hereby incorporated by reference in their entirety for all purposes.
TECHNICAL FIELD
Implementations are described that relate to coding systems. Various particular implementations relate to depth coding.
BACKGROUND
It has been well recognized that 3D Video (3DV) is a key technology that serves a wide variety of applications, including home entertainment and surveillance. In addition to view data, depth data is typically associated with each view. Depth data is typically essential for view synthesis when using the technique of depth image based rendering (DIBR). Even a small improvement in the quality of a depth map may result in a much better synthesized view.
In such 3DV applications, the amount of video and depth data involved can be enormous. Thus, it is desirable to have new coding tools that help to improve the compression efficiency of not only the current video signal but also depth map sequences.
SUMMARY

According to a general aspect, a neighbor-block prediction mode is accessed for a neighbor block in a picture. The neighbor block is a spatial neighbor of a target block in the picture. The neighbor-block prediction mode indicates how the neighbor block has been predicted in an encoding process of the neighbor block. A target-block prediction mode is determined for the target block based on the neighbor-block prediction mode. The target-block prediction mode indicates how the target block is to be predicted in an encoding process of the target block. The target block is predicted using the target-block prediction mode, to produce a target-block prediction. The target-block prediction is evaluated. It is determined whether to use the target-block prediction, as a prediction for the target block, based on a result of the evaluating. Prediction mode information is encoded for the target block. The prediction mode information indicates that the target block of data was predicted using a prediction mode that was determined based on the neighbor-block prediction mode.

According to another general aspect, prediction mode information is accessed for reconstructing a target block of data from a picture. The prediction mode information indicates that the target block of data was predicted using a target-block prediction mode that was determined based on a neighbor-block prediction mode of a neighbor block. The neighbor block is a spatial neighbor of the target block in the picture. The neighbor-block prediction mode of the neighbor block is determined. The target-block prediction mode of the target block is determined based on the neighbor-block prediction mode. The target block is reconstructed by predicting the target block using the determined target-block prediction mode.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Even if described in one particular manner, it should be clear that implementations may be configured or embodied in various manners. For example, an implementation may be performed as a method, or embodied as apparatus, such as, for example, an apparatus configured to perform a set of operations or an apparatus storing instructions for performing a set of operations, or embodied in a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a diagram of an implementation of an encoder.
Figure 2 is a diagram of an implementation of a decoder.
Figure 3 is a diagram of an implementation of a video transmission system.
Figure 4 is a diagram of an implementation of a video receiving system.
Figure 5 is a diagram of an implementation of a video processing device.
Figure 6 is a diagram of an implementation of a system for transmitting and receiving multi-view video with depth information.
Figure 7 is a diagram of an implementation of an encoding process.
Figure 8 is a diagram of an implementation of a decoding process.
Figure 9 is a diagram of an implementation of a rate distortion calculation process.
Figure 10 is a diagram showing an implementation of a depth map.
Figure 11 is a diagram of an implementation of an encoding process.
Figure 12 is a diagram of an implementation of a decoding process.
DETAILED DESCRIPTION
In at least one implementation, we propose a coding framework for a macroblock of a picture (for example, a frame or a field) that borrows the coding mode of a spatially neighboring macroblock. No residue is stored or transmitted. Accordingly, a single piece of information is stored or transmitted (and possibly encoded) to indicate the mode, and therefore the prediction, for the whole macroblock. Various implementations evaluate the borrowed coding mode before using it to encode the macroblock. One implementation uses the borrowed coding mode to generate a prediction (for example, the borrowed coding mode may instruct that the pixels are to be copied from the left border). If the prediction is good enough, then the borrowed coding mode is used, again without any residue. Other implementations also store or transmit the residue. Various implementations apply this framework to I-pictures, as well as to P or B pictures. Various implementations apply this framework to depth data, such as, for example, a depth map, and/or to video data.
The inventors have determined that using the same mode as a spatial neighbor can be beneficial, for example, when the picture has large flat areas. Large flat areas can occur in depth maps, for example, in the background, particularly if the depth values are quantized into a common quantization bin. Large flat areas can also be common in video, for example, at high resolutions in which a given object occupies a large number of pixels.
The inventors have further determined that using spatially neighboring macroblocks to determine a mode, and to determine a prediction, can often be more useful than basing a prediction on temporally collocated macroblocks from prior depth maps in a sequence. This advantage arises when, for example, depth maps exhibit flicker between pictures (depth maps) due to, for example, poor depth measurements. In such cases, using temporally collocated macroblocks to form a prediction may result in a large residue.
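To make the borrow-and-evaluate idea concrete, the following is a minimal sketch in C, assuming a 16x16 macroblock, a horizontal-copy prediction borrowed from the left neighbor, and a simple sum-of-absolute-differences acceptance test; all of these names and choices are illustrative assumptions, not part of any standard.

```c
#include <stdlib.h>

/* Hypothetical macroblock record holding reconstructed samples. */
typedef struct {
    unsigned char recon[16 * 16];   /* row-major 16x16 reconstruction */
} MacroBlock;

/* Borrowed horizontal-copy prediction: each row of the prediction
 * repeats the left neighbor's rightmost reconstructed sample. */
static void predict_from_left(const MacroBlock *left, unsigned char pred[16 * 16])
{
    for (int y = 0; y < 16; y++) {
        unsigned char v = left->recon[y * 16 + 15];
        for (int x = 0; x < 16; x++)
            pred[y * 16 + x] = v;
    }
}

/* Borrow the neighbor's mode, form the prediction, and report whether
 * it is "good enough" to use with no residue coded at all. */
static int try_borrowed_mode(const MacroBlock *left,
                             const unsigned char src[16 * 16],
                             unsigned char pred[16 * 16], int threshold)
{
    if (left == NULL)
        return 0;                                /* nothing to borrow from */
    predict_from_left(left, pred);
    int sad = 0;                                 /* sum of absolute differences */
    for (int i = 0; i < 16 * 16; i++)
        sad += abs((int)src[i] - (int)pred[i]);
    return sad < threshold;                      /* 1: signal mode only, no residue */
}
```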
In general, any video coding method and/or standard can be used to encode the depth map. Compared to a normal video signal, depth maps typically include large flat areas due to uniform depth for objects and clear edges due to depth discontinuities. In order to maintain a high quality for synthesized views, it is generally desired that the edges in a corresponding depth map should be well preserved while maintaining good compression efficiency.
The inventors have noted that in 3DV applications, a depth map is typically sent to the decoder so as to facilitate depth image based rendering (DIBR) and/or to allow synthesizing additional views used as references in encoding. Various implementations described in this application can be used for depth information, which generally includes information about the depth of all or part of a video picture regardless of the manner in which that depth information is presented or organized. One example of depth information is a depth map, which generally provides information about the depth of an entire video picture.
Implementations also can use different frameworks to handle the depth signal for compression purposes, including in-band/out-of-band transmission. Optionally, depth data can also be dealt with as a new video data component. Piecewise constant and piecewise linear functions can also be used to model depth images. Implementations also can jointly code multiple depth maps, and/or jointly code depth information with video information. The latter is based, at least in part, on the observation that a depth map is normally provided along with the color video, and that one might reuse the motion information from the corresponding color video because the depth sequence may share the same temporal motion. Implementations can use the same motion vector and coding mode from the color/texture video, or produce a set of candidate modes and motion vectors (MVs) and perform a selection from the set based on rate distortion optimization.
It is to be appreciated that while one or more embodiments are described herein with respect to the AVC Standard, the present principles are not limited solely to the same and, thus, given the teachings of the present principles provided herein, may be readily applied to multi-view video coding (MVC), current and future 3DV Standards, as well as other video coding standards, specifications, and/or recommendations, while maintaining the spirit of the present principles. As mentioned above, "depth information" is a general term referring to various kinds of information about depth. One type of depth information is a "depth map", which generally refers to a per-pixel depth image. Other types of depth information include, for example, using a single depth value for each coded block rather than for each coded pixel.
Figure 1 shows an exemplary video encoder 100 to which the present principles may be applied, in accordance with an embodiment of the present principles. The video encoder 100 includes a frame ordering buffer 110 having an output in signal communication with a non-inverting input of a combiner 185. An output of the combiner 185 is connected in signal communication with a first input of a transformer and quantizer 125. An output of the transformer and quantizer 125 is connected in signal communication with a first input of an entropy coder 145 and a first input of an inverse transformer and inverse quantizer 150. An output of the entropy coder 145 is connected in signal communication with a first non-inverting input of a combiner 190. An output of the combiner 190 is connected in signal communication with a first input of an output buffer 135.
A first output of an encoder controller 105 is connected in signal communication with a second input of the frame ordering buffer 110, a second input of the inverse transformer and inverse quantizer 150, an input of a picture-type decision module 115, a first input of a macroblock-type (MB-type) decision module 120, a second input of an intra prediction module 160, a second input of a deblocking filter 165, a first input of a motion compensator 170, a first input of a motion estimator 175, and a second input of a reference picture buffer 180.
A second output of the encoder controller 105 is connected in signal communication with a first input of a Supplemental Enhancement Information (SEI) inserter 130, a second input of the transformer and quantizer 125, a second input of the entropy coder 145, a second input of the output buffer 135, and an input of the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) inserter 140.
An output of the SEI inserter 130 is connected in signal communication with a second non-inverting input of the combiner 190.
A first output of the picture-type decision module 115 is connected in signal communication with a third input of the frame ordering buffer 110. A second output of the picture-type decision module 115 is connected in signal communication with a second input of the macroblock-type decision module 120. An output of the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) inserter 140 is connected in signal communication with a third non-inverting input of the combiner 190.
An output of the inverse transformer and inverse quantizer 150 is connected in signal communication with a first non-inverting input of a combiner 119. An output of the combiner 119 is connected in signal communication with a first input of the intra prediction module 160 and a first input of the deblocking filter 165. An output of the deblocking filter 165 is connected in signal communication with a first input of a reference picture buffer 180. An output of the reference picture buffer 180 is connected in signal communication with a second input of the motion estimator 175 and a third input of the motion compensator 170. A first output of the motion estimator 175 is connected in signal communication with a second input of the motion compensator 170. A second output of the motion estimator 175 is connected in signal communication with a third input of the entropy coder 145. An output of the motion compensator 170 is connected in signal communication with a first input of a switch 197. An output of the intra prediction module 160 is connected in signal communication with a second input of the switch 197. An output of the macroblock-type decision module 120 is connected in signal communication with a third input of the switch 197. The third input of the switch 197 determines whether or not the "data" input of the switch (as compared to the control input, i.e., the third input) is to be provided by the motion compensator 170 or the intra prediction module 160. The output of the switch 197 is connected in signal communication with a second non-inverting input of the combiner 119 and an inverting input of the combiner 185. A first input of the frame ordering buffer 110 and an input of the encoder controller 105 are available as inputs of the encoder 100, for receiving an input picture. Moreover, a second input of the Supplemental Enhancement Information (SEI) inserter 130 is available as an input of the encoder 100, for receiving metadata. An output of the output buffer 135 is available as an output of the encoder 100, for outputting a bitstream.
Figure 2 shows an exemplary decoder 200 to which the present principles may be applied, in accordance with an embodiment of the present principles. The video decoder 200 includes an input buffer 210 having an output connected in signal communication with a first input of an entropy decoder 245. A first output of the entropy decoder 245 is connected in signal communication with a first input of an inverse transformer and inverse quantizer 250. An output of the inverse transformer and inverse quantizer 250 is connected in signal communication with a second non-inverting input of a combiner 225. An output of the combiner 225 is connected in signal communication with a second input of a deblocking filter 265 and a first input of an intra prediction module 260. A second output of the deblocking filter 265 is connected in signal communication with a first input of a reference picture buffer 280. An output of the reference picture buffer 280 is connected in signal communication with a second input of a motion compensator 270. A second output of the entropy decoder 245 is connected in signal communication with a third input of the motion compensator 270, a first input of the deblocking filter 265, and a third input of the intra prediction module 260. A third output of the entropy decoder 245 is connected in signal communication with an input of a decoder controller 205. A first output of the decoder controller 205 is connected in signal communication with a second input of the entropy decoder 245. A second output of the decoder controller 205 is connected in signal communication with a second input of the inverse transformer and inverse quantizer 250. A third output of the decoder controller 205 is connected in signal communication with a third input of the deblocking filter 265. A fourth output of the decoder controller 205 is connected in signal communication with a second input of the intra prediction module 260, a first input of the motion compensator 270, and a second input of the reference picture buffer 280.
An output of the motion compensator 270 is connected in signal communication with a first input of a switch 297. An output of the intra prediction module 260 is connected in signal communication with a second input of the switch 297. An output of the switch 297 is connected in signal communication with a first non-inverting input of the combiner 225.
An input of the input buffer 210 is available as an input of the decoder 200, for receiving an input bitstream. A first output of the deblocking filter 265 is available as an output of the decoder 200, for outputting an output picture.
Figure 3 shows an exemplary video transmission system 300 to which the present principles may be applied, in accordance with an implementation of the present principles. The video transmission system 300 may be, for example, a head-end or transmission system for transmitting a signal using any of a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast. The transmission may be provided over the Internet or some other network.
The video transmission system 300 is capable of generating and delivering video content encoded using inter-view skip mode with depth. This is achieved by generating an encoded signal(s) including depth information or information capable of being used to synthesize the depth information at a receiver end that may, for example, have a decoder.
The video transmission system 300 includes an encoder 310 and a transmitter 320 capable of transmitting the encoded signal. The encoder 310 receives video information and generates an encoded signal(s) therefrom using inter-view skip mode with depth. The encoder 310 may be, for example, the encoder 100 described in detail above. The encoder 310 may include sub-modules, including for example an assembly unit for receiving and assembling various pieces of information into a structured format for storage or transmission. The various pieces of information may include, for example, coded or uncoded video, coded or uncoded depth information, and coded or uncoded elements such as, for example, motion vectors, coding mode indicators, and syntax elements.
The transmitter 320 may be, for example, adapted to transmit a program signal having one or more bitstreams representing encoded pictures and/or information related thereto. Typical transmitters perform functions such as, for example, one or more of providing error-correction coding, interleaving the data in the signal, randomizing the energy in the signal, and modulating the signal onto one or more carriers. The transmitter may include, or interface with, an antenna (not shown). Accordingly, implementations of the transmitter 320 may include, or be limited to, a modulator.
Figure 4 shows an exemplary video receiving system 400 to which the present principles may be applied, in accordance with an embodiment of the present principles. The video receiving system 400 may be configured to receive signals over a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast. The signals may be received over the Internet or some other network.
The video receiving system 400 may be, for example, a cell-phone, a computer, a set-top box, a television, or other device that receives encoded video and provides, for example, decoded video for display to a user or for storage. Thus, the video receiving system 400 may provide its output to, for example, a screen of a television, a computer monitor, a computer (for storage, processing, or display), or some other storage, processing, or display device.
The video receiving system 400 is capable of receiving and processing video content including video information. The video receiving system 400 includes a receiver 410 capable of receiving an encoded signal, such as for example the signals described in the implementations of this application, and a decoder 420 capable of decoding the received signal.
The receiver 410 may be, for example, adapted to receive a program signal having a plurality of bitstreams representing encoded pictures. Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal from one or more carriers, de-randomizing the energy in the signal, de-interleaving the data in the signal, and error-correction decoding the signal. The receiver 410 may include, or interface with, an antenna (not shown). Implementations of the receiver 410 may include, or be limited to, a demodulator.
The decoder 420 outputs video signals including video information and depth information. The decoder 420 may be, for example, the decoder 200 described in detail above.
Figure 5 shows an exemplary video processing device 500 to which the present principles may be applied, in accordance with an embodiment of the present principles. The video processing device 500 may be, for example, a set top box or other device that receives encoded video and provides, for example, decoded video for display to a user or for storage. Thus, the video processing device 500 may provide its output to a television, computer monitor, or a computer or other processing device.
The video processing device 500 includes a front-end (FE) device 505 and a decoder 510. The front-end device 505 may be, for example, a receiver adapted to receive a program signal having a plurality of bitstreams representing encoded pictures, and to select one or more bitstreams for decoding from the plurality of bitstreams. Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal, decoding one or more encodings (for example, channel coding and/or source coding) of the data signal, and/or error-correcting the data signal. The front-end device 505 may receive the program signal from, for example, an antenna (not shown). The front-end device 505 provides a received data signal to the decoder 510.
The decoder 510 receives a data signal 520. The data signal 520 may include, for example, one or more Advanced Video Coding (AVC), Scalable Video Coding (SVC), or Multi-view Video Coding (MVC) compatible streams. AVC refers more specifically to the existing International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 Recommendation (hereinafter the "H.264/MPEG-4 AVC Standard" or variations thereof, such as the "AVC standard" or simply "AVC").
MVC refers more specifically to a multi-view video coding ("MVC") extension (Annex H) of the AVC standard, referred to as H.264/MPEG-4 AVC, MVC extension (the "MVC extension" or simply "MVC"). SVC refers more specifically to a scalable video coding ("SVC") extension (Annex G) of the AVC standard, referred to as H.264/MPEG-4 AVC, SVC extension (the "SVC extension" or simply "SVC").
The decoder 510 decodes all or part of the received signal 520 and provides as output a decoded video signal 530. The decoded video 530 is provided to a selector 550. The device 500 also includes a user interface 560 that receives a user input 570. The user interface 560 provides a picture selection signal 580, based on the user input 570, to the selector 550. The picture selection signal 580 and the user input 570 indicate which of multiple pictures, sequences, scalable versions, views, or other selections of the available decoded data a user desires to have displayed. The selector 550 provides the selected picture(s) as an output 590. The selector 550 uses the picture selection information 580 to select which of the pictures in the decoded video 530 to provide as the output 590.
In various implementations, the selector 550 includes the user interface 560, and in other implementations no user interface 560 is provided because the selector 550 receives the user input 570 directly without a separate interface function being performed. The selector 550 may be implemented in software or as an integrated circuit, for example. In one implementation, the selector 550 is incorporated with the decoder 510, and in another implementation, the decoder 510, the selector 550, and the user interface 560 are all integrated. In one application, front-end 505 receives a broadcast of various television shows and selects one for processing. The selection of one show is based on user input of a desired channel to watch. Although the user input to front-end device 505 is not shown in Figure 5, front-end device 505 receives the user input 570. The front-end 505 receives the broadcast and processes the desired show by demodulating the relevant part of the broadcast spectrum, and decoding any outer encoding of the demodulated show. The front-end 505 provides the decoded show to the decoder 510. The decoder 510 is an integrated unit that includes devices 560 and 550. The decoder 510 thus receives the user input, which is a user-supplied indication of a desired view to watch in the show. The decoder 510 decodes the selected view, as well as any required reference pictures from other views, and provides the decoded view 590 for display on a television (not shown).
Continuing the above application, the user may desire to switch the view that is displayed and may then provide a new input to the decoder 510. After receiving a "view change" from the user, the decoder 510 decodes both the old view and the new view, as well as any views that are in between the old view and the new view. That is, the decoder 510 decodes any views that are taken from cameras that are physically located in between the camera taking the old view and the camera taking the new view. The front-end device 505 also receives the information identifying the old view, the new view, and the views in between. Such information may be provided, for example, by a controller (not shown in Figure 5) having information about the locations of the views, or the decoder 510. Other implementations may use a front-end device that has a controller integrated with the front-end device.
The decoder 510 provides all of these decoded views as output 590. A post-processor (not shown in Figure 5) interpolates between the views to provide a smooth transition from the old view to the new view, and displays this transition to the user. After transitioning to the new view, the post-processor informs (through one or more communication links not shown) the decoder 510 and the front-end device 505 that only the new view is needed. Thereafter, the decoder 510 only provides as output 590 the new view.
The system 500 may be used to receive multiple views of a sequence of images, and to present a single view for display, and to switch between the various views in a smooth manner. The smooth manner may involve interpolating between views to move to another view. Additionally, the system 500 may allow a user to rotate an object or scene, or otherwise to see a three-dimensional representation of an object or a scene. The rotation of the object, for example, may correspond to moving from view to view, and interpolating between the views to obtain a smooth transition between the views or simply to obtain a three-dimensional representation. That is, the user may "select" an interpolated view as the "view" that is to be displayed.
Returning to a description of the present principles and environments in which they may be applied, it is to be appreciated that advantageously, the present principles may be applied to 3D Video (3DV). 3D Video is a new framework that includes a coded representation for multiple view video and depth information and targets the generation of high-quality 3D rendering at the receiver. This enables 3D visual experiences with auto-multiscopic displays.
Figure 6 shows an exemplary system 600 for transmitting and receiving multi-view video with depth information, to which the present principles may be applied, according to an embodiment of the present principles. In Figure 6, video data is indicated by a solid line, depth data is indicated by a dashed line, and meta data is indicated by a dotted line. The system 600 may be, for example, but is not limited to, a free-viewpoint television system. At a transmitter side 610, the system 600 includes a three-dimensional (3D) content producer 620, having a plurality of inputs for receiving one or more of video, depth, and meta data from a respective plurality of sources. Such sources may include, but are not limited to, a stereo camera 611, a depth camera 612, a multi-camera setup 613, and 2-dimensional/3-dimensional (2D/3D) conversion processes 614. One or more networks 630 may be used to transmit one or more of video, depth, and meta data relating to multi-view video coding (MVC) and digital video broadcasting (DVB). At a receiver side 640, a depth image-based renderer 650 performs depth image-based rendering to project the signal to various types of displays. The depth image-based renderer 650 is capable of receiving display configuration information and user preferences. An output of the depth image-based renderer 650 may be provided to one or more of a 2D display 661, an M-view 3D display 662, and/or a head-tracked stereo display 663.
A first embodiment, and variations of this embodiment, will now be discussed. When coding depth maps using AVC's intra-only coding mode, we observed that most of the macroblocks (MBs) that are selected have a high correlation with their neighboring macroblock (MB). The inventors further recognized that because the depth map has special characteristics as described above (flat areas), most of the modes selected were INTRA16x16 except at depth discontinuities. The first embodiment, and variations of the first embodiment, are related to these discoveries and recognitions. We generally refer to this first embodiment, and variations of this first embodiment, as Intra skip.
The Inter skip mode in H.264/AVC is well known. Currently this mode is only supported in inter slices. When this mode is selected, potentially only a skip run value is sent to the decoder under context adaptive variable length coding (CAVLC) mode. Under context adaptive binary arithmetic coding (CABAC) mode, a flag mb_skip_flag is signaled for each macroblock. Once a macroblock is signaled as a skip MB, the decoder derives all the remaining information and this macroblock is decoded without residue.
In a similar manner, we propose to introduce a variation of this mode in the intra slices keeping in mind our observations described above. In this embodiment, we do not allow Intra skip mode in inter slices. Additionally, this embodiment describes changes that we propose when the bitstream is operating in both CAVLC mode and CABAC mode.
Figure 7 shows an exemplary method 700 for intra slice encoding with Intra skip mode, in accordance with an embodiment of the present principles. At step 701, the picture to be encoded is read. At step 705, the encoding of a current macroblock is initialized. At step 710, the best rate distortion (RD) cost for INTRA16x16 is calculated. At step 715, the best RD cost for INTRA8x8 is calculated. At step 720, the best RD cost for INTRA4x4 is calculated. At step 725, the RD cost for PCM is calculated. At step 730, the RD cost for Intra skip is calculated. At step 735, the best Intra mode is selected from among 16x16, 8x8, 4x4, PCM, and Intra skip. At step 740, the current macroblock (MB) is encoded with the best Intra mode. At step 745, it is determined whether or not there are any more macroblocks (to be encoded). If so, then control is returned to step 705. Otherwise, control is passed to step 750. At step 750, the encoded picture is written into a bitstream, and the bitstream is sent over a network(s).
It is to be appreciated that in at least one implementation, step 740 may involve, for example, forming a residue between the current block and the prediction for the current block, and encoding the residue.
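For illustration, the mode decision of Figure 7 reduces to taking the minimum over the candidate rate-distortion costs. The sketch below assumes the five costs have already been computed, in the order listed; the enum and function are hypothetical names, not an encoder API.

```c
/* Candidate intra modes considered per macroblock in Figure 7. */
enum IntraMode { I16x16 = 0, I8x8, I4x4, PCM_MODE, INTRA_SKIP, NUM_MODES };

/* Return the mode whose RD cost is lowest. */
static enum IntraMode select_intra_mode(const double cost[NUM_MODES])
{
    enum IntraMode best = I16x16;
    for (int m = 1; m < NUM_MODES; m++)
        if (cost[m] < cost[best])
            best = (enum IntraMode)m;
    return best;
}
```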
Figure 8 shows an exemplary method 800 for intra slice decoding with Intra skip mode, in accordance with an embodiment of the present principles. At step 801, a bitstream for a slice is read or received. At step 805, mb_skip_run is parsed. At step 810, it is determined whether or not mb_skip_run is equal to zero. If so, then control is passed to a step 815. Otherwise, control is passed to a step 825.
At step 815, decoding of the Intra MB is continued as in the AVC Standard.
At step 820, it is determined whether or not there are any more macroblocks in this (the current) slice. If so, then control is returned to step 805. Otherwise, the method is terminated.
At step 825, it is determined whether or not mb_skip_run is greater than zero. If so, then control is passed to a step 830. Otherwise, control is passed to the step 820.
At step 830, it is determined whether or not a left macroblock exists (in the current slice, with respect to the current macroblock). If so, then control is passed to a step 835. Otherwise, control is passed to a step 845.
At step 835, the Intra prediction mode from the left macroblock is copied. At step 840, the Intra MB is decoded using the derived Intra prediction modes.
At step 860, a value of the syntax element mb_skip_run is decremented.
At step 845, it is determined whether or not an above macroblock exists (in the current slice, with respect to the current macroblock). If so, then control is passed to a step 850. Otherwise, control is passed to a step 855. At step 850, the Intra prediction mode from the above macroblock is copied.
At step 855, the bitstream is indicated as being invalid.
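The neighbor selection of steps 830 through 855 can be sketched compactly as follows; the MB structure is a hypothetical stand-in for the decoder's macroblock record.

```c
#include <stddef.h>

/* Hypothetical macroblock record. */
typedef struct {
    int mb_type;       /* macroblock type (e.g., an INTRA16x16 type) */
    int intra_mode;    /* intra prediction mode to be copied */
} MB;

/* Prefer the left neighbor, fall back to the above neighbor, and
 * return -1 (invalid bitstream, step 855) if neither exists. */
static int derive_intra_skip_mode(const MB *left, const MB *above, MB *cur)
{
    const MB *src = (left != NULL) ? left : above;
    if (src == NULL)
        return -1;
    cur->mb_type = src->mb_type;         /* copy macroblock type */
    cur->intra_mode = src->intra_mode;   /* copy intra prediction mode */
    return 0;                            /* decode with the derived mode */
}
```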
Figure 9 shows an exemplary method for calculating rate distortion (RD) with respect to steps 710, 715, 720, 725, and 730 of method 700 of Figure 7. At step 905, it is determined whether or not a left macroblock exists (in the current slice, with respect to the current macroblock). If so, then control is passed to a step 910. Otherwise, control is passed to a step 930.
At step 910, the Intra prediction mode from the left macroblock is copied. At step 915, the prediction block is found using the derived prediction mode. At step 920, the distortion is calculated using the prediction block as a reconstruction. At step 925, the RD cost is calculated using the distortion and zero bitrate.
At step 930, it is determined whether or not an above macroblock exists (in the current slice, with respect to the current macroblock). If so, then control is passed to a step 935. Otherwise, control is passed to a step 940.
At step 935, the Intra prediction mode from the above macroblock is copied. At step 940, the rate distortion (RD) is set to be infinite.
We now discuss encoder modifications that are used in various implementations. Intra skip mode is added as another mode in the intra slices. Intra skip mode is in addition to the existing intra prediction modes: 16x16; 8x8; and 4x4. As shown in Figure 7, in a typical exhaustive search encoder, the Intra skip mode is added as a new module to calculate the rate distortion cost. Eventually the mode with the lowest RD cost is selected as the coding mode. Figure 9 shows one embodiment of how the encoder calculates the prediction block for the Intra skip mode and the corresponding RD cost. This can be described as follows:
Step 1. Check if the left macroblock exists.
Step 2. If the left macroblock exists, copy the macroblock type and intra prediction modes to the current macroblock and go to Step 5; else go to Step 3.
Step 3. Check if the above macroblock exists.
Step 4. If the above macroblock exists, copy the macroblock type and intra prediction modes to the current macroblock and go to Step 5; else go to Step 7.
Step 5. Obtain the prediction samples using the macroblock type and intra prediction modes that were copied.
Step 6. Calculate the RD cost of coding using Intra skip mode and go to Step 8.
Step 7. Set the RD cost to a very large value.
Step 8. Return the RD cost.
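One way to read the eight steps above: since no residue and no mode bits are coded, the rate term of the Lagrangian cost is zero, so the cost is simply the distortion of using the prediction itself as the reconstruction. The function signature and the squared-error distortion below are illustrative assumptions.

```c
#include <math.h>

/* RD cost of Intra skip: the prediction doubles as the reconstruction
 * and zero bits are spent, so J = D + lambda * R reduces to J = D.
 * Returns a very large value when no neighbor prediction exists (Step 7). */
static double rd_cost_intra_skip(const unsigned char *src,
                                 const unsigned char *pred, /* NULL if none */
                                 int num_samples, double lambda)
{
    if (pred == NULL)
        return HUGE_VAL;                 /* no left or above macroblock */
    double distortion = 0.0;             /* sum of squared differences */
    for (int i = 0; i < num_samples; i++) {
        double d = (double)src[i] - (double)pred[i];
        distortion += d * d;
    }
    return distortion + lambda * 0.0;    /* zero-rate Lagrangian cost */
}
```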
Of course, it is to be appreciated that the present principles are not limited solely to the preceding approach regarding how the encoder calculates the prediction block for the Intra skip mode and the corresponding RD cost and, thus, other approaches may also be used. That is, given the teachings of the present principles provided herein, one of ordinary skill in this and related arts will contemplate this and other approaches for how the encoder calculates the prediction block for the Intra skip and the corresponding RD cost, while maintaining the spirit of the present principles.
We now discuss syntax changes that are used in various implementations. The first change, for enabling Intra skip mode at the decoder in an Intra slice, is to remove the Intra slice detection condition in slice_data() as shown in Table 1. In particular, Table 1 shows syntax changes for slice_data(), in accordance with an embodiment of the present principles. The syntax that is deleted is shown with strikethrough, italics, and larger font. That is, Intra slices can have skip MBs just as Inter slices do. The value of mb_skip_run indicates how many macroblocks are to be skipped for coding under CAVLC mode. Under CABAC mode, mb_skip_flag is set to 1 when the current macroblock is skipped for coding. Otherwise, mb_skip_flag is set to 0. To encode mb_skip_flag efficiently under CABAC mode, a new CABAC context table should be designed carefully. Herein, we do not specify any particular CABAC table. However, it is to be appreciated that, given the teachings of the present principles provided herein, one of ordinary skill in this and related arts will readily contemplate CABAC context tables to which the present principles may be applied, while maintaining the spirit of the present principles.
TABLE 1
[Table 1, showing the modified slice_data() syntax with the removed Intra-slice condition indicated by strikethrough, is rendered only as an image in the source publication.]
We now discuss the decoding process for Intra MBs in various implementations. In addition to the above syntax changes, the decoding process for an Intra slice is proposed to be modified as follows. If a macroblock is detected to be an Intra skip MB, then the decoder first determines which neighbor macroblock is to be used to derive the intra prediction mode. In one example, the macroblock on the left is always selected if it is available. Otherwise, the above macroblock is selected. If neither the left macroblock nor the above macroblock is available, then the bitstream is declared invalid. Once the deriving macroblock is determined, the macroblock type and intra prediction mode are copied from that macroblock. The decoder can form the prediction block using this derived prediction mode. The prediction block is then taken as the reconstruction of the skipped macroblock. The decoding procedure is depicted in Figure 8.
In one implementation relating to, for example, a joint scalable video model (JSVM) decoder, the parsing process is separated from the decoding process. The decoder first parses the syntax elements of all macroblocks within the slice, then decodes the macroblocks. When the decoder parses the bitstream, the macroblocks with Intra skip mode are marked. For INTRA16x16, the intra prediction mode is signaled by the mb_type syntax. Thus, if the neighboring macroblock is an INTRA16x16 MB, then the intra prediction mode of the current (Intra skip) macroblock can be derived during the parsing process. In the case that the neighbor macroblock is in INTRA4x4 or INTRA8x8 mode, the real intra prediction mode is unknown during the parsing process. Instead, the decoder waits until the decoding process to copy the intra prediction mode to the current (Intra skip) macroblock.
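A sketch of this parse-time marking follows; the structure and function are hypothetical, and the point is only that the mode copy can be completed during parsing for an INTRA16x16 neighbor but must be deferred otherwise.

```c
/* Kinds of neighbor macroblock relevant to parse-time derivation. */
enum NeighborType { N_I16x16, N_I4x4, N_I8x8 };

typedef struct {
    int is_intra_skip;   /* marked while parsing the slice */
    int mode_resolved;   /* 1 once the intra mode has been copied */
    int intra_mode;
} ParsedMB;

static void mark_intra_skip(ParsedMB *cur, enum NeighborType nt, int neighbor_mode)
{
    cur->is_intra_skip = 1;
    if (nt == N_I16x16) {
        cur->intra_mode = neighbor_mode; /* known from the neighbor's mb_type */
        cur->mode_resolved = 1;
    } else {
        cur->mode_resolved = 0;          /* defer the copy to the decoding pass */
    }
}
```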
It is noted that with this embodiment, neither the coding mode nor the residual is transmitted. Figure 10 shows an exemplary depth map 1000 to which the present principles may be applied, in accordance with an embodiment of the present principles. The exemplary depth map 1000 relates to a scene of "breakdancers". Simulation results show a 0.497 dB improvement for Breakdancers depth.
A second embodiment, and variations of the second embodiment, will now be discussed. The second embodiment, and variations, may generally be referred to as Extended P skip mode and B skip mode.
In embodiment 1, the Intra skip mode works only for I (Intra) slices. In this embodiment, we propose to enable the Intra skip mode for other slice types.
Introducing Intra skip mode in P and B slices is different, since the skip run in the slice data can now apply to both Inter skip and Intra skip.
With the extended P skip mode, the derivation of mb_type for a skipped macroblock in a P/B slice is modified. In one embodiment, if all neighboring (e.g., any combination of left, above, above right, and above left) macroblocks are coded in Intra mode, then the current skipped macroblock is derived as an Intra macroblock, instead of an Inter macroblock. All of the decoding procedure defined for embodiment 1 can be reproduced for P slices in this embodiment.
In another embodiment, with the skip run information we can signal another flag, intra_skip_flag, which indicates whether the skipped macroblock is an Intra skip or an Inter skip. This modification is shown in Table 2 using italics and larger font. In particular, Table 2 shows syntax changes in slice_data(), in accordance with an embodiment of the present principles. If intra_skip_flag is set to 1, then the syntax mb_skip_run indicates the run length of Intra skip MBs under CAVLC and mb_skip_flag indicates that the current macroblock is an Intra skip macroblock under CABAC. If intra_skip_flag is set to 0, then the syntax mb_skip_run indicates the run length of Inter skip macroblocks under CAVLC and the syntax mb_skip_flag indicates that the current macroblock is an Inter skip macroblock under CABAC.
The condition for the presence of intra_skip_flag is explained here. If the slice type is Intra, the flag need not be present. Likewise, even in an Inter slice, if the previous macroblock is signaled as skip, then the flag need not be present either.
TABLE 2
[Table 2, showing the slice_data() syntax with the added intra_skip_flag indicated by italics, is rendered only as an image in the source publication.]
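As a rough sketch of the signaling just described, the first function below encodes the stated presence rule for intra_skip_flag, and the second encodes the implicit all-intra-neighbors derivation of the first variant; the function boundaries are assumptions for illustration.

```c
/* intra_skip_flag is present only in inter slices, and only when the
 * previous macroblock was not itself signaled as skipped. */
static int intra_skip_flag_present(int slice_is_intra, int prev_mb_was_skip)
{
    return !slice_is_intra && !prev_mb_was_skip;
}

/* Implicit variant: a skipped macroblock in a P/B slice is derived as
 * Intra skip only if every available spatial neighbor (left, above,
 * above right, above left, as available) is intra coded. */
static int derive_as_intra_skip(const int neighbor_is_intra[], int num_neighbors)
{
    if (num_neighbors == 0)
        return 0;                        /* no intra evidence: treat as Inter skip */
    for (int i = 0; i < num_neighbors; i++)
        if (!neighbor_is_intra[i])
            return 0;
    return 1;
}
```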
Figure 11 shows an exemplary method for encoding a P or B slice, in accordance with an embodiment of the present principles. At step 1105, the encoding of the current macroblock (MB) is initialized. At step 1110, the best rate distortion (RD) cost for INTRA16x16 is calculated. At step 1115, the best RD cost for INTRA8x8 is calculated. At step 1120, the best RD cost for INTRA4x4 is calculated. At step 1125, the RD cost for PCM is calculated. At step 1130, the RD cost for Intra skip is calculated. At step 1135, the best Intra mode is selected from among 16x16, 8x8, 4x4, PCM, and Intra skip. At step 1140, the best RD cost is calculated for Inter modes. At step 1145, it is determined whether or not the best Inter mode is better than the best Intra mode. If so, then control is passed to a step 1150. Otherwise, control is passed to a step 1160. At step 1150, the current macroblock is encoded with the best Inter mode.
At step 1155, it is determined whether or not there are any more macroblocks (to be encoded). If so, then control is returned to step 1105. Otherwise, the method is terminated.
At step 1160, the current macroblock is encoded with the best Intra mode.
Figure 12 shows an exemplary method for decoding a P or B slice, in accordance with an embodiment of the present principles. At step 1201, a bitstream for a slice is read or received. At step 1205, mb_skip_run is parsed. At step 1210, intra_skip_flag is derived/parsed. At step 1215, it is determined whether or not mb_skip_run is equal to zero. If so, then control is passed to a step 1220. Otherwise, control is passed to a step 1230.
At step 1220, decoding of the Intra MB is continued as in the AVC Standard.
At step 1225, it is determined whether or not there are any more macroblocks in this (the current) slice. If so, then control is returned to step 1205. Otherwise, the method is terminated. At step 1230, it is determined whether or not mb_skip_run > zero. If so, then control is passed to a step 1235. Otherwise, control is passed to step 1225.
At step 1235, it is determined whether or not intra_skip_flag is equal to zero. If so, then control is passed to a step 1240. Otherwise, control is passed to a step 1245.
At step 1245, the macroblock is decoded using the derived Intra prediction modes.
At step 1240, the macroblock is decoded using the derived Inter prediction modes.
At step 1250, a value of the syntax element mb_skip_run is decremented.
Reference in the specification to "one embodiment" or "an embodiment" or "one implementation" or "an implementation" of the present principles, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation", as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
It is to be appreciated that the use of any of the following "/", "and/or", and "at least one of", for example, in the cases of "A/B", "A and/or B", and "at least one of A and B", is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of "A, B, and/or C" and "at least one of A, B, and C", such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as is readily apparent to one of ordinary skill in this and related arts, for as many items as are listed.
Implementations may signal information using a variety of techniques including, but not limited to, in-band information, out-of-band information, datastream data, implicit signaling, and explicit signaling. In-band information and explicit signaling may include, for various implementations and/or standards, slice headers, SEI messages, other high level syntax, and non-high-level syntax. Accordingly, although implementations described herein may be described in a particular context, such descriptions should in no way be taken as limiting the features and concepts to such implementations or contexts.
The implementations and features described herein may be used in the context of the MPEG-4 AVC Standard, or the MPEG-4 AVC Standard with the MVC extension, or the MPEG-4 AVC Standard with the SVC extension. However, these implementations and features may be used in the context of another standard and/or recommendation (existing or future), or in a context that does not involve a standard and/or recommendation.
In coding a given portion of a picture, implementations may borrow various pieces of information from a neighboring portion. The information may include, for example, one or more pixel values, a motion vector, and/or a coding mode. Further, the neighbor may be, for example, all or part of (i) a temporally neighboring picture, (ii) a spatially neighboring portion, and (iii) a neighboring view from the same or a different instant in time. Additionally, the portion may be, for example, (i) a full picture, such as, for example, a full frame or field, (ii) a slice, (iii) a macroblock, and (iv) a partition or other sub-block portion.
It should also be clear that in coding a given portion of a picture, various modes can be used. For example, if the coding mode borrowed from a spatially neighboring macroblock indicates that pixels are to be copied from the macroblock that is immediately above the macroblock being coded, then the prediction is formed by copying (in columns) the pixels in the row that is immediately above the macroblock being coded. Various modes may be used (and therefore borrowed), such as, for example, horizontal copy, vertical copy, DC coefficient copy, and other known modes.
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
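Returning to the copy modes described above, the vertical-copy case can be sketched as follows; the frame-buffer layout, the 16x16 block size, and the assumption that a reconstructed row exists above the block are all illustrative.

```c
/* Vertical copy: the row of reconstructed samples immediately above the
 * macroblock is copied down, column by column, into all 16 rows of the
 * prediction. Assumes mb_y >= 1 so that an above row exists. */
static void predict_vertical_copy(const unsigned char *frame, int stride,
                                  int mb_x, int mb_y,  /* top-left sample of MB */
                                  unsigned char pred[16][16])
{
    const unsigned char *above = frame + (mb_y - 1) * stride + mb_x;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            pred[y][x] = above[x];
}
```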
Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding and decoding. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle. Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette, a random access memory ("RAM"), or a read-only memory ("ROM"). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation. As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application and are within the scope of the following claims.

Claims

CLAIMS:
1. A method comprising: accessing a neighbor-block prediction mode for a neighbor block in a picture, the neighbor block being a spatial neighbor of a target block in the picture, the neighbor-block prediction mode indicating how the neighbor-block has been predicted in an encoding process of the neighbor block; determining a target-block prediction mode for the target block based on the neighbor-block prediction mode, the target-block prediction mode indicating how the target-block is to be predicted in an encoding process of the target block; predicting the target block using the target-block prediction mode, to produce a target-block prediction; evaluating the target-block prediction; determining to use the target-block prediction, as a prediction for the target block, based on a result of the evaluating; and encoding prediction mode information for the target block, the prediction mode information indicating that the target block of data was predicted using a prediction mode that was determined based on the neighbor-block prediction mode.
2. The method of claim 1 further comprising forming a bitstream that includes the encoded prediction mode information.
3. The method of claim 2 further comprising transmitting the bitstream.
4. The method of claim 1 further comprising forming a residue between the target block and the target-block prediction; and encoding the residue.
5. The method of claim 4 further comprising forming a bitstream that includes the encoded prediction mode information and the encoded residue.
6. The method of claim 1 wherein the picture has depth information for a different picture.
7. The method of claim 6 wherein the picture is a depth map for the different picture.
8. The method of claim 1 wherein the picture is designated as a picture that is to be encoded with respect to itself and not predictively encoded with respect to another picture.
9. The method of claim 1 wherein the picture is an I picture.
10. The method of claim 1 wherein the target-block prediction mode is determined based on prediction modes of multiple blocks that are all spatial neighbors of the target block.
11. The method of claim 1 wherein predicting the target block comprises using pixel values from the neighbor block.
12. The method of claim 1 wherein the picture is designated as a picture for which blocks may be predictively encoded with respect to another picture.
13. The method of claim 1, wherein the picture is a P or B picture.
14. The method of claim 1 wherein the target-block prediction mode and the neighbor-block prediction mode are defined in a standard.
15. The method of claim 14 wherein the standard is AVC or MVC.
16. An apparatus comprising: means for accessing a neighbor-block prediction mode for a neighbor block in a picture, the neighbor block being a spatial neighbor of a target block in the picture, the neighbor-block prediction mode indicating how the neighbor-block has been predicted in an encoding process of the neighbor block; means for determining a target-block prediction mode for the target block based on the neighbor-block prediction mode, the target-block prediction mode indicating how the target-block is to be predicted in an encoding process of the target block; means for predicting the target block using the target-block prediction mode, to produce a target-block prediction; means for evaluating the target-block prediction; means for determining to use the target-block prediction, as a prediction for the target block, based on a result of the evaluating; and means for encoding prediction mode information for the target block, the prediction mode information indicating that the target block of data was predicted using a prediction mode that was determined based on the neighbor-block prediction mode.
17. A processor readable medium having stored thereon instructions for causing a processor to perform at least the following: accessing a neighbor-block prediction mode for a neighbor block in a picture, the neighbor block being a spatial neighbor of a target block in the picture, the neighbor-block prediction mode indicating how the neighbor-block has been predicted in an encoding process of the neighbor block; determining a target-block prediction mode for the target block based on the neighbor-block prediction mode, the target-block prediction mode indicating how the target-block is to be predicted in an encoding process of the target block; predicting the target block using the target-block prediction mode, to produce a target-block prediction; evaluating the target-block prediction; determining to use the target-block prediction, as a prediction for the target block, based on a result of the evaluating; and encoding prediction mode information for the target block, the prediction mode information indicating that the target block of data was predicted using a prediction mode that was determined based on the neighbor-block prediction mode.
18. An apparatus, comprising a processor configured to perform at least the following: accessing a neighbor-block prediction mode for a neighbor block in a picture, the neighbor block being a spatial neighbor of a target block in the picture, the neighbor-block prediction mode indicating how the neighbor-block has been predicted in an encoding process of the neighbor block; determining a target-block prediction mode for the target block based on the neighbor-block prediction mode, the target-block prediction mode indicating how the target-block is to be predicted in an encoding process of the target block; predicting the target block using the target-block prediction mode, to produce a target-block prediction; evaluating the target-block prediction; determining to use the target-block prediction, as a prediction for the target block, based on a result of the evaluating; and encoding prediction mode information for the target block, the prediction mode information indicating that the target block of data was predicted using a prediction mode that was determined based on the neighbor-block prediction mode.
19. An apparatus comprising: an intra predictor, wherein the intra predictor is for: determining a neighbor-block prediction mode for a neighbor block in a picture, the neighbor block being a spatial neighbor of a target block in the picture, the neighbor-block prediction mode indicating how the neighbor-block has been predicted in an encoding process of the neighbor block, determining a target-block prediction mode for the target block based on the neighbor-block prediction mode, the target-block prediction mode indicating how the target-block is to be predicted in an encoding process of the target block, predicting the target block using the target-block prediction mode, to produce a target-block prediction, evaluating the target-block prediction, and determining to use the target-block prediction, as a prediction for the target block, based on a result of the evaluating; and an entropy encoder for encoding prediction mode information for the target block, the prediction mode information indicating that the target block of data was predicted using a prediction mode that was determined based on the neighbor-block prediction mode.
20. The apparatus of claim 19 wherein the apparatus is implemented in a video encoder.
21. An apparatus comprising: an intra predictor, wherein the intra predictor is for: determining a neighbor-block prediction mode for a neighbor block in a picture, the neighbor block being a spatial neighbor of a target block in the picture, the neighbor-block prediction mode indicating how the neighbor block has been predicted in an encoding process of the neighbor block, determining a target-block prediction mode for the target block based on the neighbor-block prediction mode, the target-block prediction mode indicating how the target block is to be predicted in an encoding process of the target block, predicting the target block using the target-block prediction mode, to produce a target-block prediction, evaluating the target-block prediction, and determining to use the target-block prediction, as a prediction for the target block, based on a result of the evaluating; an entropy encoder for encoding prediction mode information for the target block, the prediction mode information indicating that the target block was predicted using a prediction mode that was determined based on the neighbor-block prediction mode; and a modulator for modulating a signal, the signal including the prediction mode information.
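By way of illustration only, claims 16 through 21 recite deriving a prediction mode for a target block from the prediction mode of a spatially neighboring block, predicting the target block with the derived mode, evaluating that prediction, and then encoding mode information that merely indicates the mode was neighbor-derived. A minimal encoder-side sketch of that flow follows; the most-probable-mode rule, the DC-only predictor, the SAD evaluation with an arbitrary threshold, and every function and parameter name are assumptions of this sketch, not the claimed implementation.

```python
# Minimal encoder-side sketch of the flow recited in claims 16-21.
# Assumptions (not from the claims): AVC-style 4x4 intra modes, a
# most-probable-mode derivation rule, DC prediction only, and a SAD
# evaluation with an arbitrary threshold. All names are hypothetical.
import numpy as np

DC_MODE = 2  # in AVC, intra prediction mode 2 is DC


def derive_target_mode(left_mode, above_mode):
    """Determine the target-block mode from the neighbor-block modes
    (smaller of the two; DC when a neighbor is unavailable)."""
    if left_mode is None or above_mode is None:
        return DC_MODE
    return min(left_mode, above_mode)


def predict_block(mode, edge_pixels):
    """Predict the target block; only DC is sketched: fill the block
    with the mean of the reconstructed neighboring edge pixels."""
    if mode != DC_MODE:
        raise NotImplementedError("only DC prediction is sketched here")
    return np.full((4, 4), int(round(edge_pixels.mean())))


def encode_target_block(original, left_mode, above_mode, edge_pixels,
                        sad_threshold=64):
    """Derive a mode, predict, evaluate the prediction by SAD, and
    decide whether the neighbor-derived mode is used; return the
    prediction mode information that would be entropy coded."""
    mode = derive_target_mode(left_mode, above_mode)
    prediction = predict_block(mode, edge_pixels)
    sad = int(np.abs(prediction - original.astype(int)).sum())
    use_derived_mode = sad <= sad_threshold  # the evaluation step
    # When the flag is set, the decoder can re-derive the mode from
    # its neighbors, so no explicit mode number is transmitted.
    return {"derived_mode_flag": use_derived_mode, "mode": mode}


# A flat depth-map block is served well by DC prediction:
block = np.full((4, 4), 120, dtype=np.uint8)
edges = np.full(8, 121, dtype=np.uint8)
print(encode_target_block(block, left_mode=2, above_mode=2, edge_pixels=edges))
```

Because a decoder can repeat the same derivation from already decoded neighbor blocks, signalling a single flag in place of an explicit mode number is what makes such a scheme attractive for smooth content such as depth maps.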
22. A method comprising: accessing prediction mode information for reconstructing a target block of data from a picture, the prediction mode information indicating that the target block of data was predicted using a target-block prediction mode that was determined based on a neighbor-block prediction mode of a neighbor block that is a spatial neighbor of the target block in the picture; determining the neighbor-block prediction mode of the neighbor block; determining the target-block prediction mode of the target block based on the neighbor-block prediction mode; and reconstructing the target block by predicting the target block using the determined target-block prediction mode.
23. The method of claim 22 wherein the picture has depth information for a different picture.
24. The method of claim 23 wherein the picture is a depth map for the different picture.
25. The method of claim 22 wherein the picture is designated as a picture that is to be encoded with respect to itself and not predictively encoded with respect to another picture.
26. The method of claim 22 wherein the picture is an I picture.
27. The method of claim 22 wherein the target-block prediction mode is determined based on prediction modes of multiple blocks that are all spatial neighbors of the target block.
28. The method of claim 22 wherein reconstructing the target block comprises using pixel values from the neighbor block.
29. The method of claim 22 wherein the picture is designated as a picture for which blocks may be predictively encoded with respect to another picture.
30. The method of claim 22 wherein the picture is a P or B picture.
31. The method of claim 22 wherein the target-block prediction mode and the neighbor-block prediction mode are defined in a standard.
32. The method of claim 31 wherein the standard is AVC or MVC.
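A matching decoder-side sketch for the method of claims 22 through 32 is given below. It mirrors the encoder sketch above: the decoder determines the neighbor-block prediction mode itself and re-derives the target-block mode with the same (assumed) rule, so only the indication that the mode was neighbor-derived needs to be read from the bitstream; the DC-only predictor and all names remain hypothetical.

```python
# Minimal decoder-side sketch of claims 22-32, mirroring the encoder
# sketch above; the derivation rule must match the encoder's rule.
# All names and the DC-only predictor are hypothetical assumptions.
import numpy as np

DC_MODE = 2


def reconstruct_target_block(mode_info, left_mode, above_mode,
                             edge_pixels, residue=None):
    """Re-derive the target-block mode from the neighbor-block modes,
    predict the target block, and add any decoded residue."""
    if mode_info["derived_mode_flag"]:
        # Same rule as the encoder: determine the neighbor-block mode
        # and base the target-block mode on it.
        mode = (DC_MODE if left_mode is None or above_mode is None
                else min(left_mode, above_mode))
    else:
        mode = mode_info["mode"]  # explicitly signalled fallback
    if mode != DC_MODE:
        raise NotImplementedError("only DC prediction is sketched here")
    prediction = np.full((4, 4), int(round(edge_pixels.mean())))
    if residue is None:
        # No residue signalled: the prediction itself is the
        # reconstruction (compare claim 43 below).
        return prediction
    return np.clip(prediction + residue, 0, 255)


edges = np.full(8, 121, dtype=np.uint8)
print(reconstruct_target_block({"derived_mode_flag": True, "mode": DC_MODE},
                               left_mode=2, above_mode=2, edge_pixels=edges))
```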
33. An apparatus comprising: means for accessing prediction mode information for reconstructing a target block of data from a picture, the prediction mode information indicating that the target block of data was predicted using a target-block prediction mode that was determined based on a neighbor-block prediction mode of a neighbor block that is a spatial neighbor of the target block in the picture; means for determining the neighbor-block prediction mode of the neighbor block; means for determining the target-block prediction mode of the target block based on the neighbor-block prediction mode; and means for reconstructing the target block by predicting the target block using the determined target-block prediction mode.
34. A processor readable medium having stored thereon instructions for causing a processor to perform at least the following: accessing prediction mode information for reconstructing a target block of data from a picture, the prediction mode information indicating that the target block of data was predicted using a target-block prediction mode that was determined based on a neighbor-block prediction mode of a neighbor block that is a spatial neighbor of the target block in the picture; determining the neighbor-block prediction mode of the neighbor block; determining the target-block prediction mode of the target block based on the neighbor-block prediction mode; and reconstructing the target block by predicting the target block using the determined target-block prediction mode.
35. An apparatus, comprising a processor configured to perform at least the following: accessing prediction mode information for reconstructing a target block of data from a picture, the prediction mode information indicating that the target block of data was predicted using a target-block prediction mode that was determined based on a neighbor-block prediction mode of a neighbor block that is a spatial neighbor of the target block in the picture; determining the neighbor-block prediction mode of the neighbor block; determining the target-block prediction mode of the target block based on the neighbor-block prediction mode; and reconstructing the target block by predicting the target block using the determined target-block prediction mode.
36. An apparatus comprising an intra predictor, wherein the intra predictor is for: accessing prediction mode information for reconstructing a target block of data from a picture, the prediction mode information indicating that the target block of data was predicted using a target-block prediction mode that was determined based on a neighbor-block prediction mode of a neighbor block that is a spatial neighbor of the target block in the picture, determining the neighbor-block prediction mode of the neighbor block, determining the target-block prediction mode of the target block based on the neighbor-block prediction mode, and reconstructing the target block by predicting the target block using the determined target-block prediction mode.
37. The apparatus of claim 36 wherein the apparatus is implemented in at least one of a video encoder and a video decoder.
38. An apparatus comprising: a demodulator for receiving and demodulating a signal, the signal including prediction mode information for reconstructing a target block of data from a picture, the prediction mode information indicating that the target block of data was predicted using a target-block prediction mode that was determined based on a neighbor-block prediction mode of a neighbor block that is a spatial neighbor of the target block in the picture; and an intra predictor, wherein the intra predictor is for: accessing the prediction mode information, determining the neighbor-block prediction mode of the neighbor block, determining the target-block prediction mode of the target block based on the neighbor-block prediction mode, and reconstructing the target block by predicting the target block using the determined target-block prediction mode.
39. A video signal formatted to include information, the video signal comprising: a mode portion including prediction mode information for reconstructing a target block of data from a picture, the prediction mode information indicating that the target block of data was predicted using a target-block prediction mode that was determined based on a neighbor-block prediction mode of a neighbor block that is a spatial neighbor of the target block in the picture.
40. A video signal structure comprising: a mode portion including prediction mode information for reconstructing a target block of data from a picture, the prediction mode information indicating that the target block of data was predicted using a target-block prediction mode that was determined based on a neighbor-block prediction mode of a neighbor block that is a spatial neighbor of the target block in the picture.
41. A processor readable medium having stored thereon a video signal structure comprising: a mode portion including prediction mode information for reconstructing a target block of data from a picture, the prediction mode information indicating that the target block of data was predicted using a target-block prediction mode that was determined based on a neighbor-block prediction mode of a neighbor block that is a spatial neighbor of the target block in the picture.
42. The processor readable medium of claim 41 wherein the video signal structure stored on the processor readable medium further includes: a picture portion including an encoding of the picture.
43. The processor readable medium of claim 41 wherein the prediction mode information further indicates that no residue is provided and that the reconstruction of the target block is to be a prediction generated by the target-block prediction mode.
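As a final illustration, the video signal structure of claims 39 through 43 might be modelled as below, with a mode portion carrying the prediction mode information, an optional picture portion as in claim 42, and a flag covering the no-residue case of claim 43. The field names and the flag-based layout are assumptions of this sketch only.

```python
# Hypothetical model of the video signal structure of claims 39-43.
# Field names and the flag-based layout are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModePortion:
    derived_mode_flag: bool  # mode was determined from a neighbor block
    no_residue_flag: bool    # claim 43: reconstruction is the prediction


@dataclass
class VideoSignalStructure:
    mode_portion: ModePortion
    picture_portion: Optional[bytes] = None  # claim 42: encoded picture


signal = VideoSignalStructure(ModePortion(derived_mode_flag=True,
                                          no_residue_flag=True))
print(signal)
```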
PCT/US2009/004540 2008-08-20 2009-08-07 Depth coding Ceased WO2010021664A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US18958508P 2008-08-20 2008-08-20
US61/189,585 2008-08-20

Publications (1)

Publication Number Publication Date
WO2010021664A1 (en) 2010-02-25

Family

ID=41279247

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2009/004540 Ceased WO2010021664A1 (en) 2008-08-20 2009-08-07 Depth coding

Country Status (1)

Country Link
WO (1) WO2010021664A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003105070A1 (en) * 2002-06-01 2003-12-18 Nokia Corporation Spatial prediction based intra coding

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ALJOSCHA SMOLIC ET AL: "3D Video and Free Viewpoint Video - Technologies, Applications and MPEG Standards", MULTIMEDIA AND EXPO, 2006 IEEE INTERNATIONAL CONFERENCE ON, IEEE, PI, 1 July 2006 (2006-07-01), pages 2161 - 2164, XP031033297, ISBN: 978-1-4244-0366-0 *
BOJUN MENG ET AL: "Fast intra-prediction mode selection for 4x4 blocks in H.264", PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP'03) 6-10 APRIL 2003 HONG KONG, CHINA; [IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP)], 2003 IEEE INTERNATIONAL CONFERENCE, vol. 3, 6 April 2003 (2003-04-06), pages III_389 - III_392, XP010639091, ISBN: 978-0-7803-7663-2 *
JIA-WEI CHEN ET AL: "A Condition-based Intra Prediction Algorithm for H.264/AVC", MULTIMEDIA AND EXPO, 2006 IEEE INTERNATIONAL CONFERENCE ON, IEEE, PI, 1 July 2006 (2006-07-01), pages 1077 - 1080, XP031033026, ISBN: 978-1-4244-0366-0 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9179153B2 (en) 2008-08-20 2015-11-03 Thomson Licensing Refined depth map
US8913105B2 (en) 2009-01-07 2014-12-16 Thomson Licensing Joint depth estimation
US10198792B2 (en) 2009-10-14 2019-02-05 Dolby Laboratories Licensing Corporation Method and devices for depth map processing
US10417748B2 (en) 2009-10-14 2019-09-17 Dolby Laboratories Licensing Corporation Filtering and edge encoding and decoding for depth maps
CN110365976B (en) * 2013-11-14 2023-03-24 寰发股份有限公司 Video coding method using intra picture block based copy prediction
CN110365976A (en) * 2013-11-14 2019-10-22 寰发股份有限公司 Video Coding Method Using Intra Picture Block Copy Prediction
CN109889850A (en) * 2014-02-21 2019-06-14 联发科技(新加坡)私人有限公司 Video coding-decoding method
CN106416243B (en) * 2014-02-21 2019-05-03 联发科技(新加坡)私人有限公司 Video Coding Method Using Intra-picture Block Copy Prediction
US10555001B2 (en) 2014-02-21 2020-02-04 Mediatek Singapore Pte. Ltd. Method of video coding using prediction based on intra picture block copy
US11140411B2 (en) 2014-02-21 2021-10-05 Mediatek Singapore Pte. Ltd. Method of video coding using prediction based on intra picture block copy
CN106416243A (en) * 2014-02-21 2017-02-15 联发科技(新加坡)私人有限公司 A Video Coding Method Based on Intra-picture Block Copy Prediction
WO2019047664A1 (en) * 2017-09-06 2019-03-14 浙江宇视科技有限公司 Code rate control method and apparatus, image acquisition device, and readable storage medium
US11902533B2 (en) 2017-09-06 2024-02-13 Zhejiang Uniview Technologies Co., Ltd. Code rate control method and apparatus, image acquisition device, and readable storage medium

Similar Documents

Publication Publication Date Title
US9179153B2 (en) Refined depth map
US8532410B2 (en) Multi-view video coding with disparity estimation based on depth information
KR101653724B1 (en) Virtual reference view
US20110038418A1 (en) Code of depth signal
CN114600466A (en) Image coding apparatus and method based on cross component filtering
CN120321388A (en) Device and method for image coding based on filtering
EP2838262A1 (en) Method for multi-view video encoding based on tree structure encoding unit and apparatus for same, and method for multi-view video decoding based on tree structure encoding unit and apparatus for same
CN114424531A (en) In-loop filtering based video or image coding
CN114982245A (en) Image encoding apparatus and method based on filtering
WO2010021664A1 (en) Depth coding
CN115136608B (en) Image encoding device and method based on virtual boundary
CN115104318B (en) Image encoding device and method based on sub-picture
CN115152214B (en) Image encoding device and method based on screen division
CN120017863A (en) Decoding device, encoding device, and device for transmitting data for image
CN120639971A (en) Encoding device, decoding device and data sending device
CN115088262A (en) Method and apparatus for signaling image information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09789086

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09789086

Country of ref document: EP

Kind code of ref document: A1