WO2008028013A2 - Method and system for decoding high compression ratio video data - Google Patents
Method and system for decoding high compression ratio video data
- Publication number: WO2008028013A2 (PCT/US2007/077191)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video data
- control maps
- prediction
- processing
- units
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/436—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/44—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
Definitions
- the invention is in the field of decoding video data that has been encoded according to a specified encoding format, and more particularly, decoding the video data to optimize use of data processing hardware.
- Digital video playback capability is increasingly available in all types of hardware platforms, from inexpensive consumer-level computers to super-sophisticated flight simulators.
- Digital video playback includes displaying video that is accessed from a storage medium or streamed from a real-time source, such as a television signal.
- new techniques to improve the quality and accessibility of the digital video are being developed. For example, in order to store and transmit digital video, it is typically compressed or encoded using a format specified by a standard.
- Recently, H.264, a video compression scheme, or codec, has been adopted by the Moving Picture Experts Group (MPEG) as the video compression scheme for the MPEG-4 format for digital media exchange; H.264 is MPEG-4 Part 10.
- H.264 was developed to address various needs in an evolving digital media market, such as the relative inefficiency of older compression schemes, the availability of greater computational resources today, and the increasing demand for High Definition (HD) video, which requires the ability to store and transmit about six times as much data as required by Standard Definition (SD) video.
- H.264 is an example of an encoding scheme developed to have a much higher compression ratio than previously available in order to efficiently store and transmit higher quantities of video data, such as HD video data
- the higher compression ratio comes with a significant increase in the computational complexity required to decode the video data for playback
- High compression ratio schemes such as H.264 are computationally demanding to decode. Therefore, most personal computers (PCs) cannot play back highly compressed video data stored on high-density media such as optical Blu-ray discs (BD) or HD-DVD discs.
- Many PCs include dedicated video processing units (VPUs) or graphics processing units (GPUs) that share the decoding tasks with the PC.
- the GPUs may be add-on units in the form of graphics cards, for example, or integrated GPUs
- Even PCs with dedicated GPUs typically are not capable of BD or HD-DVD playback.
- Efficient processing of H.264/MPEG-4 is very difficult in a multi-pipeline processor such as a GPU
- video frame data is arranged in macro blocks according to the MPEG standard
- a macro block to be decoded has dependencies on other macro blocks, as well as intrablock dependencies within the macro block.
- edge filtering of the edges between blocks must be completed. This normally results in algorithms that simply complete decoding of each macro block sequentially, which involves several computationally distinct operations requiring different hardware passes. This results in failure to exploit the parallelism that is inherent in modern-day processors such as multi-pipeline GPUs.
- Figure 1 is a block diagram of a system with graphics processing capability according to an embodiment.
- Figure 2 is a block diagram of elements of a GPU according to an embodiment.
- Figure 3 is a diagram illustrating a data and control flow of a decoding process according to an embodiment.
- Figure 4 is another diagram illustrating a data and control flow of a decoding process according to an embodiment.
- Figure 5 is a diagram illustrating a data and control flow of an inter-prediction process according to an embodiment.
- Figures 6A, 6B, and 6C are diagrams of a macro block divided into different blocks according to an embodiment.
- Figure 7 is a block diagram illustrating intra-block dependencies according to an embodiment.
- Figure 8 is a diagram illustrating a data and control flow of an intra-prediction process according to an embodiment.
- Figure 9 is a block diagram of a frame after inter-prediction and intra- prediction have been performed according to an embodiment.
- Figures 10A and 10B are block diagrams of macro blocks illustrating vertical and horizontal deblocking, which are performed on each macro block according to an embodiment.
- Figures 11A, 11B, 11C, and 11D show the pels involved in vertical deblocking for each vertical edge in a macro block according to an embodiment.
- Figures 12A, 12B, 12C, and 12D show the pels involved in horizontal deblocking for each horizontal edge in a macro block according to an embodiment.
- Figure 13A is a block diagram of a macro block that shows vertical edges 0-3 according to an embodiment.
- Figure 13B is a block diagram that shows the conceptual mapping of the shaded data from Figure 13A into a scratch buffer according to an embodiment.
- Figure 14A is a block diagram that shows multiple macro blocks and their edges according to an embodiment.
- Figure 14B is a block diagram that shows the mapping of the shaded data from Figure 14A into the scratch buffer according to an embodiment.
- Figure 15A is a block diagram of a macro block that shows horizontal edges 0-3 according to an embodiment.
- Figure 15B is a block diagram that shows the conceptual mapping of the shaded data from Figure 15A into the scratch buffer according to an embodiment.
- Figure 16A is a block diagram that shows multiple macro blocks and their edges according to an embodiment.
- Figure 16B is a block diagram that shows the mapping of the shaded data from Figure 16A into the scratch buffer according to an embodiment.
- Figure 17A is a block diagram that shows multiple macro blocks and their edges according to an embodiment.
- Figure 17B is a block diagram that shows the mapping of the shaded data from Figure 17A into the scratch buffer according to an embodiment.
- Figure 18A is a block diagram that shows multiple macro blocks and their edges according to an embodiment.
- Figure 18B is a block diagram that shows the mapping of the shaded data from Figure 18A into the scratch buffer according to an embodiment.
- Figure 19A is a block diagram that shows multiple macro blocks and their edges according to an embodiment.
- Figure 19B is a block diagram that shows the mapping of the shaded data from Figure 19A into the scratch buffer according to an embodiment.
- Figure 20 is a block diagram of a source buffer at the beginning of a deblocking algorithm iteration according to an embodiment.
- Figure 21 is a block diagram of a target buffer at the beginning of a deblocking algorithm iteration according to an embodiment.
- Figure 22 is a block diagram of the target buffer after the left side filtering according to an embodiment.
- Figure 23 is a block diagram of the target buffer after the vertical filtering according to an embodiment.
- Figure 24 is a block diagram of a new target buffer after a copy according to an embodiment.
- Figure 25 is a block diagram of the target buffer after a pass according to an embodiment.
- Figure 26 is a block diagram of the target buffer after a pass according to an embodiment.
- Figure 27 is a block diagram of the target buffer after a copy according to an embodiment.
- Embodiments of a method and system for layered decoding of video data encoded according to a standard that includes a high-compression ratio compression scheme are described herein.
- the term "layer” as used herein indicates one of several distinct data processing operations performed on a frame of encoded video data in order to decode the frame.
- the distinct data processing operations include, but are not limited to, motion compensation and deblocking.
- motion compensation typically refers to accounting for the difference between consecutive frames in terms of where each section of the former frame has moved to
- motion compensation is performed using inter-prediction and/or intra- prediction, depending on the encoding of the video data.
- Prior decoding methods performed all of the distinct data processing operations on a unit of data within the frame before moving to a next unit of data within a frame.
- embodiments of the invention perform a layer of processing on an entire frame at one time, and then perform a next layer of processing
- multiple frames are processed in parallel using the same algorithms described below
- the encoded data is pre-processed in order to allow layered decoding without errors, such as errors that might result from processing interdependent data in an incorrect order.
- the pre-processing prepares various sets of encoded data to be operated on in parallel by different processing pipelines, thus optimizing the use of the available graphics processing hardware and minimizing the use of the CPU.
- FIG. 1 is a block diagram of a system 100 with graphics processing capability according to an embodiment.
- the system 100 includes a video data source 112
- the video data source 112 may be a storage medium such as a Blu-ray disc or an HD-DVD disc.
- the video data source may also be a television signal, or any other source of video data that is encoded according to a widely recognized standard, such as one of the MPEG standards.
- Embodiments of the invention will be described with reference to the H.264 compression scheme, which is used in the MPEG-4 standard. Embodiments provide particular performance benefits for decoding H.264 data, but the invention is not so limited.
- the particular examples given are for thorough illustration and disclosure of the embodiments, but no aspects of the examples are intended to limit the scope of the invention as defined by the claims
- System 100 further includes a central processing unit (CPU)-based processor 108 that receives compressed, or encoded, video data 109 from the video data source 112.
- the CPU-based processor 108, in accordance with the standard governing the encoding of the data 109, processes the data 109 and generates control maps 106 in a known manner.
- the control maps 106 include data and control information formatted in such a way as to be meaningful to video processing software and hardware that further processes the control maps 106 to generate a picture to be displayed on a screen.
- the system 100 includes a graphics processing unit (GPU) 102 that receives the control maps 106.
- the GPU 102 may be integral to the system 100.
- the GPU 102 may be part of a chipset made for inclusion in a personal computer (PC) along with the CPU-based processor 108.
- the GPU 102 may be a component that is added to the system 100 as a graphics card or video card, for example.
- the GPU 102 is designed with multiple processing cores, also referred to herein as multiple processing pipelines or multiple pipes.
- the multiple pipelines each contain similar hardware and can all be run simultaneously on different sets of data to increase performance.
- the GPU 102 can be classed as a single instruction multiple data (SIMD) architecture, but embodiments are not so limited
- the GPU 102 includes a layered decoder 104, which will be described in greater detail below
- the layered decoder 104 interprets the control maps 106 and pre-processes the data and control information so that processing hardware of the GPU 102 can optimally perform parallel processing of the data.
- the GPU 102 thus performs hardware-accelerated video decoding
- the GPU 102 processes the encoded video data and generates display data 115 for display on a display 114.
- the display data 115 is also referred to herein as frame data or decoded frames.
- the display 114 can be any type of display appropriate to a particular system 100, including a computer monitor, a television screen, etc.
- a SIMD architecture is most effective when it conducts multiple, massively parallel computations along substantially the same control flow path.
- embodiments of the layered decoder 104 include an H.264 decoder running on GPU hardware to minimize the flow control deviation in each shader thread.
- a shader as referred to herein is a software program specifically for rendering graphics data or video data as known in the art.
- a rendering task may use several different shaders
- a luma or chroma 8-bit value is called a pel. All luma pels in a frame make up the Y plane.
- the Y plane has the resolution of the picture measured in pels. For example, if the picture resolution is said to be 720x480, the Y plane has 720x480 pels. Chroma pels are divided into two planes: a U plane and a V plane.
- a so-called 420 format is used.
- the 420 format uses U and V planes having the same resolution, which is half of the width and height of the picture. In a 720x480 example, the U and V resolution is 360x240 measured in pels.
- Hardware pixels are pixels as they are viewed by the GPU on the read from memory and the write to memory. In most cases this is a 4-channel, 8-bit-per-channel pixel commonly known as RGBA or ARGB.
- pixel also denotes a 4x4 pel block selected as a unit of computation. It means that, as far as the scan converter is concerned, this is the pixel, causing the pixel shader to be invoked once per each 4x4 block.
- the resolution of the target surface presented to the hardware is defined as one quarter of the width and of the height of the original picture resolution measured in pels. For example, returning to the 720x480 picture example, the resolution of the target is 180x120.
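The size relationships above can be restated in a short sketch. The function name is hypothetical; the arithmetic simply restates the 420-format plane sizes and the quarter-resolution target surface from the text.

```python
# Plane and target-surface sizes for the layout described above. In the 420
# format, the U and V planes are half the width and half the height of the
# Y plane; the hardware target treats each 4x4-pel block as one "pixel",
# so the target surface is one quarter of the picture size per dimension.

def plane_sizes(pic_w, pic_h):
    y = (pic_w, pic_h)                     # Y plane: full picture resolution
    uv = (pic_w // 2, pic_h // 2)          # U and V planes (420 format)
    target = (pic_w // 4, pic_h // 4)      # one target pixel per 4x4 pels
    return y, uv, target

y, uv, target = plane_sizes(720, 480)
# y == (720, 480), uv == (360, 240), target == (180, 120)
```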
- the block of 16x16 pels also referred to as a macro block, is the maximal semantically unified chunk of video content, as defined by MPEG standards.
- a block of 4x4 pels is the minimal semantically unified chunk of the video content.
- There are three different physical target picture or target frame layouts employed, depending on the type of the picture being decoded.
- the target frame layouts are illustrated in Tables 1-3
- Let PicWidth be the width of the picture in pels (which is the same as bytes) and PicHeight be the height of the picture in scan lines (for example, 720x480 in the previous example).
- Table 1 shows the physical layout based on the picture type.
- the field type picture keeps even and odd fields separately until a last "interleaving" pass
- the AFF type picture keeps field macro blocks as two complementary pairs until the last "interleaving" pass.
- the interleaving pass interleaves even and odd scan lines and builds one progressive frame.
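The interleaving pass described above can be modeled as a simple merge of even and odd scan lines. This sketch represents scan lines as list items and is illustrative only; the real pass operates on buffers of pels.

```python
# Sketch of the final "interleaving" pass for field-type pictures: even and
# odd fields, kept separate during decoding, are woven together into one
# progressive frame at the end.

def interleave_fields(even_field, odd_field):
    """Merge even/odd scan lines into a single progressive frame."""
    frame = []
    for even_line, odd_line in zip(even_field, odd_field):
        frame.append(even_line)   # scan lines 0, 2, 4, ...
        frame.append(odd_line)    # scan lines 1, 3, 5, ...
    return frame

frame = interleave_fields(["e0", "e1"], ["o0", "o1"])
# frame == ["e0", "o0", "e1", "o1"]
```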
- Embodiments described herein include a hardware decoding implementation of the H.264 video standard.
- H.264 decoding contains three major parts: inter-prediction, intra-prediction, and deblocking filtering.
- inter-prediction and intra-prediction are also referred to as motion compensation because of the effect of performing inter-prediction and intra-prediction.
- a decoding algorithm consists of three "logical" passes. Each logical pass adds another layer of data onto the same output picture or frame.
- the first "logical" pass is the inter-prediction pass with added inverse-transformed coefficients.
- the first pass produces a partially decoded frame.
- the frame includes macro blocks designated by the encoding process to be decoded using either inter-prediction or intra-prediction. Because only the inter-prediction macro blocks are decoded in the first pass, there will be “holes” or "garbage” data in place of intra-prediction macro blocks.
- a second "logical" pass touches only the intra-prediction macro blocks left after the first pass is complete.
- the second pass computes the intra-prediction with added inverse-transformed coefficients.
- a third pass is a deblocking filtering pass, which includes a deblock control map generation pass.
- the third pass updates pels of the same picture along the sub-block (e.g., 4x4 pels) edges.
- Each logical pass may include many physical hardware passes.
- all of the passes are pre-programmed by a video driver, and the GPU hardware moves from one pass to another autonomously.
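As a rough sketch, the three logical passes can be modeled sequentially as below. The data shapes and names are hypothetical stand-ins; in the actual design each logical pass comprises many parallel hardware passes.

```python
# The three "logical" passes: inter-prediction, intra-prediction, then
# deblocking. Each pass layers more data onto the same output frame, and
# pass boundaries are the only synchronization points in this model.

def decode_picture(macro_blocks):
    frame = {}
    # Pass 1: inter-prediction (plus inverse-transformed coefficients) for
    # inter-coded macro blocks only; intra blocks are left as "holes".
    for mb in macro_blocks:
        if mb["type"] == "inter":
            frame[mb["id"]] = "inter-decoded"
    # Pass 2: intra-prediction fills in the remaining macro blocks.
    for mb in macro_blocks:
        if mb["type"] == "intra":
            frame[mb["id"]] = "intra-decoded"
    # Pass 3: deblocking filtering updates pels along sub-block edges.
    for mb in macro_blocks:
        frame[mb["id"]] += "+deblocked"
    return frame

mbs = [{"id": 0, "type": "inter"}, {"id": 1, "type": "intra"}]
frame = decode_picture(mbs)
```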
- FIG. 2 is a block diagram of elements of a GPU 202 according to an embodiment.
- the GPU 202 receives control maps 206 from a source such as a host processor or host CPU.
- the GPU 202 includes a video driver 222 which, in an embodiment, includes a layered decoder 204.
- the GPU 202 also includes processing pipelines 220A, 220B, 220C, and 220D. In various embodiments, there could be fewer than four or more than four pipelines 220. In other embodiments, more than one GPU 202 may be combined to share processing tasks.
- the number of pipelines is not intended to be limiting, but is used in this description as a convenient number for illustrating embodiments of the invention. In many embodiments, there are significantly more than four pipelines. As the number of pipelines is increased, the speed and efficiency of the GPU is increased.
- the driver 222, in various embodiments, is software that can be downloaded by a user of an existing GPU to extend new layered decoding capability to the existing GPU. The same driver can be appropriate for all existing GPUs with similar architectures. Multiple drivers can be designed and made available for different architectures.
- One common aspect of drivers including layered decoders described herein is that they immediately allow efficient decoding of video data encoded using H.264 and similar formats by maximizing the use of available graphics processing pipelines on an existing GPU.
- the GPU 202 further includes a Z-buffer 216 and a reference buffer 218.
- the Z-buffer 216 is used as control information, for example to decide which macro blocks are processed and which are not in any layer.
- the reference buffer 218 is used to store a number of decoded frames in a known manner. Previously decoded frames are used in the decoding algorithm, for example to predict what a next or subsequent frame might look like.
- Control maps 306 are generated by a host processor such as a CPU, as previously described.
- the control maps 306 are generated according to the applicable standard, for example MPEG-4.
- the control maps 306 are generated on a per-frame basis.
- a control map 306 is received by the GPU (as shown in Figures 1 and 2).
- the control maps 306 include various information used by the GPU to direct the graphics processing according to the applicable standard.
- the video frame is divided into macro blocks of certain defined sizes. Each macro block may be encoded such that either inter-prediction or intra-prediction must be used to decode it.
- the decision to encode particular macro blocks in particular ways is made by the encoder.
- One piece of information conveyed by the control maps 306 is which decoding method (e.g., inter-prediction or intra-prediction) should be applied to each macro block.
- the encoding scheme is a compression of data
- one of the aspects of the overall scheme is a comparison of one frame to the next in time to determine what video data does not change, and what video data changes, and by how much. Video data that does not change does not need to be explicitly expressed or transmitted, thus allowing compression.
- the process of decoding, or decompression, according to the MPEG standards involves reading information in the control maps 306, including this change information per unit of video data in a frame, and from this information, assembling the frame. For example, consider a macro block whose intensity value has changed from one frame to another. During inter-prediction, the decoder reads a residual from the control maps 306. The residual is an intensity value expressed as a number. The residual represents a change in intensity from one frame to the next for a unit of video data.
- the decoder must then determine what the previous intensity value was and add the residual to the previous value.
- the control maps 306 also store a reference index.
- the reference index indicates which previously decoded frame of up to sixteen previously decoded frames should be accessed to retrieve the relevant, previous reference data.
- the control maps also store a motion vector that indicates where in the selected reference frame the relevant reference data is located. In an embodiment, the motion vector refers to a block of 4x4 pels, but embodiments are not so limited.
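A minimal sketch of how these three control-map fields (reference index, motion vector, and residual) combine during inter-prediction. The one-dimensional pel addressing, data structures, and function name are illustrative simplifications, not the patent's data layout.

```python
# Inter-prediction of one block: the reference index selects one of up to
# sixteen previously decoded frames in the reference buffer, the motion
# vector locates the reference data within that frame, and the residual
# is added to the reference pels to produce the decoded value.

def inter_predict_block(ref_frames, ref_index, motion_vector, residual,
                        block_pos):
    """Return decoded pel values for one block (illustrative, 1-D pels)."""
    ref_frame = ref_frames[ref_index]      # pick the reference frame
    src = block_pos + motion_vector        # locate the reference data
    return [ref_frame[src] + residual]     # add the residual change

refs = {0: {10: 100}}                      # frame 0: pel value 100 at pos 10
out = inter_predict_block(refs, ref_index=0, motion_vector=2, residual=5,
                          block_pos=8)
# out == [105]
```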
- the GPU performs preprocessing on the control map 306, including setup passes 308, to generate intermediate control maps 307.
- the setup passes 308 include sorting surfaces for performing inter-prediction for the entire frame, intra-prediction for the entire frame, and deblocking for the entire frame, as further described below.
- the setup passes 308 also include intermediate control map generation for deblocking passes according to an embodiment
- the setup passes 308 involve running "pre-shaders," software programs of relatively small size (compared to the usual rendering shaders), to read the control map 306 without incurring the performance penalty of running the usual rendering shaders.
- the intermediate control maps 307 are the result of interpretation and reformulation of control map 306 data and control information so as to tailor the data and control information to run in parallel on the particular GPU hardware in an optimized way.
- all the control maps are generated by the GPU.
- the initial control maps are CPU-friendly and data is arranged per macro block.
- Another set of control maps can be generated from the initial control maps using the GPU, where data is arranged per frame (for example, one map for motion vectors, one map for residual).
- after the setup passes 308 generate intermediate control maps 307, shaders are run on the GPU hardware for inter-prediction passes 310.
- inter-prediction passes 310 may not be available because the frame was encoded using intra-prediction only. It is also possible for a frame to be encoded using only inter-prediction. It is also possible for deblocking to be omitted.
- the inter-prediction passes are guided by the information in the control maps 306 and the intermediate control maps 307.
- Intermediate control maps 307 include a map of which macro blocks are inter-prediction macro blocks and which macro blocks are intra-prediction macro blocks.
- Inter-prediction passes 310 read this "inter-intra" information and process only the macro blocks marked as inter-prediction macro blocks.
- the intermediate control maps 307 also indicate which macro blocks or portions of macro blocks may be processed in parallel such that use of the GPU hardware is optimized.
- there are four pipelines which process data simultaneously in inter-prediction passes 310 until inter-prediction has been completed on the entire frame.
- the solution described here can be scaled with the hardware such that more pipelines allow simultaneous processing of more data.
- Intra-prediction passes 314 use the control maps 306 and the intermediate control maps 307 to perform intra-prediction on all of the intra-prediction macro blocks of the frame.
- the intermediate control maps 307 indicate which macro blocks are intra-prediction macro blocks.
- Intra-prediction involves prediction of how a unit of data will look based on neighboring units of data within a frame. This is in contrast to inter-prediction, which is based on differences between frames. In order to perform intra-prediction on a frame, units of data must be processed in an order that does not improperly overwrite data.
- the intermediate control maps 307 include a deblocking map (if available) that indicates an order of edge processing and also indicates filtering parameters. No deblocking map is available if deblocking is not required.
- in deblocking, the data from adjacent macro block edges is combined and rewritten so that the visible transition is minimized.
- the data to be operated on is written out to scratch buffers 322 for the purpose of rearranging the data to be optimally processed in parallel on the hardware, but embodiments are not so limited.
- a completely decoded frame 320 is stored in the reference buffer (reference buffer 218 of Figure 2, for example). This is the reference buffer accessed by the inter-prediction passes 310, as shown by arrow 330.
- Figure 4 is another diagram illustrating a flow 400 of data and control in video data decoding according to an embodiment.
- Figure 4 is another perspective of the operation illustrated in Figure 3 with more detail.
- Control maps 406 are received by the GPU.
- a comparison value in the Z-buffer is set to "inter” at 408.
- the comparison value can be a single bit that is set to "1" or "0", but embodiments are not so limited.
- a small shader, or "pre-shader” 410 is run on the control maps 406 to create the Z-buffer 412 and intermediate control maps 413.
- the Z-buffer includes information that tells an inter-prediction shader 414 which macro blocks are to be inter-predicted and which are not. In an embodiment this information is determined by Z-testing, but embodiments are not so limited. Macro blocks that are not indicated as inter-prediction macro blocks will not be processed by the inter-prediction shader 414, but will be skipped or discarded.
- the inter-prediction shader 414 is run on the data using control information from control maps 406 and an intermediate control map 413 to produce a partially decoded frame 416 in which all of the inter-prediction macro blocks are decoded, and all of the remaining macro blocks are not decoded.
- the Z buffer testing of whether a macro block is an inter- prediction macro block or an intra-prediction macro block is performed within the inter prediction shader 414.
- the value set at 408 is then reset at 418 to indicate intra-prediction.
- the value is not reset, but rather another buffer is used.
- the Z-buffer includes information that tells an intra-prediction shader 424 which macro blocks are to be intra-predicted and which are not. In an embodiment this information is determined by Z-testing, but embodiments are not so limited. Macro blocks that are not indicated as intra-prediction macro blocks will not be processed by the intra-prediction shader 424, but will be skipped or discarded.
- the intra-prediction shader 424 is run on the data using control information from control maps 406 and an intermediate control map 422 to produce a frame 426 in which all of the inter-prediction macro blocks and all of the intra-prediction macro blocks are decoded. This is the frame that is processed in the deblocking operation.
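The Z-buffer gating described above can be sketched as follows. This models the per-pixel Z-test as a simple equality check on a macro block type and uses hypothetical names; the real hardware performs the test per pixel inside the shader pipeline.

```python
# Sketch of Z-buffer gating: a comparison value marks which macro blocks a
# shader may touch, so the inter-prediction shader skips intra blocks, the
# value is reset, and the intra-prediction shader skips inter blocks.

def run_pass(frame, mb_types, z_value, shader):
    """Run one shader pass; blocks failing the Z-test are skipped."""
    for mb_id, mb_type in mb_types.items():
        if mb_type != z_value:        # Z-test fails: pixels are "killed"
            continue
        frame[mb_id] = shader(mb_id)
    return frame

frame = {}
types = {"mb0": "inter", "mb1": "intra"}
run_pass(frame, types, "inter", lambda mb: "inter-decoded")  # first pass
run_pass(frame, types, "intra", lambda mb: "intra-decoded")  # value reset
# frame == {"mb0": "inter-decoded", "mb1": "intra-decoded"}
```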
- inter-prediction is a way to use pels from reference pictures or frames (future (forward) or past (backward)) to predict the pels of the current frame.
- Figure 5 is a diagram illustrating a data and control flow of an inter- prediction process 500 for a frame according to an embodiment.
- the geometrical mesh for each inter-prediction pass consists of a grid of 4x4 rectangles in the Y part of the physical layout and 2x2 rectangles in the UV part (16x16 or 8x8 pels, where 16x16 pels is a macro block).
- a shader parses the control maps for each macro block's control information and broadcasts the preprocessed control information to each pixel (in this case, a pixel is a 4x4-block).
- the control information includes an 8-bit macro block header, multiple IT coefficients and their offsets, 16 pairs of motion vectors and 8 reference frame selectors. Z-testing as previously described indicates whether the macro block is not an inter- prediction block, in which case, its pixels will be "killed” or skipped from “rendering".
- a particular reference frame among various reference frames in the reference buffer is selected using the control information. Then, at 506, the reference pels within the reference frame are found. In an embodiment, finding the correct position of the reference pels inside the reference frame includes computing the coordinates for each 4x4 block. The input to the computation is the top-left address of the target block in pels, and the delta obtained from the proper control map. The target block is the destination block, or the block in the frame that is being decoded.
- Let MvDx, MvDy be the delta obtained from the control map.
- MvDx,MvDy are the x,y deltas computed in the appropriate coordinate system. This is true for a frame picture and frame macro block of an AFF picture in frame coordinates, and for a field picture and field macro block of an AFF picture in the field coordinate system of proper polarity.
- the delta is the delta between the X,Y coordinates of the target block and the X,Y coordinates of the source (reference) block with 4-bit fractional precision.
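Under the stated 4-bit fractional precision, a delta can be split into whole-pel and sub-pel parts as sketched below. The arithmetic is an assumption consistent with the text (low four bits carry the fraction), not code from the patent, and the function name is hypothetical.

```python
# Splitting a control-map delta with 4-bit fractional precision: the low
# 4 bits select a sub-pel phase (sixteenths of a pel) and the remaining
# bits give the whole-pel offset from the target block's coordinates.

def reference_position(target_x, target_y, mv_dx, mv_dy):
    """Compute whole-pel source coords and sub-pel fractions for a block."""
    sx = target_x * 16 + mv_dx            # target coord scaled to 1/16 pel
    sy = target_y * 16 + mv_dy
    src = (sx >> 4, sy >> 4)              # whole-pel source position
    frac = (sx & 0xF, sy & 0xF)           # sub-pel phase, 0..15
    return src, frac

# A delta of 36 = 2*16 + 4 moves 2 whole pels with a 4/16 sub-pel phase:
pos, frac = reference_position(100, 40, 36, 0)
# pos == (102, 40), frac == (4, 0)
```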
- When the reference pels are found, they are combined at 508 with the residual data (also referred to as "the residual") that is included in the control maps. The result of the combination is written to the destination in the partially decoded frame at 512.
- the process 500 is a parallel process and all blocks are submitted/executed in parallel. At the completion of the process, the frame data is ready for intra- prediction. In an embodiment, 4x4 blocks are processed in parallel as described in the process 500, but this is just an example. Other units of data could be treated in a similar way.
- intra-prediction is a way to use pels from other macro blocks or portions of macro blocks within a picture or frame to predict the pels of the current macro block or portion of a macro block.
- Figures 6A, 6B, and 6C are diagrams of a macro block divided into different blocks according to an embodiment.
- Figure 6A is a diagram of a macro block that includes 16x16 pels.
- Figure 6B is a diagram of 8x8 blocks in a macro block.
- Figure 6C is a diagram of 4x4 blocks in a macro block.
- a shader parses the control maps to obtain control information for a macro block, and broadcasts the preprocessed control information to each pixel (in this case, a pixel is a 4x4 block).
- the information includes an 8-bit macro block header, a number of IT coefficients and their offsets, availability of neighboring blocks and their types, and, for 16x16 and 8x8 blocks, prediction values and prediction modes. Z-testing as previously described indicates whether the macro block is not an intra-prediction block, in which case its pixels will be "killed" or skipped from "rendering".
- FIG. 7 is a block diagram that illustrates these potential intra-block dependencies.
- Sub-block 702 depends on its neighboring sub-blocks 704 (left), 706 (up-left), 708 (up), and 710 (up-right).
- the 16 pixels inside a 4x4 rectangle are rendered in a pass number indicated inside the cell.
- the intra-prediction for a UV macro block and a 16x16 macro block are processed in one pass.
- Intra-prediction for an 8x8 macro block is computed in 4 passes; each pass computes the intra-prediction for one 8x8 block, from left to right and from top to bottom.
- Table 4 illustrates an example of ordering in a 4x4 case.
- the primitives (blocks of 4x4 pels) rendered in the same pass are organized into a list in a diagonal fashion.
- Each cell below in Table 5 is a 4x4 (pixel) rectangle. The number inside the cell connects rectangles belonging to the same list. Table 5 is an example for 16*8 x 16*8 in the Y plane:
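The diagonal organization of passes described above can be illustrated with a small scheduling sketch. The pass formula `bx + 2 * by` is a common wavefront schedule consistent with the left/up-left/up/up-right dependencies of Figure 7; the exact numbering used in Tables 4 and 5 may differ, and the names here are assumptions.

```python
def pass_number(bx, by):
    """Earliest pass in which the 4x4 block at (bx, by) can be rendered,
    given that it depends on its left, up-left, up, and up-right
    neighbors.  Weighting the row index by 2 delays each row enough for
    the up-right neighbor on the previous row to be ready."""
    return bx + 2 * by

def diagonal_lists(width_blocks, height_blocks):
    """Group block coordinates into per-pass 'diagonal' lists."""
    lists = {}
    for by in range(height_blocks):
        for bx in range(width_blocks):
            lists.setdefault(pass_number(bx, by), []).append((bx, by))
    return lists
```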
- the driver provides an availability mask for each type of block.
- the mask indicates which neighbor (upper, upper-right, upper-left, or left) is available. How the mask is used depends on the block. For some blocks, not all masks are needed. For some blocks, instead of the upper-right mask, two left masks are used, etc. If the neighboring macro block is available, the pixels from it are used for the target block prediction according to the prediction mode provided to the shader by the driver.
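How an availability mask might gate a prediction can be sketched as below. The bit assignments, function names, and the DC-style fallback to a mid-grey value are illustrative assumptions; the text does not specify the mask encoding.

```python
# Hypothetical bit assignments for the availability mask.
LEFT, UP, UP_LEFT, UP_RIGHT = 1, 2, 4, 8

def available_neighbors(mask):
    """Decode an availability mask into the set of usable neighbors."""
    names = {LEFT: "left", UP: "up", UP_LEFT: "up-left", UP_RIGHT: "up-right"}
    return {name for bit, name in names.items() if mask & bit}

def predict_dc(mask, left_pels, up_pels):
    """DC-style prediction that uses only the pels the mask allows,
    falling back to mid-grey (128) when no neighbor is available."""
    pels = []
    if mask & LEFT:
        pels += left_pels
    if mask & UP:
        pels += up_pels
    return sum(pels) // len(pels) if pels else 128
```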
- the following describes computation of neighboring pel coordinates for different temporal types of macro blocks of different picture types according to an embodiment.
- EvenMbXPU is the x coordinate of the complementary macro block pair.
- EvenMbYPU is the y coordinate of the complementary macro block pair.
- YPU is the y coordinate of the current scan line.
- MbXPU is the x coordinate of the macro block containing the YPU scan line.
- MbYPU is the y coordinate of the macro block containing the YPU scan line.
- MbYMU is the y coordinate of the same macro block in macro block units.
- MbYSzPU is the size of the macro block in the Y direction.
- Function to compute the x,y coordinates of pels in the neighboring macro block to the left:
- XNeighbrPU = MbXPU - 1
- YNeighbrPU = YPU
- XNeighbrPU = MbXPU
- YNeighbrPU = MbYPU - 1;
- YNeighbrPU = EvenMbYPU + (YPU - EvenMbYPU)/2 + YIsOdd * MbYSzPU; break;
- YNeighbrPU = EvenMbYPU + (YPU - MbYPU)*2 + MbIsOdd
- Function to compute the x,y coordinates of pels in the neighboring macro block to the up:
- MbIsOdd = MbYMU % 2
- XNeighbrPU = MbXPU
- Frame -> Frame: Frame -> Field:
- YNeighbrPU = MbYPU - 1 - MbYSzPU * (1 - MbIsOdd); break;
- Field -> Field:
- MbIsOdd = 1; // this always elevates into the macro block of the same polarity
- YNeighbrPU = MbYPU - MbYSzPU * MbIsOdd + MbIsOdd - 2; break;
- Figure 8 is a diagram illustrating a data and control flow 800 of an intra- prediction process according to an embodiment.
- the layered decoder parses the control map macro block header to determine types of subblocks within a macro block.
- the subblocks identified to be rendered in the same physical pass are assigned the same number "X" at 804.
- primitives to be rendered in the same pass are organized into lists in a diagonal fashion at 805.
- a shader is run on the subblocks with the same number "X" at 806.
- the subblocks are processed on the hardware in parallel using the same shader, and the only limitation on the amount of data processed at one time is the amount of available hardware.
- FIG. 9 is a block diagram of a frame 902 after inter-prediction and intra-prediction have been performed.
- Figure 9 illustrates the deblocking interdependency among macro blocks.
- Some of the macro blocks in frame 902 are shown and numbered. Each macro block depends on its neighboring left and top macro blocks, meaning these left and top neighbors must be deblocked first.
- macro block 0 has no dependencies on other macro blocks.
- Macro blocks 1 each depend on macro block 0, and so on. Each similarly numbered macro block has similar interdependencies.
- Embodiments of the invention exploit this arrangement by recognizing that all of the similar macro blocks can be rendered in parallel.
- each diagonal strip is rendered in a separate pass.
- the deblocking operation moves through the frame 902 to the right and down as shown by the arrows in Figure 9.
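Because each macro block depends only on its left and top neighbors, all macro blocks on the same anti-diagonal can be deblocked together. A minimal scheduling sketch, with assumed names:

```python
def deblock_pass(mbx, mby):
    """Pass index for macro block (mbx, mby): with only left and top
    dependencies, every block on the same anti-diagonal is independent."""
    return mbx + mby

def deblock_schedule(width_mb, height_mb):
    """Group macro block coordinates into per-pass diagonal strips that
    can each be rendered in a separate pass, in parallel."""
    passes = [[] for _ in range(width_mb + height_mb - 1)]
    for mby in range(height_mb):
        for mbx in range(width_mb):
            passes[deblock_pass(mbx, mby)].append((mbx, mby))
    return passes
```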
- Figures 10A and 10B are block diagrams of macro blocks illustrating vertical and horizontal deblocking, which are performed on each macro block.
- Figure 10A is a block diagram of a macro block 1000 that shows how vertical deblocking is arranged.
- Macro block 1000 is 16x16 pels, as previously defined. This includes 16x4 pixels as pixels are defined in an embodiment. In other embodiments there may be more or fewer pels per pixel, for example depending on the GPU architecture.
- the numbered dashed lines 0, 1, 2, and 3 designate vertical edges to be deblocked.
- Figure 10B is a block diagram of the macro block 1000 that shows how horizontal deblocking is arranged.
- the numbered dashed lines 0, 1, 2, and 3 designate horizontal edges to be deblocked.
- Figures 11A, 11B, 11C, and 11D show the pels involved in vertical deblocking for each vertical edge in the macro block 1000.
- In Figure 11A, the shaded pels, including pels from the previous (left neighboring) macro block, are used in the deblocking operation for edge 0.
- In Figure 11D, the shaded pels on either side of edge 3 are used in the vertical deblocking operation for edge 3.
- Figures 12A, 12B, 12C, and 12D show the pels involved in horizontal deblocking for each horizontal edge in the macro block 1000.
- In Figure 12A, the shaded pels, including pels from the previous (top neighboring) macro block, are used in the deblocking operation for edge 0.
- the pels to be processed in the deblocking algorithm are copied to a scratch buffer (for example, see Figure 3) in order to optimally arrange the pel data to be processed for a particular graphics processing or video processing architecture.
- a unit of data on which the hardware operates is referred to as a "quad".
- a quad is 2x2 pixels, where a pixel is meant as a "hardware pixel".
- a hardware pixel can be 2x2 of 4x4 pels, 8x8 pels, 2x2 of ARGB pixels, or other arrangements.
- the data to be processed in horizontal deblocking and vertical deblocking is first remapped onto a quad structure in the scratch buffer.
- the deblocking processing is performed and the result is written to the scratch buffer, then back to the frame in the appropriate location.
- the pels are grouped to exercise all of the available hardware.
- the pels to be processed together may come from anywhere in the frame as long as the macro blocks from which they come are all of the same type. Having the same type means having the same macro block dependencies.
- the use of a quad as a unit of data to be processed and the processing of four quads at one time are just one example of an implementation.
- the same principles applied in rearranging the pel data for processing can be applied to any different graphics processing architecture.
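The grouping rule above, that pels processed together may come from anywhere in the frame as long as their macro blocks share a type, can be sketched as follows. The pixel representation and the quad size of four are assumptions for illustration.

```python
from collections import defaultdict

def pack_into_quads(pixels, quad_size=4):
    """Group 'hardware pixels' into quads, drawing each quad only from
    macro blocks of the same type (i.e. the same dependencies).
    Each pixel is a (mb_type, x, y) tuple; leftover pixels that do not
    fill a whole quad are dropped in this simplified sketch."""
    by_type = defaultdict(list)
    for p in pixels:
        by_type[p[0]].append(p)
    quads = []
    for group in by_type.values():
        for i in range(0, len(group) - quad_size + 1, quad_size):
            quads.append(group[i:i + quad_size])
    return quads
```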
- deblocking is performed for each macro block starting with a vertical pass (vertical edge 0, vertical edge 1, vertical edge 2, vertical edge 3) and then a horizontal pass (horizontal edge 0, horizontal edge 1, horizontal edge 2, horizontal edge 3).
- Figures 13-19 are block diagrams that illustrate mapping to the scratch buffer according to an embodiment. These diagrams are an example of mapping to accommodate a particular architecture and are not intended to be limiting.
- Figure 13A is a block diagram of a macro block that shows vertical edges 0-3.
- the shaded area represents data involved in a deblocking operation for edges 0 and 1, including data (on the far left) from a previous macro block.
- Figure 13B is a block diagram that shows the conceptual mapping of the shaded data from Figure 13A into the scratch buffer.
- there are three scratch buffers that allow 16x3 pixels to fit in an area of 4x4 pixels, but other arrangements are possible within the scope of the embodiments.
- deblocking mapping allows optimal use of four pipelines (Pipe 0, Pipe 1, Pipe 2, and Pipe 3) in the example architecture that has been previously described herein.
- deblocking as described is also applicable or adaptable to future architectures (for example, 8x8 or 16x16) in which the screen tiling may not really exist.
- Figure 14A is a block diagram that shows multiple macro blocks and their edges. Each of the macro blocks is similar to the single macro block shown in Figure 13A.
- Figure 14A shows the data involved in a single vertical deblocking pass according to an embodiment.
- Figure 14B is a block diagram that shows the mapping of the shaded data from Figure 14A into the scratch buffer in an arrangement that optimally uses the available hardware.
- Figure 15A is a block diagram of a macro block that shows horizontal edges 0-3.
- the shaded area represents data involved in a deblocking operation for edge 0, including data (at the top) from a previous macro block.
- Figure 15B is a block diagram that shows the conceptual mapping of the shaded data from Figure 15A into the scratch buffer in an arrangement that optimally uses available pipelines in the example architecture that has been previously described herein.
- Figure 16A is a block diagram that shows multiple macro blocks and their edges. Each macro block is similar to the single macro block shown in Figure 15A.
- the shaded data is the data involved in deblocking for edges 0.
- Figure 16B is a block diagram that shows the mapping of the shaded data from Figure 16A into the scratch buffer in an arrangement that optimally uses the available hardware for performing deblocking on edges 0.
- Figure 17A is a block diagram that shows multiple macro blocks and their edges.
- the shaded data is the data involved in deblocking for edges 1.
- Figure 17B is a block diagram that shows the mapping of the shaded data from Figure 17A into the scratch buffer in an arrangement that optimally uses the available hardware for performing deblocking on edges 1.
- Figure 18A is a block diagram that shows multiple macro blocks and their edges.
- the shaded data is the data involved in deblocking for edges 2.
- Figure 18B is a block diagram that shows the mapping of the shaded data from Figure 18A into the scratch buffer in an arrangement that optimally uses the available hardware for performing deblocking on edges 2.
- Figure 19A is a block diagram that shows multiple macro blocks and their edges.
- the shaded data is the data involved in deblocking for edges 3.
- Figure 19B is a block diagram that shows the mapping of the shaded data from Figure 19A into the scratch buffer in an arrangement that optimally uses the available hardware for performing deblocking on edges 3.
- the mapping shown in Figures 13-19 is just one example of a mapping scheme for rearranging the pel data to be processed in a manner that optimizes the use of the available hardware.
- a scratch buffer could also be used in the inter-prediction and/or intra-prediction operations.
- a scratch buffer may or may not be more efficient than processing "in place".
- the deblocking operation benefits from using the scratch buffer.
- the size and configuration of the pel data to be processed and the number of processing passes required do not vary.
- the order of the copies can vary. For example, copying can be done after every diagonal or after all of the diagonals.
- the rearrangement for a particular architecture does not vary, and any performance penalties related to copying to the scratch buffer and copying back to the frame can be calculated. These performance penalties can be compared to the performance penalties associated with processing the pel data in place, but in configurations that are not optimized for the hardware. An informed choice can then be made regarding whether to use the scratch buffer or not.
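The informed choice described above amounts to a simple cost comparison. The sketch below uses arbitrary cost units (e.g. cycles) and assumed names; in practice the penalties would be measured or calculated for the target hardware.

```python
def use_scratch_buffer(copy_in, copy_out, optimized_passes, unoptimized_passes):
    """Return True when copying to the scratch buffer and back, plus
    processing in the hardware-optimal layout, is cheaper than
    processing in place in a layout not optimized for the hardware."""
    return (copy_in + copy_out + optimized_passes) < unoptimized_passes
```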
- the units of data to be processed are randomized by the encoding process, so it is not possible to accurately predict gains or losses associated with using the scratch buffer, and the overall performance over time may be about the same as for processing in place.
- the deblocking filtering is performed by a vertex shader for an entire macro block.
- the vertex shader works as a dedicated hardware pipeline.
- the deblocking algorithm involves two passes. The first pass is a vertical pass for all macro blocks along the diagonal being filtered (or deblocked). The second pass is a horizontal pass along the same diagonal.
- the vertex shader processes 256 pels of the luma macro block and 64 pels of each chroma macro block. In an embodiment, the vertex shader passes resulting filtered pels to pixel shaders through 16 parameter registers. Each register (128 bits) keeps one 4x4 filtered block of data.
- the "virtual pixel", or the pixel visible to the scan converter is an 8x8 block of pels for most of the passes.
- eight render targets are defined. Each render target has a pixel format with two channels, and 32 bits per channel.
- the pixel shader is invoked per 8x8 block.
- the pixel shader selects four proper registers from the 16 provided, rearranges them into eight 2x32-bit output color registers, and sends the data to the color buffer.
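The register selection can be illustrated as follows. The 4x4 raster layout of the 16 registers across the macro block is an assumption; the text only states that the pixel shader picks four of the 16 registers for its 8x8 block.

```python
def select_registers(block_x, block_y):
    """Indices of the four 4x4-block registers covering the 8x8 block at
    (block_x, block_y), assuming the 16 registers are laid out as a 4x4
    grid of 4x4-pel blocks in raster order across the macro block."""
    base = block_y * 2 * 4 + block_x * 2
    return [base, base + 1, base + 4, base + 5]
```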
- two buffers are used, a source buffer, and a target buffer.
- the target buffer is the scratch buffer
- the source buffer is used as a texture and the target comprises either four or eight render targets.
- the following tables illustrate buffer states during deblocking.
- Figures 20 and 21 show the state of the source buffer (Figure 20) and the target buffer (Figure 21) at the beginning of an algorithm iteration designated by the letter C.
- C marks the diagonal of the macro blocks to be filtered at the iteration C.
- P marks the previous diagonal. Both the source buffer and the target buffer keep the same data. Darkly shaded cells indicate already-filtered macro blocks, and white cells indicate not-yet-filtered macro blocks. Lightly shaded cells were partially filtered in the previous iteration.
- the iteration C consists of several passes. Pass 1: Filtering the left side of the 0th vertical edge of each C macro block.
- This pass is running along the C diagonal.
- the vertex/pixel shader pair is in a standard mode of operation. That is, the vertex shader sends 16 registers, each keeping a packed block of 4x4 pels, and the pixel shader is invoked per 8x8 block, with a target pixel format of two channels, 32 bits per channel. There are eight render targets.
- Figure 23 shows the state of the target after the vertical filtering. After Pass 2 the source and target are switched.
- Pass 3: Copying the state of the P diagonal only from the new source (old target) to the new target (old source).
- Figure 23 is the new source now.
- Figure 24 presents the state of the new target after the copy. In this pass the vertex shader does nothing. The pixel shader copies texture pixels in standard mode (format: 2 channels, 32 bits per channel; virtual pixel is 8x8) directly into the frame buffer. Eight render targets are involved.
- Pass 4: Filtering the up side of the 0th horizontal edge of each C macro block.
- Figure 26 is now the source.
- Figure 23 is the new target.
- Figure 27 shows the state of the target after the copy. The copying is done the same way as described with reference to Pass 3.
- Embodiments of the decoding method and system include a video data decoding method comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information; and decoding the encoded video data, wherein decoding comprises parallel processing using the intermediate control maps to optimize usage of a plurality of processing pipelines.
- control information comprises control information specific to an architecture of a graphics processing unit (GPU).
- the plurality of processing pipelines comprise a plurality of graphics processing unit (GPU) pipelines.
- the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of data in a frame, and wherein the control information comprises an indication of which prediction operation was used to encode each unit of data in the frame.
- An embodiment further comprises decoding the encoded video data on a frame basis such that each one of several decoding operations is performed on an entire frame of video data at a time.
- the control information comprises a rearrangement of the video data such that a decoding operation can be performed in parallel on multiple video data using the plurality of GPU pipelines.
- pre-processing further comprises creating a buffer from the control maps using one of a plurality of pre-shaders, wherein running a pre-shader on the control maps is more efficient than running a rendering shader on the control maps, and wherein the buffer contains a subset of the control information.
- the buffer is a Z-buffer.
- the compression scheme comprises one of a plurality of high-compression-ratio schemes, including H.264.
- the pre-defined format comprises an MPEG standard video format.
- An embodiment further comprises decoding the encoded video data on a frame basis such that each one of several decoding operations is performed on an entire frame of video data at a time.
- the several decoding operations comprise inter-prediction, intra-prediction, and deblocking.
- Embodiments of the decoding method and system further include a system for decoding video data encoded using a high-compression-ratio codec, the system comprising: a processing unit comprising a plurality of processing pipelines; and a driver comprising a layered decoder, wherein the layered decoder pre-processes control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, including control information specific to the plurality of processing pipelines.
- An embodiment further comprises a Z-buffer coupled to the driver, wherein the Z-buffer is created from the control maps, and wherein generating the intermediate control maps comprises performing Z-testing on the Z-buffer.
- control information comprises information regarding rearranging the video data and directing the processing of the video data to be performed in parallel on the plurality of processing pipelines.
- An embodiment further comprises a scratch buffer coupled to the driver, wherein the scratch buffer stores rearranged data for processing.
- Embodiments of the decoding method and system further include a method for decoding video data encoded using a high-compression-ratio codec, the method comprising: pre-processing control maps that were generated during encoding of the video data; and generating intermediate control maps comprising information regarding decoding the video data on a frame basis such that each of multiple, distinct decoding operations is performed on an entire frame at one time, and further regarding rearranging the video data to be processed in parallel on multiple pipelines of a graphics processing unit (GPU) so as to optimize the use of the multiple pipelines.
- An embodiment further comprises executing a plurality of setup passes on the control maps, comprising performing Z-testing of a Z-buffer created from the control maps.
- An embodiment further comprises: determining from the intermediate control maps which data units within a frame are inter-prediction video data units; and performing inter-prediction on all of the inter-prediction data units within the frame.
- An embodiment further comprises: determining from the intermediate control maps which data units within a frame are intra-prediction video data units; and performing intra-prediction on all of the intra-prediction video data units within the frame.
- An embodiment further comprises: determining from the intermediate control maps video data units that do not have inter-unit dependencies for deblocking filtering; and rearranging the video data units that do not have inter-unit dependencies such that the data units that do not have inter-unit dependencies can be processed in parallel on the multiple pipelines.
- An embodiment further comprises mapping the rearranged data units that do not have inter-unit dependencies to a scratch buffer for processing.
- Embodiments of the decoding method and system further include a computer readable medium including instructions which when executed in a video processing system cause the system to process the encoded video data, the processing comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, including control information specific to an architecture of a video processing unit; and decoding the encoded video data, wherein decoding comprises parallel processing using the intermediate control maps to optimize usage of a plurality of graphics processing unit (GPU) pipelines.
- the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of data in a frame, and wherein the control information comprises an indication of which prediction operation was used to encode each unit of data in the frame.
- the processing further comprises decoding the encoded video data on a frame basis such that each one of several decoding operations is performed on an entire frame of video data at a time.
- control information comprises a rearrangement of the video data such that a decoding operation can be performed in parallel on multiple video data using the plurality of GPU pipelines.
- pre-processing further comprises creating a Z-buffer from the control maps using one of a plurality of pre-shaders, wherein running a pre-shader on the control maps is more efficient than running a rendering shader on the control maps.
- the compression scheme comprises one of a plurality of high-compression-ratio schemes, including H.264.
- the pre-defined format comprises an MPEG standard video format.
- the processing further comprises decoding the encoded video data on a frame basis such that each one of several decoding operations is performed on an entire frame of video data at a time.
- the several decoding operations comprise inter-prediction, intra-prediction, and deblocking.
- Embodiments of the decoding method and system further include a computer readable medium having instructions stored thereon which, when processed, are adapted to create a circuit capable of performing a method comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, including control information specific to an architecture of a video processing unit; and decoding the encoded video data, wherein decoding comprises parallel processing using the intermediate control maps to optimize usage of a plurality of graphics processing unit (GPU) pipelines.
- Embodiments of the decoding method and system further include a computer readable medium having instructions stored thereon which, when implemented in a video processing driver, cause the driver to perform a parallel processing method, the method comprising: pre-processing control maps that were generated from encoded video data; and generating intermediate control maps comprising information regarding decoding the video data on a frame basis such that each of multiple, distinct decoding operations is performed on an entire frame at one time, and further regarding rearranging the video data to be processed in parallel on multiple pipelines of a graphics processing unit (GPU) so as to optimize the use of the multiple pipelines.
- Embodiments of the decoding method and system further include a graphics processing unit (GPU) configured to: pre-process control maps that were generated from encoded video data; generate intermediate control maps; and use the intermediate control maps to perform decoding of the video data on a frame basis such that each of multiple, distinct decoding operations is performed on an entire frame at one time, and to further rearrange the video data to be processed in parallel on multiple pipelines of the GPU so as to optimize the use of the multiple pipelines.
- Embodiments of the decoding method and system further include a video processing apparatus comprising: circuitry configured to pre-process control maps that were generated from encoded video data that was encoded according to a predefined format, and to generate intermediate control maps; and driver circuitry configured to read the intermediate control maps for controlling a video data decoding operation; and multiple video processing pipeline circuitry configured to respond to the driver circuitry to perform decoding of the video data on a frame basis such that each of multiple, distinct decoding operations is performed on an entire frame at one time, and to further rearrange the video data to be processed in parallel on multiple pipelines of the GPU so as to optimize the use of the multiple pipelines.
- Embodiments include a digital image generated by a method comprising: preprocessing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information; and decoding the encoded video data, wherein decoding comprises parallel processing using the intermediate control maps to optimize usage of a plurality of processing pipelines.
- Embodiments of the decoding method and system further include a method for decoding video data, comprising: a first processor generating control maps from encoded video data; a second processor, receiving the control maps; generating intermediate control maps from the control maps, wherein the intermediate control maps include information specific to an architecture of the second processor; and using the intermediate control maps to decode the encoded video data.
- control maps comprise data and control information according to a specified format.
- An embodiment further comprises the second processor using the intermediate control maps to perform parallel processing on the video data to generate display data.
- control maps are generated on a per frame basis.
- the architecture of the second processor comprises a type of architecture selected from a group comprising: a single instruction multiple data (SIMD) architecture; a multi-core architecture; and a multi-pipeline architecture.
- parallel processing comprises performing setup passes.
- performing setup passes comprises at least one of: sorting passes to sort surfaces; inter-prediction passes; intra-prediction passes; and deblocking passes.
- Embodiments of the decoding method and system further include a method of upgrading a system to allow for decoding of video data comprising: causing an updated driver to be installed on the system, the updated driver containing computer readable instructions for adapting a system to pre-process control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information.
- the computer readable instructions further adapt the system to decode the encoded video data, wherein decoding comprises parallel processing using the intermediate control maps to optimize usage of a plurality of processing pipelines.
- Embodiments of the decoding method and system further include a hardware- accelerated decoding method, comprising: pre-processing encoded data, wherein the encoded data is encoded in a plurality of units of predefined sizes, wherein various units of the plurality of units have dependencies such that dependent units must be processed in a particular order, and wherein pre-processing comprises determining the dependencies.
- pre-processing further comprises designating units of data that have similar dependencies similarly, and processing similarly designated units in parallel.
- designating units of data comprises: designating units of data that have similar inter-block dependencies similarly; and designating units of data that have similar intra-block dependencies similarly.
- An embodiment further comprises: performing inter-prediction processing on similarly designated units of data in parallel; and performing intra-prediction processing on similarly designated units of data in parallel.
- the method further comprises: decoding the preprocessed video data; performing further preprocessing on the decoded video data to determine deblocking dependencies; and designating units of decoded data having similar dependencies similarly.
- An embodiment further comprises performing deblocking on multiple, similarly designated units in parallel.
- Embodiments of the decoding method and system further include a video data decoding method comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, and wherein the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of video data in a frame, the plurality of prediction operations comprising inter-prediction; determining from the intermediate control maps which indicated units of video data are to be decoded using inter-prediction; and performing inter-prediction on all of the indicated units of video data in the frame in parallel.
- An embodiment further comprises performing inter-prediction on all of the indicated video data in multiple interleaved frames in parallel.
- An embodiment further comprises parallel processing using the intermediate control maps to optimize usage of a plurality of processing pipelines.
- the plurality of processing pipelines comprise a plurality of graphics processing unit (GPU) pipelines.
- control information comprises a rearrangement of the video data such that a decoding operation can be performed in parallel on multiple video data using the plurality of GPU pipelines.
- pre-processing further 1 comprises creating a buffer from the control maps using one of a plurality of pre-shaders, wherein running a pre-shader on the control maps is more efficient than running a rendering shader on the control maps, and wherein the buffer contains a subset of the control information.
- the buffer is a Z-buffer.
- determining comprises Z-testing to determine which of the plurality of prediction operations to perform on a unit of video data.
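As a rough illustration of the Z-testing described above, the following sketch treats the Z-buffer as a per-macroblock predicate: each unit's "depth" value encodes its prediction type, and a pass's Z-test succeeds only where the stored code matches that pass. The function names and type codes are illustrative assumptions, not taken from the patent or any real driver API:

```python
INTER, INTRA = 1, 2  # illustrative prediction-type codes

def build_z_buffer(control_map):
    """Pre-shader stage: copy per-unit prediction-type codes into a Z-buffer."""
    return [list(row) for row in control_map]

def z_test_select(z_buffer, pass_code):
    """Return (x, y) of every unit whose Z value matches this pass's code."""
    return [(x, y)
            for y, row in enumerate(z_buffer)
            for x, code in enumerate(row)
            if code == pass_code]

control_map = [[INTER, INTRA],
               [INTER, INTER]]
z = build_z_buffer(control_map)
inter_blocks = z_test_select(z, INTER)  # all inter-coded units, one pass
intra_blocks = z_test_select(z, INTRA)  # all intra-coded units, another pass
```

On a GPU the equality compare would be performed by the depth-test hardware rather than in a loop, which is what makes this selection essentially free for the shader passes.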
- the compression scheme comprises one of a plurality of high-compression-ratio schemes, including H.264.
- the pre-defined format comprises an MPEG standard video format.
- performing inter-prediction on all of the indicated units of video data comprises broadcasting information from the intermediate control maps to each indicated unit of video data.
- An embodiment further comprises finding a reference frame using the information.
- An embodiment further comprises finding reference pels within the reference frame using the information.
- An embodiment further comprises combining reference pel data and residual data.
- An embodiment further comprises writing a result for each indicated unit of data to a partially decoded frame.
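The inter-prediction steps enumerated in the embodiments above (broadcasting control information, finding the reference frame and reference pels, combining reference and residual data, and writing results to a partially decoded frame) can be sketched as follows. This is a minimal illustration with 1x1 units and whole-pel motion vectors; a real decoder operates on macroblocks with sub-pel interpolation, and every name here is hypothetical:

```python
def inter_predict_frame(indicated, mv_map, ref_frame, residuals, out):
    """Decode every inter-coded unit of one frame.

    Each iteration is independent of the others, which is what allows a
    GPU to process all indicated units of the frame in a single parallel pass.
    """
    for (x, y) in indicated:
        dx, dy = mv_map[(x, y)]                  # broadcast from control map
        ref_pel = ref_frame[y + dy][x + dx]      # locate reference pel
        out[y][x] = ref_pel + residuals[(x, y)]  # combine reference + residual
    return out

ref = [[10, 20],
       [30, 40]]
decoded = inter_predict_frame(
    indicated=[(0, 0), (1, 1)],
    mv_map={(0, 0): (1, 0), (1, 1): (0, -1)},
    ref_frame=ref,
    residuals={(0, 0): 5, (1, 1): -2},
    out=[[0, 0], [0, 0]],
)
# decoded[0][0] == ref[0][1] + 5 == 25; decoded[1][1] == ref[0][1] - 2 == 18
```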
- Embodiments of the decoding method and system further include a system for decoding video data, the system comprising: a processing unit, comprising, a plurality of processing pipelines; and a driver comprising a layered decoder, wherein the layered decoder pre-processes control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information including: information indicating which of a plurality of prediction operations is to be performed on each unit of video data in a frame, the plurality of prediction operations comprising inter-prediction; information regarding reference frames and reference pels; and control information specific to the plurality of processing pipelines.
- An embodiment further comprises a Z-buffer coupled to the driver, wherein the Z-buffer is created from the control maps, and wherein generating the intermediate control maps comprises performing Z-testing on the Z-buffer to determine which units of video data in a frame are to be decoded using inter-prediction.
- control information comprises a rearrangement of the video data such that a decoding operation can be performed in parallel on multiple video data using the plurality of GPU pipelines.
- Embodiments of the decoding method and system further include a method for decoding video data encoded using a high-compression-ratio codec, the method comprising: pre-processing control maps that were generated during encoding of the video data; and generating intermediate control maps comprising information regarding performing inter-prediction on the video data on a frame basis such that inter-prediction is performed on an entire frame at one time.
- An embodiment further comprises executing a plurality of setup passes on the control maps, comprising performing Z-testing of a Z-buffer created from the control maps, wherein at least one Z-buffer test indicates which of the units of video data to perform inter-prediction on.
- Embodiments of the decoding method and system further include a computer readable medium including instructions which when executed in a video processing system cause the system to decode video data, the decoding comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, and wherein the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of video data in a frame, the plurality of prediction operations comprising inter-prediction; determining from the intermediate control maps which indicated units of video data are to be decoded using inter-prediction; and performing inter-prediction on all of the indicated units of video data in the frame in parallel.
- the decoding further comprises performing inter-prediction on all of the indicated video data in multiple interleaved frames in parallel.
- the decoding further comprises parallel processing using the intermediate control maps to optimize usage of a plurality of processing pipelines.
- the plurality of processing pipelines comprise a plurality of graphics processing unit (GPU) pipelines.
- control information comprises a rearrangement of the video data such that a decoding operation can be performed in parallel on multiple video data using the plurality of GPU pipelines.
- pre-processing further comprises creating a Z-buffer from the control maps using one of a plurality of pre-shaders, wherein running a pre-shader on the control maps is more efficient than running a rendering shader on the control maps.
- determining comprises Z-testing to determine which of the plurality of prediction operations to perform on a unit of video data.
- the compression scheme comprises one of a plurality of high-compression-ratio schemes, including H.264.
- the pre-defined format comprises an MPEG standard video format.
- performing inter-prediction on all of the indicated units of video data comprises broadcasting information from the intermediate control maps to each indicated unit of video data.
- performing inter-prediction further comprises finding a reference frame using the information.
- performing inter-prediction further comprises finding reference pels within the reference frame using the information.
- performing inter-prediction further comprises combining reference pel data and residual data.
- performing inter-prediction further comprises writing a result for each indicated unit of data to a partially decoded frame.
- Embodiments of the decoding method and system further include a computer readable medium having instructions stored thereon which, when processed, are adapted to create a circuit capable of performing a video data decoding method comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, and wherein the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of video data in a frame, the plurality of prediction operations comprising inter-prediction; determining from the intermediate control maps which indicated units of video data are to be decoded using inter-prediction; and performing inter-prediction on all of the indicated units of video data in the frame in parallel.
- Embodiments of the decoding method and system further include a computer readable medium having instructions stored thereon which, when implemented in a video processing driver, cause the driver to perform a parallel processing method, the method comprising: pre-processing control maps that were generated from encoded video data; and generating intermediate control maps comprising information regarding decoding the video data on a frame basis such that an inter-prediction operation is performed on an entire frame at one time.
- Embodiments of the decoding method and system further include a graphics processing unit (GPU) configured to perform motion compensation, comprising: pre-processing control maps that were generated from encoded video data; generating intermediate control maps that indicate which units of video data in a frame are to be processed using an inter-prediction operation; and using the intermediate control maps to perform inter-prediction on the video data on a frame basis such that each inter-prediction is performed on an entire frame at one time, and to further rearrange the video data to be processed in parallel on multiple pipelines of the GPU so as to optimize the use of the multiple pipelines.
- Embodiments of the decoding method and system further include a video processing apparatus comprising: circuitry configured to pre-process control maps that were generated from encoded video data that was encoded according to a predefined format, and to generate intermediate control maps that indicate which units of video data in a frame are to be processed using an inter-prediction operation; and driver circuitry configured to read the intermediate control maps for controlling a video data decoding operation, including performing the inter-prediction operation; and multiple video processing pipeline circuitry configured to respond to the driver circuitry to perform decoding of the video data on a frame basis such that the inter-prediction is performed on an entire frame at one time, and to further rearrange the video data to be processed in parallel on multiple pipelines of the GPU so as to optimize the use of the multiple pipelines.
- Embodiments include a digital image generated by a method comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, and wherein the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of video data in a frame, the plurality of prediction operations comprising inter-prediction; determining from the intermediate control maps which indicated units of video data are to be decoded using inter-prediction; and performing inter-prediction on all of the indicated units of video data in the frame in parallel.
- Embodiments of the decoding method and system further include a method for decoding video data, comprising: a first processor generating control maps from encoded video data; a second processor receiving the control maps; generating intermediate control maps from the control maps, wherein the intermediate control maps indicate which units of video data in a frame are to be processed using an inter-prediction operation; and using the intermediate control maps to decode the encoded video data, comprising performing inter-prediction on all of the indicated units in the frame in parallel.
- the intermediate control maps further comprise information specific to an architecture of the second processor.
- An embodiment further comprises the second processor using the intermediate control maps to perform parallel processing on the video data to generate display data.
- control maps are generated on a per frame basis.
- the architecture of the second processor comprises a type of architecture selected from a group comprising: a single instruction multiple data (SIMD) architecture; a multi-core architecture; and a multi-pipeline architecture.
- SIMD single instruction multiple data
- parallel processing comprises performing setup passes.
- performing setup passes comprises at least one of: sorting passes to sort surfaces; inter-prediction passes; and intra-prediction passes.
- Embodiments of the decoding method and system further include a method of upgrading a system to allow for decoding of video data comprising: causing an updated driver to be installed on the system, the updated driver containing computer readable instructions for adapting a system to pre-process control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information indicating which units of video data in a frame are to be processed using an inter-prediction operation.
- the computer readable instructions further adapt the system to decode the encoded video data, wherein decoding comprises parallel processing using the intermediate control maps to perform inter-prediction on all of the indicated units of data in the frame in parallel.
- Embodiments of the decoding method and system further include a video data decoding method comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, and wherein the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of video data in a frame, the plurality of prediction operations comprising intra-prediction; determining from the intermediate control maps which indicated units of video data are to be decoded using intra-prediction; and performing intra-prediction on all of the indicated units of video data in the frame in parallel.
- An embodiment further comprises performing intra-prediction on all of the indicated video data in multiple interleaved frames in parallel.
- An embodiment further comprises parallel processing using the intermediate control maps to optimize usage of a plurality of processing pipelines.
- the plurality of processing pipelines comprise a plurality of graphics processing unit (GPU) pipelines.
- control information comprises designations for units of video data such that a decoding operation can be performed in parallel on similarly designated units of data using the plurality of GPU pipelines without errors due to inter-unit dependencies.
- pre-processing further comprises creating a buffer from the control maps using one of a plurality of pre-shaders, wherein running a pre-shader on the control maps is more efficient than running a rendering shader on the control maps, and wherein the buffer contains a subset of the control information.
- the buffer is a Z-buffer.
- determining comprises Z-testing to determine which of the plurality of prediction operations to perform on a unit of video data.
- the compression scheme comprises one of a plurality of high-compression-ratio schemes, including H.264.
- the pre-defined format comprises an MPEG standard video format.
- control information comprises types of sub-units within the units of video data.
- An embodiment further comprises similarly designating sub-units of video data to be processed concurrently using intra-prediction, wherein the similarly designated sub-units have similar inter-unit dependencies.
- An embodiment further comprises arranging similarly designated sub-units of video data diagonally within the frame.
- An embodiment further comprises running a shader on similarly designated sub-units of video data to perform intra-prediction on the similarly designated sub-units.
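One common way to realize the diagonal arrangement described above is a wavefront schedule: assuming each unit depends only on its left and above neighbors, units on the same anti-diagonal are mutually independent and can be intra-predicted concurrently. The sketch below is an illustrative assumption, not the patent's method; H.264's actual intra dependencies also include the above-right neighbor, which changes the diagonal slope:

```python
def wavefront_groups(width, height):
    """Group unit coordinates by anti-diagonal (x + y).

    With left/above dependencies, units on one anti-diagonal never
    depend on each other, so each group can be shaded in one parallel pass,
    with passes executed in increasing diagonal order.
    """
    groups = {}
    for y in range(height):
        for x in range(width):
            groups.setdefault(x + y, []).append((x, y))
    return [groups[d] for d in sorted(groups)]

waves = wavefront_groups(3, 3)
# waves[0] == [(0, 0)]; waves[1] == [(1, 0), (0, 1)]; 5 waves for a 3x3 frame
```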
- An embodiment further comprises writing a result for each indicated unit of data to a partially decoded frame.
- Embodiments of the decoding method and system further include a system for decoding video data, the system comprising: a processing unit, comprising, a plurality of processing pipelines; and a driver comprising a layered decoder, wherein the layered decoder pre-processes control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information including: information indicating which of a plurality of prediction operations is to be performed on each unit of video data in a frame, the plurality of prediction operations comprising intra-prediction; information regarding types of sub-units of video data within units of video data in the frame; and control information specific to the plurality of processing pipelines.
- An embodiment further comprises a Z-buffer coupled to the driver, wherein the Z-buffer is created from the control maps, and wherein generating the intermediate control maps comprises performing Z-testing on the Z-buffer to determine which units of video data in a frame are to be decoded using intra-prediction.
- the control information comprises designations for units of video data such that a decoding operation can be performed in parallel on similarly designated units of data without errors due to inter-unit dependencies.
- Embodiments of the decoding method and system further include a method for decoding video data encoded using a high-compression-ratio codec, the method comprising: pre-processing control maps that were generated during encoding of the video data; and generating intermediate control maps comprising information regarding performing intra-prediction on the video data on a frame basis such that intra-prediction is performed on an entire frame at one time, and further regarding sub-units of video data within the frame on which intra-prediction can be performed concurrently without errors due to dependencies between units of video data.
- An embodiment further comprises executing a plurality of setup passes on the control maps, comprising performing Z-testing of a Z-buffer created from the control maps, wherein at least one Z-buffer test indicates which of the units of video data to perform intra-prediction on.
- Embodiments of the decoding method and system further include a computer readable medium including instructions which when executed in a video processing system cause the system to decode video data, the decoding comprising: pre-processing control maps generated from encoded video data that was encoded according to a predefined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, and wherein the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of video data in a frame, the plurality of prediction operations comprising intra-prediction; determining from the intermediate control maps which indicated units of video data are to be decoded using intra-prediction; and performing intra-prediction on all of the indicated units of video data in the frame in parallel.
- the decoding further comprises performing intra-prediction on all of the indicated video data in multiple interleaved frames in parallel.
- the decoding further comprises parallel processing using the intermediate control maps to optimize usage of a plurality of processing pipelines.
- the plurality of processing pipelines comprise a plurality of graphics processing unit (GPU) pipelines.
- the control information comprises designations for units of video data such that a decoding operation can be performed in parallel on similarly designated units of data using the plurality of GPU pipelines without errors due to inter-unit dependencies.
- pre-processing further comprises creating a Z-buffer from the control maps using one of a plurality of pre-shaders, wherein running a pre-shader on the control maps is more efficient than running a rendering shader on the control maps.
- determining comprises Z-testing to determine which of the plurality of prediction operations to perform on a unit of video data.
- the compression scheme comprises one of a plurality of high-compression-ratio schemes, including H.264.
- the pre-defined format comprises an MPEG standard video format.
- control information comprises types of sub-units within the units of video data.
- decoding further comprises similarly designating sub-units of video data to be processed concurrently using intra-prediction, wherein the similarly designated sub-units have similar inter-unit dependencies.
- decoding further comprises arranging similarly designated sub-units of video data diagonally within the frame.
- decoding further comprises running a shader on similarly designated sub-units of video data to perform intra-prediction on the similarly designated sub-units.
- performing intra-prediction further comprises writing a result for each indicated unit of data to a partially decoded frame.
- Embodiments of the decoding method and system further include a computer readable medium having instructions stored thereon which, when processed, are adapted to create a circuit capable of performing a video data decoding method comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, and wherein the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of video data in a frame, the plurality of prediction operations comprising intra-prediction; determining from the intermediate control maps which indicated units of video data are to be decoded using intra-prediction; and performing intra-prediction on all of the indicated units of video data in the frame in parallel.
- Embodiments of the decoding method and system further include a computer readable medium having instructions stored thereon which, when implemented in a video processing driver, cause the driver to perform a parallel processing method, the method comprising: pre-processing control maps that were generated from encoded video data; and generating intermediate control maps comprising information regarding decoding the video data on a frame basis such that an intra-prediction operation is performed on an entire frame at one time, and further regarding groups of sub-units of video data in the frame on which intra-prediction can be performed concurrently without errors due to inter-unit dependencies.
- Embodiments of the decoding method and system further include a graphics processing unit (GPU) configured to perform motion compensation, comprising: pre-processing control maps that were generated from encoded video data; generating intermediate control maps that indicate which units of video data in a frame are to be processed using an intra-prediction operation; and using the intermediate control maps to perform intra-prediction on the video data on a frame basis such that each intra-prediction is performed on an entire frame at one time, and to further rearrange the video data to be processed in parallel on multiple pipelines of the GPU so as to optimize the use of the multiple pipelines.
- Embodiments of the decoding method and system further include a video processing apparatus comprising: circuitry configured to pre-process control maps that were generated from encoded video data that was encoded according to a predefined format, and to generate intermediate control maps that indicate which units of video data in a frame are to be processed using an intra-prediction operation; and driver circuitry configured to read the intermediate control maps for controlling a video data decoding operation, including performing the intra-prediction operation; and multiple video processing pipeline circuitry configured to respond to the driver circuitry to perform decoding of the video data on a frame basis such that the intra-prediction is performed on an entire frame at one time, and to further rearrange the video data to be processed in parallel on multiple pipelines of the GPU so as to optimize the use of the multiple pipelines.
- Embodiments include a digital image generated by a method comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, and wherein the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of video data in a frame, the plurality of prediction operations comprising intra-prediction; determining from the intermediate control maps which indicated units of video data are to be decoded using intra-prediction; and performing intra-prediction on all of the indicated units of video data in the frame in parallel.
- the intermediate control maps further comprise information specific to an architecture of the second processor.
- control maps comprise data and control information according to a specified video encoding format.
- An embodiment further comprises the second processor using the intermediate control maps to perform parallel processing on the video data to generate display data.
- control maps are generated on a per frame basis.
- the architecture of the second processor comprises a type of architecture selected from a group comprising: a single instruction multiple data (SIMD) architecture; a multi-core architecture; and a multi-pipeline architecture.
- parallel processing comprises performing setup passes.
- performing setup passes comprises at least one of: sorting passes to sort surfaces; inter-prediction passes; and intra-prediction passes.
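The setup passes listed above might be organized as a sorting pass that groups units by prediction operation, followed by one pass per operation that handles its whole group at once. The sketch below is an illustrative assumption; the names and data shapes are not from the patent or any real driver:

```python
def sort_pass(control_map):
    """'Sorting pass': sort units into per-operation work lists."""
    work = {"inter": [], "intra": []}
    for unit, op in control_map.items():
        work[op].append(unit)
    return work

def run_setup_passes(control_map, op_passes):
    """Run the per-operation passes over the sorted work lists.

    Each prediction pass sees only its own units, so every unit within
    a pass can be processed in parallel on the available pipelines.
    """
    work = sort_pass(control_map)
    return {op: op_passes[op](units) for op, units in work.items()}

control_map = {(0, 0): "inter", (1, 0): "intra", (0, 1): "inter"}
result = run_setup_passes(
    control_map,
    {"inter": lambda units: sorted(units),   # stand-in for the inter pass
     "intra": lambda units: sorted(units)},  # stand-in for the intra pass
)
# result["inter"] == [(0, 0), (0, 1)]; result["intra"] == [(1, 0)]
```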
- Embodiments of the decoding method and system further include a method of upgrading a system to allow for decoding of video data comprising: causing an updated driver to be installed on the system, the updated driver containing computer readable instructions for adapting a system to pre-process control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information indicating which units of video data in a frame are to be processed using an intra-prediction operation.
- the computer readable instructions further adapt the system to decode the encoded video data, wherein decoding comprises parallel processing using the intermediate control maps to perform intra-prediction on all of the indicated units of data in the frame in parallel.
- Embodiments of the decoding method and system further include a hardware-accelerated intra-prediction method, comprising: pre-processing encoded video data that is encoded in a plurality of units of predefined sizes, wherein various units of the plurality of units have dependencies such that dependent units must be processed in a particular order, and wherein pre-processing comprises determining the dependencies.
- pre-processing further comprises designating units of data that have similar dependencies similarly, and performing intra-prediction on similarly designated units concurrently.
- designating units of data comprises designating units of data that have similar inter-unit dependencies.
- Embodiments of the decoding method and system further include a video data motion compensation method comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein preprocessing comprises generating a plurality of intermediate control maps containing control information, and wherein the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of video data in a frame, and wherein the control information comprises an indication of which prediction operation was used to encode each unit of data in the frame; and decoding the encoded video data, wherein decoding comprises performing one of the indicated prediction operations in parallel on all of the video data in the frame encoded using the indicated prediction operations.
- the method comprises performing one of the indicated prediction operations in parallel on all of the video data in multiple interleaved frames encoded using the indicated prediction operations.
- the plurality of prediction operations comprise inter-prediction and intra-prediction.
- the plurality of processing pipelines comprise a plurality of graphics processing unit (GPU) pipelines.
- control information comprises a rearrangement of the video data such that a decoding operation can be performed in parallel on multiple video data using the plurality of GPU pipelines.
- pre-processing further comprises creating a buffer from the control maps using one of a plurality of pre-shaders, wherein running a pre-shader on the control maps is more efficient than running a rendering shader on the control maps, and wherein the buffer contains a subset of the control information.
- the buffer is a Z-buffer.
- decoding further comprises Z- testing to determine which of the plurality of prediction operations to perform on a unit of video data.
- the compression scheme comprises one of a plurality of high- compression-ratio schemes, including H.264.
- the pre-defined format comprises an MPEG standard video format.
- Embodiments of the decoding method and system further include a system for performing motion compensation in video data decoding, the system comprising: a processing unit, comprising, a plurality of processing pipelines; and a driver comprising a layered decoder, wherein the layered decoder pre-processes control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information including, information indicating which of a plurality of prediction operations is to be performed on each unit of video data in a frame; and control information specific to the plurality of processing pipelines.
- The system further comprises a Z-buffer coupled to the driver, wherein the Z-buffer is created from the control maps, and wherein generating the intermediate control maps comprises performing Z-testing on the Z-buffer to determine which of the plurality of prediction operations is to be performed on each unit of video data in a frame.
- control information comprises information regarding rearranging the video data and directing the processing of the video data to be performed in parallel on the plurality of processing pipelines, comprising performing one of the plurality of prediction operations on all of the units of video data indicated by the information.
- Embodiments of the decoding method and system further include a method for motion compensation in decoding video data encoded using a high-compression-ratio codec, the method comprising: pre-processing control maps that were generated during encoding of the video data; and generating intermediate control maps comprising information regarding performing motion compensation on the video data on a frame basis such that each of multiple, distinct motion compensation operations is performed on an entire frame at one time, and further regarding rearranging the video data to be processed in parallel on multiple pipelines of a graphics processing unit (GPU) so as to optimize the use of the multiple pipelines.
- the multiple, distinct motion compensation operations comprise inter-prediction and intra-prediction.
- An embodiment further comprises executing a plurality of setup passes on the control maps, comprising performing Z-testing of a Z-buffer created from the control maps, wherein at least one Z-buffer test indicates which of the multiple, distinct motion compensation operations is to be performed on each unit of video data in the frame.
- An embodiment further comprises performing intra-prediction on all of the intra-prediction video data units within the frame.
- An embodiment further comprises performing inter-prediction on all of the inter-prediction video data units within the frame.
- Embodiments of the decoding method and system further include a computer readable medium including instructions which when executed in a video processing system cause the system to process encoded video data, including performing motion compensation, the processing comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, and wherein the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of video data in a frame, and wherein the control information comprises an indication of which prediction operation was used to encode each unit of data in the frame; and decoding the encoded video data, wherein decoding comprises performing one of the indicated prediction operations in parallel on all of the video data in the frame encoded using the indicated prediction operations.
- the processing further comprises performing one of the indicated prediction operations in parallel on all of the video data in multiple interleaved frames encoded using the indicated prediction operations.
- the plurality of prediction operations comprise inter-prediction and intra-prediction.
- decoding comprises parallel processing using the intermediate control maps to optimize usage of a plurality of processing pipelines.
- the plurality of processing pipelines comprise a plurality of graphics processing unit (GPU) pipelines.
- the control information comprises a rearrangement of the video data such that a decoding operation can be performed in parallel on multiple video data using the plurality of GPU pipelines.
- pre-processing further comprises creating a Z-buffer from the control maps using one of a plurality of pre-shaders, wherein running a pre-shader on the control maps is more efficient than running a rendering shader on the control maps.
- decoding further comprises Z-testing to determine which of the plurality of prediction operations to perform on a unit of video data.
- the compression scheme comprises one of a plurality of high-compression-ratio schemes, including H.264.
- the pre-defined format comprises an MPEG standard video format.
- Embodiments of the decoding method and system further include a computer readable medium having instructions stored thereon which, when processed, are adapted to create a circuit capable of performing a motion compensation method comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, and wherein the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of video data in a frame, and wherein the control information comprises an indication of which prediction operation was used to encode each unit of data in the frame; and decoding the encoded video data, wherein decoding comprises performing one of the indicated prediction operations in parallel on all of the video data in the frame encoded using the indicated prediction operations.
- Embodiments of the decoding method and system further include a computer having instructions stored thereon which, when implemented in a video processing driver, cause the driver to perform a parallel processing method, the method comprising: pre-processing control maps that were generated from encoded video data; and generating intermediate control maps comprising information regarding decoding the video data on a frame basis such that each of multiple, distinct motion compensation operations is performed on an entire frame at one time, and further regarding rearranging the video data to be processed in parallel on multiple pipelines of a graphics processing unit (GPU) so as to optimize the use of the multiple pipelines.
- Embodiments of the decoding method and system further include a graphics processing unit (GPU) configured to perform motion compensation, comprising: preprocessing control maps that were generated from encoded video data; generating intermediate control maps that indicate which one of multiple prediction operations is to be used in performing motion compensation on particular units of data in a frame; and using the intermediate control maps to perform motion compensation on the video data on a frame basis such that each of the multiple, distinct prediction operations is performed on an entire frame at one time, and to further rearrange the video data to be processed in parallel on multiple pipelines of the GPU so as to optimize the use of the multiple pipelines.
- Embodiments of the decoding method and system further include a video processing apparatus comprising: circuitry configured to pre-process control maps that were generated from encoded video data that was encoded according to a predefined format, and to generate intermediate control maps that indicate which one of multiple prediction operations is to be used in performing motion compensation on particular units of data in a frame; and driver circuitry configured to read the intermediate control maps for controlling a video data decoding operation, including performing one or more of the multiple prediction operations; and multiple video processing pipeline circuitry configured to respond to the driver circuitry to perform decoding of the video data on a frame basis such that each of the multiple prediction operations is performed on an entire frame at one time, and to further rearrange the video data to be processed in parallel on multiple pipelines of the GPU so as to optimize the use of the multiple pipelines.
- a digital image generated by the method comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, and wherein the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of video data in a frame, and wherein the control information comprises an indication of which prediction operation was used to encode each unit of data in the frame; and decoding the encoded video data, wherein decoding comprises performing one of the indicated prediction operations in parallel on all of the video data in the frame encoded using the indicated prediction operations.
- Embodiments of the decoding method and system further include a method for decoding video data, comprising: a first processor generating control maps from encoded video data; a second processor receiving the control maps and generating intermediate control maps from the control maps, wherein the intermediate control maps indicate which one of multiple prediction operations is to be used in performing motion compensation on particular units of data in a frame; and using the intermediate control maps to decode the encoded video data, comprising performing motion compensation by performing an indicated prediction operation on all of the particular units in the frame in parallel.
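A toy model of the claimed two-processor split can illustrate the data flow; the map fields (`pos`, `op`, `lane`) are invented for this sketch and are not the patent's data layout.

```python
# Hypothetical two-stage split: the first processor derives per-unit
# control maps; the second processor builds architecture-aware
# intermediate maps from them.
def first_stage(block_modes):
    # First processor: e.g. entropy-decode the bitstream and record,
    # per unit, which prediction operation was used to encode it.
    return [{"pos": i, "op": op} for i, op in enumerate(block_modes)]

def second_stage(control_maps, num_pipelines=4):
    # Second processor: group units per prediction operation and
    # assign each unit a pipeline lane, yielding intermediate maps.
    grouped = {"inter": [], "intra": []}
    for m in control_maps:
        grouped[m["op"]].append(m)
    return {op: [dict(m, lane=i % num_pipelines) for i, m in enumerate(ms)]
            for op, ms in grouped.items()}

maps = first_stage(["inter", "intra", "inter", "inter", "intra"])
intermediate = second_stage(maps)

# All inter-coded units can now be processed as one parallel batch.
assert len(intermediate["inter"]) == 3
```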
- the intermediate control maps further comprise information specific to an architecture of the second processor.
- the control maps comprise data and control information according to a specified video encoding format.
- An embodiment further comprises the second processor using the intermediate control maps to perform parallel processing on the video data to generate display data.
- control maps are generated on a per frame basis.
- the architecture of the second processor comprises a type of architecture selected from a group comprising: a single instruction multiple data (SIMD) architecture; a multi-core architecture; and a multi-pipeline architecture.
- SIMD single instruction multiple data
- parallel processing comprises performing setup passes.
- performing setup passes comprises at least one of: sorting passes to sort surfaces; inter-prediction passes; and intra-prediction passes.
- Embodiments of the decoding method and system further include a method of upgrading a system to allow for decoding of video data comprising: causing an updated driver to be installed on the system, the updated driver containing computer readable instructions for adapting a system to pre-process control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information indicating which one of multiple prediction operations is to be used in performing motion compensation on particular units of data in a frame.
- the computer readable instructions further adapt the system to decode the encoded video data, wherein decoding comprises parallel processing using the intermediate control maps to perform one of the indicated multiple prediction operations on all of the particular units of data in the frame in parallel.
- Embodiments of the decoding method and system further include a hardware-accelerated motion compensation method, comprising: pre-processing encoded video data that is encoded in a plurality of units of predefined sizes, wherein various units of the plurality of units have dependencies such that dependent units must be processed in a particular order, and wherein pre-processing comprises determining the dependencies.
- pre-processing further comprises: designating units of data that have similar dependencies similarly; and processing similarly designated units in parallel.
- designating units of data comprises: designating units of data that have similar inter-unit dependencies similarly; and designating units of data that have similar intra-unit dependencies similarly.
- An embodiment further comprises: performing inter-prediction processing on similarly designated units of data in parallel; and performing intra-prediction processing on similarly designated units of data in parallel.
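One way to picture "similar dependencies" is a wavefront over the frame. The sketch below assumes, purely for illustration, that each unit depends only on its left and top neighbours; actual intra-prediction dependencies (e.g. in H.264) are richer, but the grouping idea is the same: units in one group have no dependencies on each other and can be dispatched to the pipelines concurrently.

```python
# Anti-diagonal wavefront grouping. Assumption (for illustration only):
# each unit depends on its left and top neighbours, so all units on one
# anti-diagonal are mutually independent.
def dependency_groups(rows, cols):
    groups = {}
    for y in range(rows):
        for x in range(cols):
            groups.setdefault(y + x, []).append((y, x))
    # Groups must run in increasing diagonal order; units within one
    # group may run concurrently on separate pipelines.
    return [groups[d] for d in sorted(groups)]

waves = dependency_groups(3, 4)
assert waves[0] == [(0, 0)]     # only the corner unit has no inputs
assert len(waves) == 3 + 4 - 1  # number of sequential steps
```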
- Embodiments of the decoding method and system further include a video data decoding method comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information; and decoding the encoded video data, wherein decoding comprises: parallel processing using the intermediate control maps to optimize usage of a plurality of processing pipelines; and performing deblocking on a frame of video data on which motion compensation has been performed.
- control information comprises control information specific to an architecture of a graphics processing unit (GPU).
- GPU graphics processing unit
- the plurality of processing pipelines comprise a plurality of graphics processing unit (GPU) pipelines.
- GPU graphics processing unit
- the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of data in a frame, and wherein the control information comprises an indication of which prediction operation was used to encode each unit of data in the frame.
- pre-processing further comprises creating a buffer from the control maps using one of a plurality of pre-shaders, wherein running a pre-shader on the control maps is more efficient than running a rendering shader on the control maps, and wherein the buffer contains a subset of the control information.
- the buffer is a Z-buffer.
- the compression scheme comprises one of a plurality of high-compression-ratio schemes, including H.264.
- the pre-defined format comprises an MPEG standard video format.
- An embodiment further comprises designating video data units in the frame on which one of vertical and horizontal deblocking can be performed concurrently.
- An embodiment further comprises: mapping a plurality of similarly designated video data units to a scratch buffer such that the plurality of video data units is optimally processed by a particular architecture.
- An embodiment further comprises: performing vertical deblocking on all of the similarly designated video data units, and performing horizontal deblocking on all of the similarly designated video data units.
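The two-direction deblocking order can be illustrated with a toy frame-wide sweep. The averaging filter is a stand-in for a real deblocking filter, and the block size and sample values are hypothetical; what matters is the structure: all vertical block edges in the frame are filtered in one pass, then all horizontal edges.

```python
# Toy frame: 8x8 samples, 4-sample blocks, with a sharp step at the
# vertical block edge (columns 3|4). BLOCK and the averaging "filter"
# are illustrative stand-ins.
BLOCK = 4

def deblock(frame, vertical):
    h, w = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for e in range(BLOCK, w if vertical else h, BLOCK):
        if vertical:          # filter across every vertical block edge
            for y in range(h):
                avg = (frame[y][e - 1] + frame[y][e]) / 2
                out[y][e - 1] = out[y][e] = avg
        else:                 # filter across every horizontal block edge
            for x in range(w):
                avg = (frame[e - 1][x] + frame[e][x]) / 2
                out[e - 1][x] = out[e][x] = avg
    return out

frame = [[float(x // BLOCK) for x in range(8)] for _ in range(8)]
frame = deblock(frame, vertical=True)     # all vertical edges, one pass
frame = deblock(frame, vertical=False)    # all horizontal edges, one pass
assert frame[0][3] == frame[0][4] == 0.5  # the step has been smoothed
```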
- Embodiments of the decoding method and system further include a system for decoding video data encoded using a high-compression-ratio codec, the system comprising: a processing unit comprising a plurality of processing pipelines; and a driver comprising a layered decoder, wherein the layered decoder pre-processes control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, including designations of video data macroblocks, wherein a similar designation indicates similar deblocking dependencies.
- An embodiment further comprises a Z-buffer coupled to the driver, wherein the Z-buffer is created from the control maps, and wherein generating the intermediate control maps comprises performing Z-testing on the Z-buffer.
- control information comprises information regarding rearranging the video data and directing the processing of the video data to be performed in parallel on the plurality of processing pipelines.
- An embodiment further comprises a scratch buffer coupled to the driver, wherein the scratch buffer stores rearranged data for processing.
- Embodiments of the decoding method and system further include a method for decoding video data encoded using a high-compression-ratio codec, the method comprising: pre-processing control maps that were generated during encoding of the video data; and generating intermediate control maps comprising information regarding decoding the video data on a frame basis such that a deblocking operation is performed on an entire frame at one time, and further regarding rearranging the video data to be processed in parallel on multiple pipelines of a graphics processing unit (GPU) so as to optimize the use of the multiple pipelines.
- An embodiment further comprises executing a plurality of setup passes on the control maps, comprising performing Z-testing of a Z-buffer created from the control maps.
- An embodiment further comprises: determining from the intermediate control maps video data units that do not have inter-unit dependencies for deblocking filtering; and rearranging the video data units that do not have inter-unit dependencies such that the data units that do not have inter-unit dependencies can be processed in parallel on the multiple pipelines.
- An embodiment further comprises mapping the rearranged data units that do not have inter-unit dependencies to a scratch buffer for processing.
- Embodiments of the decoding method and system further include a computer readable medium including instructions which when executed in a video processing system cause the system to process the encoded video data, the processing comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information; and decoding the encoded video data, wherein decoding comprises: parallel processing using the intermediate control maps to optimize usage of a plurality of processing pipelines; and performing deblocking on a frame of video data on which motion compensation has been performed.
- the pre-defined format comprises a compression scheme according to which the video data may be encoded using one of a plurality of prediction operations for various units of data in a frame, and wherein the control information comprises an indication of which prediction operation was used to encode each unit of data in the frame.
- processing further comprises deblocking the decoded video data on a frame basis, wherein deblocking is performed on an entire frame of video data at a time.
- control information comprises a rearrangement of the video data such that a deblocking operation can be performed in parallel on multiple video data using the plurality of GPU pipelines.
- the pre-processing further comprises creating a Z-buffer from the control maps using one of a plurality of pre-shaders, wherein running a pre-shader on the control maps is more efficient than running a rendering shader on the control maps.
- the compression scheme comprises one of a plurality of high-compression-ratio schemes, including H.264.
- the pre-defined format comprises an MPEG standard video format.
- Embodiments of the decoding method and system further include a computer readable medium having instructions stored thereon which, when processed, are adapted to create a circuit capable of performing a method comprising: pre-processing control maps generated from encoded video data that was encoded according to a predefined format, wherein pre-processing comprises generating a plurality of intermediate control maps containing control information, including control information specific to an architecture of a video processing unit; decoding the encoded video data; grouping units of video data that have similar deblocking dependencies; and performing deblocking on each group having the same dependencies concurrently.
- Embodiments of the decoding method and system further include a computer having instructions stored thereon which, when implemented in a video processing driver, cause the driver to perform a parallel processing method, the method comprising: pre-processing control maps that were generated from encoded video data; and generating intermediate control maps comprising information regarding decoding the video data on a frame basis such that each of multiple, distinct decoding operations, including a deblocking operation, is performed on an entire frame at one time, and further regarding rearranging the video data to be processed in parallel on multiple pipelines of a graphics processing unit (GPU) so as to optimize the use of the multiple pipelines.
- Embodiments of the decoding method and system further include a graphics processing unit (GPU) configured to: pre-process control maps that were generated from encoded video data; generate intermediate control maps; and use the intermediate control maps to perform deblocking of the video data on a frame basis such that deblocking is performed on an entire frame at one time, and to further rearrange the video data to be processed in parallel in groups of like dependencies on multiple pipelines of the GPU so as to optimize the use of the multiple pipelines.
- Embodiments of the decoding method and system further include a video processing apparatus comprising: circuitry configured to pre-process control maps that were generated from encoded video data that was encoded according to a predefined format, and to generate intermediate control maps; and driver circuitry configured to read the intermediate control maps for controlling a video data decoding operation; and multiple video processing pipeline circuitry configured to respond to the driver circuitry to perform decoding of the video data on a frame basis such that deblocking is performed on an entire frame at one time, and to further rearrange the video data to be processed in parallel in groups of like dependencies on multiple pipelines of the GPU so as to optimize the use of the multiple pipelines.
- Embodiments of the decoding method and system further include a digital image generated by a method comprising: pre-processing control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre- processing comprises generating a plurality of intermediate control maps containing control information; and decoding the encoded video data, wherein decoding comprises: parallel processing using the intermediate control maps to optimize usage of a plurality of processing pipelines; and performing deblocking on a frame of video data on which motion compensation has been performed.
- Embodiments of the decoding method and system further include a method for decoding video data, comprising: a first processor generating control maps from encoded video data; a second processor receiving the control maps; generating intermediate control maps from the control maps, wherein the intermediate control maps include information specific to an architecture of the second processor; using the intermediate control maps to decode the encoded video data; and deblocking the decoded data, comprising deblocking an entire frame in parallel.
- control maps comprise data and control information according to a specified format.
- An embodiment further comprises the second processor using the intermediate control maps to perform parallel processing on the video data to generate display data.
- control maps are generated on a per frame basis.
- the architecture of the second processor comprises a type of architecture selected from a group comprising: a single instruction multiple data (SIMD) architecture; a multi-core architecture; and a multi-pipeline architecture.
- parallel processing comprises performing setup passes.
- performing setup passes comprises at least one of: sorting passes to sort surfaces; inter-prediction passes; intra-prediction passes; and deblocking passes.
- Embodiments of the decoding method and system further include a method of upgrading a system to allow for decoding of video data comprising: causing an updated driver to be installed on the system, the updated driver containing computer readable instructions for adapting a system to pre-process control maps generated from encoded video data that was encoded according to a pre-defined format, wherein pre-processing comprises: generating a plurality of intermediate control maps containing control information; and grouping units of data with similar deblocking dependencies such that a deblocking operation is performed on units in a group concurrently.
- the computer readable instructions further adapt the system to decode the encoded video data, wherein decoding comprises parallel processing using the intermediate control maps to optimize usage of a plurality of processing pipelines.
- Embodiments of the decoding method and system further include a hardware-accelerated decoding method, comprising: pre-processing encoded data, wherein the encoded data is encoded in a plurality of units of predefined sizes, wherein various units of the plurality of units have dependencies, including deblocking dependencies, such that dependent units must be processed in a particular order, and wherein pre-processing comprises determining the dependencies; and performing deblocking on all of the units in a frame in one operation.
- pre-processing further comprises: designating units of data that have similar dependencies similarly; mapping units with similar dependencies to be processed together so as to optimally utilize the hardware; and processing similarly designated units in parallel.
- the method further comprises: copying the mapped units to a buffer for processing; and copying the mapped units back to the frame after processing.
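The buffer round trip described above reduces to a gather-process-scatter pattern. In this sketch the frame contents and the choice of which units share deblocking dependencies are hypothetical.

```python
# Gather-process-scatter round trip over a tiny 2x3-unit "frame".
frame = {(y, x): y * 10 + x for y in range(2) for x in range(3)}
group = [(0, 0), (0, 2), (1, 1)]          # units with like dependencies

# Gather: copy the mapped units into a contiguous scratch buffer.
scratch = [frame[pos] for pos in group]

# Process the whole batch in one operation (stand-in for the filter).
scratch = [v + 100 for v in scratch]

# Scatter: copy the results back to their original frame positions.
for pos, v in zip(group, scratch):
    frame[pos] = v

assert frame[(0, 0)] == 100 and frame[(0, 1)] == 1  # untouched unit kept
```

The contiguous scratch layout is what lets the batch be handed to the pipelines without per-unit addressing; the scatter step restores frame order afterwards.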
- PLDs programmable logic devices
- FPGAs field programmable gate arrays
- PAL programmable array logic
- ASICs application specific integrated circuits
- microcontrollers with memory such as electronically erasable programmable read only memory (EEPROM)
- aspects of the embodiments may be embodied in microprocessors having software-based circuit emulation, discrete logic (sequential and combinatorial), custom devices, fuzzy (neural) logic, quantum devices, and hybrids of any of the above device types
- MOSFET metal-oxide semiconductor field-effect transistor
- CMOS complementary metal-oxide semiconductor
- ECL emitter-coupled logic
- polymer technologies e.g., silicon-conjugated polymer and metal-conjugated polymer-metal structures
- mixed analog and digital, etc.
- some or all of the hardware and software capability described herein may exist in a printer, a camera, a television, a digital versatile disc (DVD) player, a handheld device, a mobile telephone or some other device.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Embodiments of a method and system for motion compensation in the decoding of video data are described. In various embodiments, a high-compression-ratio codec (in particular, the H.264 codec) is part of the scheme under which the video data is encoded. Embodiments pre-process control maps that were generated from encoded video data and generate intermediate control maps containing information regarding the decoding of the video data. The control maps indicate which of several prediction operations is to be used to perform motion compensation on particular units of data in a frame. In one embodiment, motion compensation is performed on a frame basis so that each of the different prediction operations is carried out on an entire frame at one time. In other embodiments, the processing of different frames is interleaved. Embodiments increase the efficiency of motion compensation so as to allow video data encoded at a high compression ratio to be decoded on personal computers, or comparable equipment, without special additional decoding hardware.
Applications Claiming Priority (10)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US11/514,780 US9055306B2 (en) | 2006-08-31 | 2006-08-31 | Parallel decoding method and system for highly compressed data |
| US11/515,472 | 2006-08-31 | ||
| US11/515,473 | 2006-08-31 | ||
| US11/515,472 US8345756B2 (en) | 2006-08-31 | 2006-08-31 | Method and system for parallel intra-prediction decoding of video data |
| US11/514,780 | 2006-08-31 | ||
| US11/514,801 | 2006-08-31 | ||
| US11/515,311 US20080056350A1 (en) | 2006-08-31 | 2006-08-31 | Method and system for deblocking in decoding of video data |
| US11/514,801 US20080056349A1 (en) | 2006-08-31 | 2006-08-31 | Method and system for motion compensation method in decoding of video data |
| US11/515,473 US9049461B2 (en) | 2006-08-31 | 2006-08-31 | Method and system for inter-prediction in decoding of video data |
| US11/515,311 | 2006-08-31 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2008028013A2 true WO2008028013A2 (fr) | 2008-03-06 |
| WO2008028013A3 WO2008028013A3 (fr) | 2009-11-26 |
Family
ID=39136864
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2007/077191 Ceased WO2008028013A2 (fr) | 2006-08-31 | 2007-08-30 | Procédé et système de décodage pour des données vidéo à rapport de compression élevé |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2008028013A2 (fr) |
Non-Patent Citations (3)
| Title |
|---|
| GAO G-P ET AL: "Accelerate Video Decoding With Generic GPU" IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 15, no. 5, 1 May 2005 (2005-05-01), pages 685-693, XP011131129 ISSN: 1051-8215 * |
| TOL VAN DER E B ET AL: "MAPPING OF H.264 DECODING ON A MULTIPROCESSOR ARCHITECTURE" PROCEEDINGS OF THE SPIE - THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING, SPIE, PO BOX 10 BELLINGHAM WA 98227-0010 USA, vol. 5022, 21 January 2003 (2003-01-21), pages 707-718, XP008025096 ISSN: 0277-786X * |
| TUNG-CHIEN CHEN ET AL: "Analysis and design of macroblock pipelining for h.264/avc vlsi architecture" CIRCUITS AND SYSTEMS, 2004. ISCAS '04. PROCEEDINGS OF THE 2004 INTERNATIONAL SYMPOSIUM ON VANCOUVER, BC, CANADA 23-26 MAY 2004, PISCATAWAY, NJ, USA,IEEE, US, vol. 2, 23 May 2004 (2004-05-23), pages 273-276, XP010720158 ISBN: 978-0-7803-8251-0 * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2015200144A1 (fr) * | 2014-06-27 | 2015-12-30 | Alibaba Group Holding Limited | Procédé et appareil d'affichage de canal vidéo |
| US9495727B2 (en) | 2014-06-27 | 2016-11-15 | Alibaba Group Holding Limited | Video channel display method and apparatus |
| US10291951B2 (en) | 2014-06-27 | 2019-05-14 | Alibaba Group Holding Limited | Video channel display method and apparatus |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2008028013A3 (fr) | 2009-11-26 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9565433B2 (en) | System for parallel intra-prediction decoding of video data | |
| US12355968B2 (en) | Video decoding implementations for a graphics processing unit | |
| US20080056350A1 (en) | Method and system for deblocking in decoding of video data | |
| ES2981403T3 (es) | Método y aparato de predicción de bloques de croma | |
| KR102371799B1 (ko) | 데이터 처리 시스템 | |
| US8265144B2 (en) | Innovations in video decoder implementations | |
| JP5246264B2 (ja) | 画像符号化装置、画像復号化装置、画像符号化方法及び画像復号化方法 | |
| US9055306B2 (en) | Parallel decoding method and system for highly compressed data | |
| US20190281273A1 (en) | Adaptive loop filtering method for reconstructed projection-based frame that employs projection layout of 360-degree virtual reality projection | |
| US8107761B2 (en) | Method for determining boundary strength | |
| KR20190008125A (ko) | 그래픽 처리 시스템 | |
| US10200716B2 (en) | Parallel intra-prediction encoding/decoding process utilizing PIPCM and/or PIDC for selected sections | |
| US20090129478A1 (en) | Deblocking filter | |
| CN101321290B (zh) | 基于数字信号处理器的去块滤波方法 | |
| US7813432B2 (en) | Offset buffer for intra-prediction of digital video | |
| US9049461B2 (en) | Method and system for inter-prediction in decoding of video data | |
| US7760804B2 (en) | Efficient use of a render cache | |
| US20080056349A1 (en) | Method and system for motion compensation method in decoding of video data | |
| EP1147671A1 (fr) | Procede et dispositif pour la compensation du mouvement dans un moteur de mappage de texture | |
| CN104506867B (zh) | 采样点自适应偏移参数估计方法及装置 | |
| WO2008028013A2 (fr) | Procédé et système de décodage pour des données vidéo à rapport de compression élevé | |
| US6618508B1 (en) | Motion compensation device | |
| US20250384591A1 (en) | Cross-component prediction for bandwidth compression |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| NENP | Non-entry into the national phase |
Ref country code: RU |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 07841593 Country of ref document: EP Kind code of ref document: A2 |