
CN101123723A - Digital Video Decoding Method Based on Graphics Processor - Google Patents


Info

Publication number
CN101123723A
Authority
CN
China
Prior art keywords
point
gpu
coefficient
block
decoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2006100892521A
Other languages
Chinese (zh)
Other versions
CN101123723B (en)
Inventor
周秉锋
韩博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN2006100892521A priority Critical patent/CN101123723B/en
Publication of CN101123723A publication Critical patent/CN101123723A/en
Application granted granted Critical
Publication of CN101123723B publication Critical patent/CN101123723B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract


The invention proposes a GPU-based compressed-video decoding method. The method uses point primitives instead of rectangles to represent video blocks, maps every decoding stage except variable-length decoding onto the GPU, and has the CPU organize the video data into point sets whose rendering carries out the decoding. By combining the respective strengths of the CPU and GPU and letting them work in parallel, the method accelerates decoding while retaining both the high performance of hardware decoding and the flexibility of software decoding. It can handle diverse video compression formats and standards, and can replace the dedicated decoding hardware in GPU-equipped personal computers, game consoles, handheld mobile devices, and the like, improving the utilization of hardware resources and reducing cost.

Description

Digital video decoding method based on graphics processor
Technical Field
The invention belongs to the field of computer digital video compression, and particularly relates to a method for completing video decoding by using a Graphics Processing Unit (GPU).
Background
Digital video is now ubiquitous in daily life, spanning fields such as digital television, personal computers, handheld mobile devices, entertainment, and education. For most users, the most basic requirement is high-quality, real-time playback (decoding) of video content. However, to achieve high compression ratios and good image quality, video compression standards must adopt computationally complex compression techniques, so the decoding process consumes a large amount of computational resources.
Most common video compression standards use macroblocks of size 16 × 16 as the basic processing unit. Referring to fig. 1, decoding each macroblock requires the following stages in sequence: variable-length decoding, inverse quantization, inverse discrete cosine transform (IDCT), motion compensation, and color space conversion. Variable-length decoding parses the video bitstream and recovers the entropy-coded information, such as the parameters, coefficients, and motion vectors of each macroblock; it is a strictly serial bit-level operation. Inverse quantization and the IDCT then act on each coefficient block making up the macroblock, processing the sparse DCT coefficients to recover the original pixel block, a computationally intensive step. Motion compensation, which operates on whole macroblocks, is an effective way to reduce temporal redundancy in a video sequence. During encoding, the encoder searches a reference frame for the image block most similar to the current macroblock (the prediction block), represents the search result as a motion vector, computes the difference between the current macroblock and the prediction block, and encodes the difference together with the motion vector. Motion compensation at the decoder recovers the coded picture from that difference and motion vector. Because better prediction tends to yield better coding efficiency, common video coding systems use techniques such as bi-directional prediction (B-frames) and sub-pixel-precision motion vectors to improve the accuracy of motion estimation; these raise prediction accuracy and compression ratio but further increase computational complexity.
The final stage, color space conversion, is also computationally intensive: for every pixel in the image it multiplies the color vector by a conversion matrix (e.g., YUV to RGB). The decoding process of video is thus a complex system composed of multiple time-consuming stages.
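As a concrete illustration of this per-pixel matrix multiply, here is a minimal sketch in Python, assuming full-range BT.601 YCbCr-to-RGB coefficients (the patent does not specify which conversion matrix is used):

```python
# Illustrative per-pixel color space conversion (YCbCr -> RGB).
# The coefficients below are the standard full-range BT.601 values,
# an assumption for this sketch; the patent text leaves the matrix unspecified.
def yuv_to_rgb(y, cb, cr):
    """Convert one pixel from YCbCr to RGB via a 3x3 matrix multiply."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    clamp = lambda v: max(0, min(255, int(round(v))))  # saturate to 8 bits
    return clamp(r), clamp(g), clamp(b)
```

Since the same matrix is applied to every pixel, this step maps naturally onto a pixel program that runs once per fragment.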
Faced with high-quality, high-resolution video and the complex compression techniques introduced by new-generation standards (such as H.264), a software decoder running on the CPU alone in a current computer system cannot even meet the requirement of real-time decoding. Other subsystems are therefore needed to take over part of the decoding task and relieve the CPU. Dedicated video decoding hardware has been present in computer systems for many years, either as stand-alone boards or integrated into graphics hardware; the spread of the Microsoft DirectX Video Acceleration (DXVA) specification has made the latter the current mainstream. However, such dedicated decoding hardware usually supports only a specific video compression standard (mostly MPEG-2), offers very limited extensibility and programmability, and lacks the flexibility to cope with today's diverse video compression formats. Programmable video processing hardware, such as Nvidia's PureVideo and ATI's Avivo, has begun to be integrated on current graphics cards, but it requires additional hardware and higher cost, and efficient high-level languages and application programming interfaces for controlling these low-level hardware resources are still lacking.
On the other hand, with the development and popularization of three-dimensional graphics applications, graphics hardware has evolved into a high-performance, flexible graphics processor, the GPU, whose main programmable parts are currently the vertex processor and the pixel processor (fragment processor). Together with the rasterizer and blending unit, these two kinds of processing units form the GPU's pipelined processing structure. The massively parallel performance of graphics processors, the programmability offered by mature high-level shading languages, and the support for high-precision data types (32-bit floating point) make the GPU an attractive coprocessor alongside the CPU, usable for many general-purpose computing problems (GPGPU) outside the graphics domain, such as numerical computation, signal processing, and fluid simulation. From an architectural point of view, GPUs are highly parallel stream processors based on vector operations, bearing a strong resemblance to some successful dedicated multimedia and video processors. All of this provides strong support for efficient video decoding on GPUs.
However, the GPU was designed and has evolved to accelerate graphics computation, where the processed data are relatively regular vertices and pixels, so it cannot be applied directly to the comparatively complex, branch-heavy video decoding process. Except for the final color space conversion stage, the texture-based methods common in the GPGPU domain are not applicable to decoding. The main reason is that most current video compression standards are organized around macroblocks and coefficient blocks; each macroblock or coefficient block has its own specific parameters and properties, which differ from block to block and are not conveniently represented by a single regular texture. Earlier texture-based attempts, such as DCT/IDCT transforms on a GPU, show no performance advantage over the CPU and incur considerable data transmission overhead. The paper "Accelerate Video Decoding With Generic GPU" (Shen G. et al., IEEE Transactions on Circuits and Systems for Video Technology, May 2005) represents macroblocks with small rectangles to complete the motion compensation stage of decoding; while effective, it still suffers from data redundancy and related problems. These methods do not fully utilize the GPU's computational resources, yield poor performance, and are not suitable for practical video decoding systems.
Disclosure of Invention
The invention aims to overcome the shortcomings in performance or flexibility of existing software and hardware decoding schemes by providing a GPU-based compressed-video decoding method. The method combines the high performance of hardware with the flexibility of software, is applicable to various video compression standards, can replace dedicated decoding hardware on GPU-equipped personal computers, game consoles, handheld mobile devices, and the like, improves the utilization of hardware resources, and reduces cost.
The above object of the present invention is achieved by the following technical solutions:
a digital video decoding method based on a graphics processor comprises the following steps:
1) CPU variable length decoding to obtain macro block and coefficient block, and using basic graphic element point to represent, respectively generating macro block point set corresponding to macro block and DCT coefficient point set corresponding to coefficient block;
2) The CPU sends the macro block point set and the DCT coefficient point set to the GPU in batches in a batch processing mode;
3) And drawing a macro block point set and a DCT coefficient point set, and executing a corresponding vertex and pixel processing program by the GPU to finish the video decoding process.
The invention represents the basic units that make up a video, macroblocks and coefficient blocks, with the basic primitive of graphics rendering, the point, thereby mapping the traditional video decoding process onto the process of drawing point sets, fully exploiting the GPU's pipelined and massively parallel processing to obtain high decoding performance. While the point sets are drawn, vertex programs and pixel programs control the GPU's programmable vertex and pixel processors to complete the main decoding stages: inverse quantization, IDCT, motion compensation, and color space conversion; part of the computational work is further offloaded to the GPU's blending unit and texture filtering unit. The technical scheme comprises the following aspects:
1) Video block information is represented by point primitives rather than rectangles. The working principle is to store the type, position, parameters, coefficients, etc. of macroblocks and coefficient blocks in the video using the properties of the points (four-dimensional vectors) such as position, normal and texture coordinates, etc. Wherein the macro block and the coefficient block correspond to two different types of point sets respectively: macroblock point sets and DCT coefficient point sets for motion compensation and IDCT, respectively. The generation process of the DCT coefficient point set utilizes Zigzag scanning to reduce the number of points in the point set. In consideration of the inefficient branch processing capability of the GPU and the different operation processes corresponding to different types of macro blocks or coefficient blocks, during the process of generating the DCT coefficient point set and the macro block point set, the CPU is used to further subdivide the two types of point sets, and the blocks corresponding to the same type of operation are divided into a type of subset, for example, all non-predicted macro blocks (Intra) in the macro block are grouped into one type, and all forward predicted macro blocks (forward) are grouped into another type.
2) The inverse quantization and IDCT stages of decoding are performed by drawing the DCT coefficient point set created in 1) once. Inverse quantization is carried out entirely by the GPU's vertex processor, while the IDCT is completed mainly in the pixel processor; the two form a pipeline to improve execution efficiency. The quantization parameters and DCT coefficients for inverse quantization are fed into the vertex processor as point-primitive attributes, and the quantization matrices are preloaded via uniform parameters into the vertex processor's constant registers. The IDCT is performed in the pixel processing unit by linearly combining the DCT coefficients with the corresponding basis images, which are preprocessed and stored as a texture in the GPU's video memory. For DCT coefficients of the same coefficient block distributed across several points, the blending unit of the GPU accumulates the results of the individual point primitives into the IDCT output buffer (the residual-image texture).
3) The motion compensation process is completed by drawing the macro block point set created in 1), sampling the reference image texture and the IDCT output texture output in step 2) in a pixel processing unit, accumulating the sampling results and performing saturation operation, and completing the motion compensation process. For the motion compensation of sub-pixel precision, the interpolation operation of sub-pixels is realized by utilizing bilinear filtering hardware of a GPU texture unit.
The advantages of the invention can be summarized in the following aspects:
1) The method combines the advantages of the CPU and the GPU, enables the CPU and the GPU to work in parallel to accelerate the video decoding process, has high performance of hardware decoding and flexibility of software decoding, and can process various video compression formats and standards.
2) Compared with dedicated video hardware, the scheme can be implemented on top of a high-level graphics API (such as OpenGL) and a high-level shading language (such as Cg or GLSL), independent of platform, operating system, and the specific underlying hardware, and is applicable to any system equipped with a GPU, such as personal computers, game consoles, mobile phones, and PDAs. GPUs evolve rapidly, with performance growth far exceeding Moore's law, and continually gain new features and more flexible programmability; in the long run this approach has more potential than CPU software decoding or dedicated hardware.
3) The method uses points to represent macroblocks and coefficient blocks, which is simple to implement and flexible to control. Compared with texture representations, the point-based method transmits only non-zero coefficients; compared with rectangle representations, it eliminates the large amount of data redundant across a rectangle's four vertices, reducing transmission overhead and bandwidth requirements. The point method is also flexible: non-coded blocks can easily be skipped, and zero coefficients are culled automatically while generating the DCT coefficient point primitives for each coefficient block, reducing unnecessary computation. The point representation further makes it easy to exploit the vertex processor and rasterization hardware of the GPU pipeline, fully tapping the GPU's computational resources. Finally, having the CPU pre-sort blocks into different point sets eliminates the GPU's branch-processing bottleneck and improves performance.
Drawings
The following is a brief description of the drawings that accompany the present invention:
fig. 1 is a schematic diagram of the main elements of a typical video decoding process.
Fig. 2 is a diagram of a hardware system according to the present invention.
Fig. 3 is a schematic diagram of a macroblock/block structure of digital video.
FIG. 4 is an overall flow chart of the present invention for video decoding with a GPU by rendering a set of points.
Fig. 5 is a schematic diagram of the generation of DCT coefficient point primitives from coefficient blocks of video in the present invention.
Fig. 6 is a schematic diagram of forming the DCT basis-image texture.
Fig. 7a is a schematic diagram of the output buffer of the IDCT process.
Fig. 7b is a schematic diagram of the frame buffer structure of the motion compensation process.
Fig. 8a is a schematic diagram of a sub-pixel precision motion compensated interpolation process.
FIG. 8b is a diagram of texture filtering unit bilinear interpolation.
FIG. 9 is a diagram illustrating a process of mapping a DCT coefficient point set to complete inverse quantization and IDCT.
Fig. 10 is a schematic diagram illustrating a process of completing motion compensation by plotting a macroblock point set.
Detailed Description
The preferred embodiments of the present invention will be described in more detail below with reference to the accompanying drawings of the present invention.
Fig. 2 illustrates a block diagram of a hardware system according to the present invention. The invention needs the cooperation of the CPU and the GPU to complete the whole decoding process, and the CPU and the GPU can be executed in parallel, thereby further improving the efficiency. The CPU and the GPU are connected through a system bus, such as PCIE or AGP. Bus bandwidth is a limited resource and data transfer overhead is an important factor affecting overall performance. An important advantage of the present invention over prior methods is that useless or redundant data is avoided, significantly reducing the amount of data transferred. The CPU packages the information needed by the decoding of the macro block and the coefficient block in the video into a point set for drawing, temporarily stores the information in a system memory in a vertex array or other forms, and then transmits the information into the GPU through a system bus. The GPU is the main execution unit of the decoding task of the present invention, completes the main decoding task, and requires a vertex and pixel processor with programmability and a video memory of a certain capacity for storing the calculation data and intermediate results.
The invention provides a method for realizing video decoding by using point elements to represent macro blocks and coefficient blocks in a video and drawing point sets corresponding to the corresponding macro blocks and coefficient blocks through a graphics hardware GPU. The process flow of the present invention is shown in FIG. 4. The following describes the specific steps of the present invention for implementing video decoding in detail with reference to the accompanying drawings:
1) And the CPU performs variable length decoding to generate a point set corresponding to the macro block and the coefficient block in the video.
Firstly, the CPU completes variable length decoding to obtain information of macro blocks and coefficient blocks in the video, then the video information is packaged into attributes of point primitives, the point primitives are classified into different point sets according to different types or processing processes of the macro blocks and the coefficient blocks, after all video blocks are processed, the corresponding point sets are sent to the GPU in batches in a batch processing mode (such as vertex arrays), and therefore the GPU parallel and pipeline execution efficiency is improved.
The point sets fall into two broad categories: DCT coefficient point sets and macroblock point sets. The main basis for this division is the block-based structure of current compressed video, shown in fig. 3, where the macroblock is the basic unit of motion compensation and the coefficient blocks making up the macroblock are the basic units of inverse quantization and the IDCT. Both categories can be further divided into subsets according to block type and characteristics. For example, the DCT coefficient point set can be split into a field-DCT-coded point set and a frame-DCT-coded point set according to the DCT coding mode, and the macroblock point sets can be subdivided by macroblock type into non-predicted (intra), uni-predicted, and bi-predicted macroblock sets, among others. Different block types usually correspond to different decoding paths; classifying them into subsets on the CPU in advance and sending each subset to the GPU separately avoids time-consuming branch operations on the GPU and improves overall decoding efficiency.
The process of packing information in macroblocks and coefficient blocks into point primitives is different, but the basic idea is to use multiple vector attributes of point primitives such as position, normal, color, texture coordinates, etc. to store useful information such as types, parameters, coefficients, etc. in video blocks.
The main information contained in the macro block is the position, type (intra, inter) and motion vector of the macro block, and can be directly put into the vector attribute of the point primitive, so that the macro block is converted into the point primitive.
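A hedged sketch of how a macroblock's information might be packed into point-primitive attributes and pre-sorted by type on the CPU; the attribute layout and field names here are illustrative, not the patent's exact format:

```python
# Illustrative CPU-side packing of macroblocks into point primitives.
# Field names (x, y, mb_type, mv_fwd, mv_bwd) and the choice of which
# vector attribute holds what are assumptions for this sketch.
def macroblock_to_point(mb):
    return {
        # raster position of the 16x16 block, as a four-dimensional vector
        "position": (mb["x"], mb["y"], 0.0, 1.0),
        # forward motion vector in .xy, backward (for B-frames) in .zw
        "texcoord0": (mb["mv_fwd"][0], mb["mv_fwd"][1],
                      mb.get("mv_bwd", (0, 0))[0],
                      mb.get("mv_bwd", (0, 0))[1]),
    }

def partition_by_type(macroblocks):
    """CPU-side subdivision: one point set per macroblock type,
    so the GPU never has to branch on the type."""
    sets = {}
    for mb in macroblocks:
        sets.setdefault(mb["mb_type"], []).append(macroblock_to_point(mb))
    return sets
```

Each resulting subset would then be sent to the GPU as one batch (e.g., one vertex array) and drawn with the shader variant matching that type.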
The main information of a coefficient block is its DCT coefficients. Owing to the energy-compaction property of the DCT and the quantization process, only a few of the 64 DCT coefficients in an 8 × 8 coefficient block are non-zero, and they cluster in the low-frequency part. Although a small number of coefficients could be placed directly into point-primitive attributes, the distribution of coefficients differs irregularly between blocks, which hinders forming the regular point sets the GPU processes well, so the coefficients of each block must be reorganized into a regular structure. We store the DCT coefficients in Zigzag order to generate the corresponding coefficient point primitives, as shown in fig. 5. Zigzag scanning converts the two-dimensional block into one-dimensional form, concentrating the non-zero coefficients as much as possible. In the resulting one-dimensional array, every group of four consecutive coefficients becomes one four-dimensional attribute of a point primitive. To keep the points regular, each point carries one (or a fixed number) of these four-dimensional attributes, loaded together with the group's index into the one-dimensional array (the coefficient index) and the position, type, and quantization parameter of the coefficient block, forming a DCT coefficient point primitive. A direct consequence of this approach is that one video block may generate several point primitives; the IDCT stage later accumulates the results scattered across these points.
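The coefficient-point generation just described can be sketched as follows; the zigzag order is the standard 8 × 8 scan, and the attribute names are illustrative:

```python
# Sketch of DCT-coefficient point generation: zigzag-scan an 8x8 block
# into 1D, then emit one point primitive per group of four consecutive
# coefficients that contains a non-zero value (all-zero groups are culled).
def zigzag_order(n=8):
    """Return the (row, col) visiting order of the standard n x n zigzag scan."""
    order = []
    for s in range(2 * n - 1):
        ij = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(ij if s % 2 == 1 else reversed(ij))
    return order

def block_to_coeff_points(block, qp, pos):
    """Pack non-zero coefficient groups into point primitives (4 coeffs each)."""
    scan = [block[i][j] for (i, j) in zigzag_order()]
    points = []
    for idx in range(0, 64, 4):
        group = scan[idx:idx + 4]
        if any(group):  # CPU-side culling of zero groups
            points.append({"coeffs": tuple(group), "coeff_index": idx,
                           "qp": qp, "pos": pos})
    return points
```

A sparse block therefore yields very few points (often one), which is exactly the transmission saving the point representation is after.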
The method for generating the point primitives acts on all macro blocks and coefficient blocks in each frame of image, the generated point set is stored in a system memory in the form of a Vertex Array (Vertex Array), then the point set is drawn by using a graphics API (application programming interface), and data is sent to a GPU (graphics processing unit) in a batch processing mode to complete the subsequent decoding process.
2) The graphics API rendering environment is initialized.
a) The API function is called to set the rasterized size of the point primitive (e.g., glPointSize in OpenGL). The size is set to 8 when drawing the DCT coefficient Point set and the Point Sprite mode texture generation (Point Sprite ARB Extension) is activated, and set to 16 when drawing the macroblock Point set. For variable-size block structures, the size of the block may be stored in a point attribute, changing the PSIZE register in the vertex processor of the GPU to achieve different rasterization sizes.
b) Off-screen buffer space is allocated on the GPU to hold intermediate results. We allocate one IDCT output buffer and three frame buffers. To preserve the accuracy of the IDCT operation, its output is buffered in a single-channel 16-bit floating-point format (fp16); the layout of the luminance and chrominance components is shown in fig. 7a. Because motion compensation must retain reference frames, the three frame buffers store the forward reference frame, the backward reference frame, and the current frame respectively. Each frame buffer is an 8-bit RGB three-channel unsigned byte structure, shown in fig. 7b: the luminance component is stored in the R channel, and the two chrominance components, after interpolation, are stored in the G and B channels. Using the GPU's render-to-texture capability, such as the render-to-texture extension or FBO of OpenGL, these buffers can be sampled and accessed directly as textures once rendering completes. The texture filtering mode is set to "Nearest" for the IDCT output texture and to "Bilinear" for the frame buffers used in motion prediction, so that texture sampling automatically performs the filtering needed for sub-pixel-precision motion compensation; the texture addressing mode is set to "Clamp" to provide the edge-pixel padding required for unrestricted motion vectors.
c) The DCT basis images are preprocessed and combined into a basis-image texture for GPU sampling. The IDCT can be viewed as a linear combination of the DCT coefficients and their corresponding basis images:

    x = Σ_{u=0..7} Σ_{v=0..7} X(u,v) · B_{u,v},   where B_{u,v} = T(u)^T T(v)

where x is the pixel block after the IDCT, X(u,v) is the coefficient at position (u,v) in the DCT coefficient block, T is the DCT transform matrix, T(u) is its u-th row, and the basis image B_{u,v} corresponding to coefficient (u,v) is the outer product of the column vector T(u)^T and the row vector T(v). The computation consists of scalar-matrix multiplications and linear combinations of matrices. Its main advantage is that each coefficient contributes independently, so zero-valued coefficients can be culled directly to reduce the amount of computation.
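A minimal numerical sketch of this linear-combination view of the IDCT, using the standard orthonormal 8-point DCT matrix (an assumption; the patent does not list T's entries):

```python
# IDCT as a sum of coefficient-scaled basis images: x += X(u,v) * outer(T[u], T[v]).
# Zero coefficients are skipped, mirroring the culling described in the text.
import math

N = 8

def dct_matrix_row(u):
    """u-th row of the orthonormal 8-point DCT-II matrix."""
    c = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
    return [c * math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]

T = [dct_matrix_row(u) for u in range(N)]

def idct_by_basis_images(X):
    x = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            if X[u][v] == 0:
                continue  # cull zero-valued coefficients
            for i in range(N):
                for j in range(N):
                    # basis image B_{u,v}[i][j] = T[u][i] * T[v][j]
                    x[i][j] += X[u][v] * T[u][i] * T[v][j]
    return x
```

On the GPU, each point primitive contributes one such scaled basis image, and the blending unit performs the outer summation.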
The basis-image texture is generated as shown in fig. 6. Following the Zigzag scan order, the basis images of every four consecutive coefficients are stored in the RGBA channels of one 8 × 8 texture block; to preserve IDCT accuracy, each color channel holds 16-bit data. The result is a 32 × 32, 16-bit floating-point RGBA texture.
d) The vertex program and pixel program (fragment program) used to draw the DCT coefficient point set are loaded, and the quantization matrix is loaded into the vertex program via a uniform parameter for inverse quantization.
3) After the preparation 2) is completed, drawing the DCT coefficient point set generated in step 1) is started, and the GPU completes inverse quantization and IDCT processes in the drawing process, as shown in fig. 9.
a) The vertex processor implements inverse quantization. The inverse quantization process is essentially a multiplication of the quantization step size and the coefficients. The operation process is as follows:
X_iq(u,v) = qp × QM(u,v) × X_q(u,v)
where X_q(u,v) and X_iq(u,v) denote the DCT coefficient before and after inverse quantization respectively; qp is the quantization parameter, placed into the point attributes during the coefficient-point generation of step 1); and QM(u,v) is the corresponding entry of the quantization matrix, which was loaded into constant registers in step 2) d) and is looked up via the coefficient index introduced in step 1). Because the coefficients are stored as a vector, a single vector multiply in the vertex program dequantizes four coefficients at once.
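A one-line sketch of this vectorized dequantization, assuming QM is the quantization matrix flattened in zigzag order (an illustrative layout; the function and parameter names are hypothetical):

```python
# Software model of the vertex-program inverse quantization: one
# four-component multiply dequantizes the four coefficients carried
# by a single point primitive.
def dequantize_group(coeffs, qp, qm, coeff_index):
    """X_iq = qp * QM * X_q, applied to a 4-vector of coefficients."""
    return tuple(qp * qm[coeff_index + k] * coeffs[k] for k in range(4))
```

On real hardware this is one MUL over a four-dimensional register rather than a Python loop.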
The vertex processing program can also calculate the texture coordinates of the base image corresponding to the coefficients according to the coefficient indexes and transmit the texture coordinates to the subsequent rasterization stage.
b) In the rasterization stage, each point primitive is converted into a pixel block of the specified size at the corresponding position, according to the point size set in step 2) a) and the position output by the vertex processor. Every pixel covered by the block inherits the point primitive's output attributes from the vertex processing stage. For the coefficient point set, once the point-sprite texture generation of step 2) a) is active, each pixel receives intra-block texture coordinates in the range (0, 0) to (1, 1).
c) The pixel processor combines the basis-image texture coordinates output in a) with the intra-block texture coordinates generated in b) to sample the exact basis-image texel for each pixel. In terms of the IDCT formula of step 2) c), the scalar-matrix multiplication has become a direct per-pixel operation. Because both the coefficients and the basis-image texels are RGBA four-dimensional vectors, one vector dot product in the pixel program multiplies and accumulates four coefficients; the result is then output to the buffer.
d) The blending function of the GPU hardware is activated and set to Add. Because each coefficient block may have generated several coefficient point primitives in step 1), the results output by the individual point primitives are accumulated in the output buffer, completing the linear accumulation over all coefficients in the IDCT formula of step 2) c).
Drawing of the DCT coefficient point set is then complete; the inverse-quantized and inverse-transformed coefficient blocks reside in the IDCT output buffer, which serves as the residual-image texture for the subsequent motion compensation process.
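The drawing of the coefficient point set in steps a)-d) can be modelled on the CPU as follows (a toy 8x8 sketch of what the GPU pipeline computes; the function names are illustrative, not from the patent):

```python
import math

N = 8  # standard 8x8 DCT block size

def basis_image(u, v):
    """The (u, v) DCT basis image: the pattern the pixel program
    samples as a texture."""
    cu = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
    cv = math.sqrt(1.0 / N) if v == 0 else math.sqrt(2.0 / N)
    return [[cu * cv *
             math.cos((2 * x + 1) * u * math.pi / (2 * N)) *
             math.cos((2 * y + 1) * v * math.pi / (2 * N))
             for x in range(N)] for y in range(N)]

def idct_by_accumulation(coeff_points):
    """Each (u, v, value) coefficient point scales its basis image
    and is blended additively into the output buffer, mirroring the
    GPU's Add blending mode."""
    out = [[0.0] * N for _ in range(N)]
    for u, v, value in coeff_points:
        base = basis_image(u, v)
        for y in range(N):
            for x in range(N):
                out[y][x] += value * base[y][x]
    return out

# A single DC coefficient of 8 yields a flat block of value 1.0:
block = idct_by_accumulation([(0, 0, 8.0)])
print(block[0][0])  # -> 1.0
```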
4) The vertex and pixel processing programs for motion compensation are loaded, the point size is set to the macroblock size (16), and the macroblock point set is drawn to complete the motion compensation process, as shown in fig. 10.
a) The vertex processing program mainly pre-processes the motion vectors: it generates the fractional part corresponding to the pixel precision of the motion vector, so that pixel interpolation is completed automatically by the texture unit's bilinear filtering hardware when the texture is sampled. For half-pixel precision, for example, the fractional part is 0.5. Fig. 8a and 8b illustrate, in simplified form, the pixel interpolation and texture bilinear filtering processes.
b) The rasterization generates a block of pixels of macroblock size, each pixel inheriting the motion vector output in a).
c) In the pixel processing program, the position of each pixel is obtained from the WPOS register and offset by the motion vector to give the texture coordinates of the corresponding reference block. The pixel processing program samples the reference-frame texture and the residual-image texture output by the IDCT, accumulates the two sampled values, applies saturation, and writes the result to the frame buffer.
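A minimal model of the motion compensation in steps a)-c), with a software bilinear sample standing in for the texture unit's filtering hardware (the names and the clamp-to-edge addressing are assumptions):

```python
def bilerp(frame, x, y):
    """Bilinear sample at fractional (x, y), emulating the GPU texture
    filter that performs half-pixel interpolation for free."""
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    h, w = len(frame), len(frame[0])
    def px(i, j):  # clamp-to-edge addressing
        return frame[min(max(j, 0), h - 1)][min(max(i, 0), w - 1)]
    top = px(x0, y0) * (1 - fx) + px(x0 + 1, y0) * fx
    bot = px(x0, y0 + 1) * (1 - fx) + px(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bot * fy

def compensate(ref, residual, x, y, mvx, mvy):
    """Reference sample plus IDCT residual, then saturate to [0, 255],
    as the pixel program does before writing the frame buffer."""
    pred = bilerp(ref, x + mvx, y + mvy)
    return max(0, min(255, round(pred + residual)))

ref = [[100, 120], [140, 160]]
# Half-pixel motion (0.5, 0.5) lands between all four samples:
print(compensate(ref, residual=10, x=0, y=0, mvx=0.5, mvy=0.5))
# -> 140  (bilinear average 130 plus residual 10)
```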
5) If the image in the frame buffer is to be output to a display device, a color space conversion is needed. This is implemented by drawing an image-sized rectangle: a pixel processing program samples the frame buffer output in step 4) c), converts the color of each pixel, and outputs the result for display. The whole decoding process is then complete.
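The per-pixel color conversion of step 5) might look as follows (full-range BT.601 coefficients are assumed; the patent does not fix a particular conversion matrix):

```python
def yuv_to_rgb(y, u, v):
    """Full-range BT.601 YUV -> RGB for one pixel of the final
    conversion pass (the coefficient choice is an assumption)."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    clamp = lambda c: max(0, min(255, round(c)))
    return clamp(r), clamp(g), clamp(b)

print(yuv_to_rgb(128, 128, 128))  # -> (128, 128, 128), mid-grey
```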
The steps above give the whole process of completing video decoding with the GPU. The CPU is used only to generate and organize the point sets for drawing; all other decoding stages are completed on the GPU, reducing the CPU's computational burden to the greatest extent. By representing the macroblocks and coefficient blocks of the video as point primitives and efficiently mapping the whole decoding process onto the drawing of those point primitives, the method fully exploits the computing resources of the GPU and, by means of the GPU's parallel computation and pipelined processing, significantly improves video decoding efficiency.
Although specific embodiments and examples of the invention are disclosed above and in the accompanying drawings for illustrative purposes and to aid understanding of the invention and the manner in which it may be practiced, those skilled in the art will understand that various substitutions, changes and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should therefore not be limited to the disclosure of the preferred embodiments and the accompanying drawings.

Claims (10)

1. A digital video decoding method based on a graphics processor, comprising the following steps:
1) the CPU obtains macroblocks and coefficient blocks through variable-length decoding, represents them with points, the basic primitive of graphics drawing, and generates the point sets corresponding to the video blocks;
2) the CPU sends the point sets corresponding to the video blocks to the GPU in batches, in a batch-processing manner;
3) the macroblock point set and the DCT coefficient point set are drawn, and the GPU executes the corresponding vertex and pixel processing programs to complete the decoding process.
2. The graphics-processor-based digital video decoding method of claim 1, wherein: the method of representing a macroblock as a point primitive in step 1) is to store the information required for decoding the macroblock in the attributes of the point primitive, the information required for decoding being the position, type and motion vector of the macroblock, and the point primitive attributes being vector attributes such as position, normal and texture coordinates.
3. The graphics-processor-based digital video decoding method of claim 1 or 2, wherein: the method of representing a coefficient block as a point primitive in step 1) is to store the DCT coefficients of the coefficient block in the attributes of the point primitive, the DCT coefficients having been pre-processed by the CPU into a regular coefficient storage structure.
4. The graphics-processor-based digital video decoding method of claim 1, wherein: in step 1), when generating the point primitives corresponding to the macroblocks and coefficient blocks, the CPU divides them in advance into different point sets according to the different types and processing paths of the macroblocks and coefficient blocks.
5. The graphics-processor-based digital video decoding method of claim 1, wherein: in step 3), the inverse quantization operation of the decoding process is completed in the GPU vertex processor; the quantization matrix is loaded into constant registers as a Uniform parameter and, combined with the coefficient index and quantization parameter stored in the point attributes, inverse quantization is completed by vector multiplication.
6. The graphics-processor-based digital video decoding method of claim 1, wherein: in step 3), the inverse DCT of the decoding process is realized as a linear combination of the DCT coefficients and the corresponding base images, the base images being generated as textures and stored on the GPU in texture form.
7. The graphics-processor-based digital video decoding method of claim 6, wherein: the process of generating textures from the base images corresponds to the coefficient-point generation process of claim 3, the base images corresponding to each group of four coefficients being stored in the RGBA channels of the same texture block.
8. The graphics-processor-based digital video decoding method of claim 1, wherein: the inverse DCT operation in the decoding process of step 3) comprises, in order, the following steps:
1) the vertex processing program calculates the texture coordinates of the base image;
2) rasterization generates pixel blocks according to the set point size;
3) the pixel processing program samples the base image and performs a dot-product operation with the inherited coefficient attributes;
4) the GPU blending function is activated and set to addition, the calculation results of the different point primitives are accumulated, and the output is the predicted residual-image texture.
9. The graphics-processor-based digital video decoding method of claim 1, wherein: the motion compensation process in the decoding of step 3) comprises, in order, the following steps:
1) the vertex processing program processes the motion vector according to the prediction precision and sets the corresponding fractional part;
2) the macroblock point primitive is rasterized into a pixel block;
3) the pixel processing program calculates from the motion vector the texture coordinates of the reference block in the reference frame, samples the reference frame and the predicted residual-image texture output by step 4) of claim 8, accumulates the results, and performs a saturation operation.
10. The graphics-processor-based digital video decoding method of claim 9, wherein: the reference-frame sampling process uses the bilinear filtering function of the GPU texture unit to complete the interpolation required for sub-pixel-precision motion compensation.
CN2006100892521A 2006-08-11 2006-08-11 Digital Video Decoding Method Based on Graphics Processor Expired - Fee Related CN101123723B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2006100892521A CN101123723B (en) 2006-08-11 2006-08-11 Digital Video Decoding Method Based on Graphics Processor

Publications (2)

Publication Number Publication Date
CN101123723A true CN101123723A (en) 2008-02-13
CN101123723B CN101123723B (en) 2011-01-12

Family

ID=39085869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2006100892521A Expired - Fee Related CN101123723B (en) 2006-08-11 2006-08-11 Digital Video Decoding Method Based on Graphics Processor

Country Status (1)

Country Link
CN (1) CN101123723B (en)

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102158694A (en) * 2010-12-01 2011-08-17 航天恒星科技有限公司 Remote-sensing image decompression method based on GPU (Graphics Processing Unit)
CN102164284A (en) * 2010-02-24 2011-08-24 富士通株式会社 Video decoding method and system
CN102404576A (en) * 2011-11-30 2012-04-04 国云科技股份有限公司 Cloud terminal decoder and its load balancing algorithm and GPU decoding algorithm
CN101577110B (en) * 2009-05-31 2012-04-25 腾讯科技(深圳)有限公司 Method for playing videos and video player
CN102497550A (en) * 2011-12-05 2012-06-13 南京大学 Parallel acceleration method and device for motion compensation interpolation in H.264 encoding
CN102547289A (en) * 2012-01-17 2012-07-04 西安电子科技大学 Fast motion estimation method realized based on GPU (Graphics Processing Unit) parallel
CN102904857A (en) * 2011-07-25 2013-01-30 风网科技(北京)有限公司 Client video playing system and method thereof
CN102932003A (en) * 2012-09-07 2013-02-13 上海交通大学 Accelerated QC-LDPC (Quasi-Cyclic Low-Density Parity-Check Code) decoding method based on GPU (Graphics Processing Unit) framework
CN103077688A (en) * 2013-01-11 2013-05-01 北京京东方光电科技有限公司 Source electrode driving device and source electrode driving method of liquid crystal display screen
CN103108186A (en) * 2013-02-21 2013-05-15 中国对外翻译出版有限公司 Method of achieving high-definition transmission of videos
CN103327413A (en) * 2013-06-26 2013-09-25 四川长虹电器股份有限公司 Method for achieving alpha animation in smart television
CN103427844A (en) * 2013-07-26 2013-12-04 华中科技大学 High-speed lossless data compression method based on GPU-CPU hybrid platform
CN103841389A (en) * 2014-04-02 2014-06-04 北京奇艺世纪科技有限公司 Video playing method and player
CN104104958A (en) * 2013-04-08 2014-10-15 联发科技(新加坡)私人有限公司 Image decoding method and image decoding device
CN104143334A (en) * 2013-05-10 2014-11-12 中国电信股份有限公司 Programmable graphics processor and method for performing sound mixing on multipath audio through programmable graphics processor
CN104519353A (en) * 2013-09-29 2015-04-15 联想(北京)有限公司 Image processing method and electronic apparatus
CN104702954A (en) * 2013-12-05 2015-06-10 华为技术有限公司 Video coding method and device
CN104836970A (en) * 2015-03-27 2015-08-12 北京联合大学 Multi-projector fusion method based on GPU real-time video processing, and multi-projector fusion system based on GPU real-time video processing
CN105120293A (en) * 2015-08-26 2015-12-02 中国航空工业集团公司洛阳电光设备研究所 Image cooperative decoding method and apparatus based on CPU and GPU
CN105678681A (en) * 2015-12-30 2016-06-15 广东威创视讯科技股份有限公司 GPU data processing method, GPU, PC architecture processor and GPU data processing system
CN105794209A (en) * 2013-12-13 2016-07-20 高通股份有限公司 Controlling sub prediction unit (SUB-PU) motion parameter inheritance (MPI) in three dimensional (3D) HEVC or other 3D coding
CN105787987A (en) * 2016-03-15 2016-07-20 广州爱九游信息技术有限公司 Texture processing method and electronic equipment
CN106210726A (en) * 2016-08-08 2016-12-07 成都佳发安泰科技股份有限公司 The method that utilization rate according to CPU Yu GPU carries out adaptive decoding to video data
CN106331852A (en) * 2016-09-13 2017-01-11 武汉斗鱼网络科技有限公司 Method and system of using WP cell phone for H264 hardware decoding
CN106504185A (en) * 2016-10-26 2017-03-15 腾讯科技(深圳)有限公司 One kind renders optimization method and device
CN106792066A (en) * 2016-12-20 2017-05-31 暴风集团股份有限公司 The method and system that the video decoding of optimization is played
CN107113426A (en) * 2014-11-14 2017-08-29 Lg 电子株式会社 The method and apparatus that the conversion based on figure is performed using broad sense graphic parameter
CN107172432A (en) * 2017-03-23 2017-09-15 杰发科技(合肥)有限公司 A kind of method for processing video frequency, device and terminal
CN107239268A (en) * 2016-03-29 2017-10-10 阿里巴巴集团控股有限公司 A kind of method for processing business, device and intelligent terminal
CN108924566A (en) * 2011-12-19 2018-11-30 索尼公司 Image processing equipment and method
CN109005160A (en) * 2018-07-10 2018-12-14 广州虎牙信息科技有限公司 Video encoding/decoding method, device and computer readable storage medium, terminal
US10157480B2 (en) 2016-06-24 2018-12-18 Microsoft Technology Licensing, Llc Efficient decoding and rendering of inter-coded blocks in a graphics pipeline
CN109408028A (en) * 2018-09-21 2019-03-01 东软集团股份有限公司 Floating point arithmetic method, apparatus and storage medium
US10237566B2 (en) 2016-04-01 2019-03-19 Microsoft Technology Licensing, Llc Video decoding using point sprites
US10575007B2 (en) 2016-04-12 2020-02-25 Microsoft Technology Licensing, Llc Efficient decoding and rendering of blocks in a graphics pipeline
CN111464773A (en) * 2020-04-08 2020-07-28 湖南泽天智航电子技术有限公司 Multi-channel video display method and system
US11197010B2 (en) 2016-10-07 2021-12-07 Microsoft Technology Licensing, Llc Browser-based video decoder using multiple CPU threads

Also Published As

Publication number Publication date
CN101123723B (en) 2011-01-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110112

Termination date: 20130811