Disclosure of Invention
In view of the above, the present invention provides a video encoding and decoding method and apparatus, which can perform scalable, real-time encoding and decoding of video images with good error tolerance.
To achieve the above object, the present invention discloses a video encoding method, comprising: dividing an input video image into different sub-bands by wavelet transformation; encoding the lowest-frequency sub-band in each frame of video image to form a lowest-frequency sub-band code stream; independently encoding the high-frequency sub-bands to form high-frequency sub-band code streams; and merging and packing the obtained code streams to form a merged code stream, which is output.
Before merging and packing the obtained code streams, the method further comprises: performing wavelet-domain image reconstruction on the lowest-frequency sub-band code stream and the high-frequency sub-band code streams, respectively, and then performing inverse wavelet transformation on the obtained wavelet-domain image to obtain a reconstructed image of the video image; and subtracting the reconstructed image from the video image to obtain a difference image, and encoding the difference image to form a difference image code stream. Correspondingly, the merging and packing of the obtained code streams comprises: merging and packing the lowest-frequency sub-band code stream, the high-frequency sub-band code streams, and the difference image code stream.
In the above method, the merging and packing further includes: ordering the code streams according to data importance and inserting resynchronization markers.
Accordingly, the present invention provides a video decoding method, comprising: respectively resolving a lowest-frequency sub-band code stream and a high-frequency sub-band code stream from the input combined code stream according to a code stream combining and packaging method of a coding end; respectively carrying out corresponding decoding operation on the lowest-frequency sub-band code stream and the high-frequency sub-band code stream according to the encoding methods of the lowest-frequency sub-band and the high-frequency sub-band to obtain a reconstructed image of the lowest-frequency sub-band and a reconstructed image of each wavelet domain high-frequency sub-band; and obtaining a decoding reconstruction image of the whole video image through wavelet inverse transformation.
The method further comprises: parsing the difference image code stream from the input merged code stream according to the code stream merging and packing method of the encoding end; decoding the difference image code stream according to the encoding method of the difference image to obtain a reconstructed image of the difference image; and adding the decoded reconstructed image and the reconstructed image of the difference image to obtain a decoded video image.
Correspondingly, the invention provides a video coding and decoding method, which comprises the following steps: at a coding end, dividing an input video image into different sub-bands by adopting wavelet transformation, coding the lowest-frequency sub-band in each frame of video image to form a lowest-frequency sub-band code stream, respectively independently coding the high-frequency sub-bands to form a high-frequency sub-band code stream, merging and packaging the obtained code streams to form a merged code stream and outputting the merged code stream; at a decoding end, respectively resolving a lowest-frequency sub-band code stream and a high-frequency sub-band code stream from the input combined code stream according to a code stream combining and packaging method of an encoding end; respectively carrying out corresponding decoding operation on the lowest-frequency sub-band code stream and the high-frequency sub-band code stream according to the encoding methods of the lowest-frequency sub-band and the high-frequency sub-band to obtain a reconstructed image of the lowest-frequency sub-band and a reconstructed image of each wavelet domain high-frequency sub-band; and obtaining a decoding reconstruction image of the whole video image through wavelet inverse transformation.
The method may further comprise: at the encoding end, before merging and packing the obtained code streams, performing wavelet-domain image reconstruction on the lowest-frequency sub-band code stream and the high-frequency sub-band code streams, respectively, and performing inverse wavelet transformation on the obtained wavelet-domain image to obtain a reconstructed image of the video image; subtracting the reconstructed image from the video image to obtain a difference image, and encoding the difference image to form a difference image code stream; and/or the merging and packing of the obtained code streams comprises: merging and packing the lowest-frequency sub-band code stream, the high-frequency sub-band code streams, and the difference image code stream, wherein the merging and packing operation further comprises ordering the code streams according to data importance and inserting resynchronization markers.
In the above method, the method further comprises: at a decoding end, respectively resolving difference image code streams from input combined code streams according to a code stream combining and packaging method of a coding end; according to the encoding method of the difference image, decoding operation is carried out on the difference image code stream to obtain a reconstructed image of the difference image, and the decoded reconstructed image and the reconstructed image of the difference image are added to obtain a decoded video image.
Wherein, the decoding reconstruction image of the whole video image obtained by wavelet inverse transformation is as follows: according to the wavelet transform adopted by the encoding end, performing wavelet domain reconstruction on the reconstructed image of the lowest frequency sub-band and the reconstructed image of each wavelet domain high-frequency sub-band, and synthesizing to obtain an entire wavelet domain image; and carrying out wavelet inverse transformation on the wavelet domain image to obtain a decoding reconstruction image of the whole video image.
To implement the above method, the present invention provides a video encoding apparatus, comprising: a sub-band dividing module, a sub-band encoding module, and a code stream merging module; wherein the sub-band dividing module is configured to divide an input video image into different sub-bands by wavelet transformation; the sub-band encoding module is configured to encode the lowest-frequency sub-band in each frame of video image to form a lowest-frequency sub-band code stream and to independently encode the high-frequency sub-bands to form high-frequency sub-band code streams; and the code stream merging module is configured to merge and pack the code streams obtained by the sub-band encoding module to form a merged code stream and output it.
Wherein the video encoding apparatus further comprises: the device comprises a coding side reconstruction module and a difference image processing module; the encoding side reconstruction module is used for respectively reconstructing wavelet domain images of the low-frequency sub-band code stream and the high-frequency sub-band code stream obtained by the sub-band encoding module, and performing wavelet inverse transformation on the obtained wavelet domain images to obtain reconstructed images of video images; and the difference image processing module is used for subtracting the video image from the reconstructed image to obtain a difference image, and encoding the difference image to form a difference image code stream.
To achieve the above method, the present invention further provides a video decoding apparatus, including: the device comprises a code stream splitting module, a code stream decoding module and an image reconstruction module; the code stream splitting module is used for respectively resolving a lowest-frequency sub-band code stream and a high-frequency sub-band code stream from the input combined code stream; the code stream decoding module is used for carrying out corresponding decoding operation on the lowest-frequency sub-band code stream and the high-frequency sub-band code stream to respectively obtain a reconstructed image of the lowest-frequency sub-band and a reconstructed image of each wavelet domain high-frequency sub-band; and the image reconstruction module is used for obtaining a decoding reconstruction image of the whole video image through wavelet inverse transformation.
In the above apparatus, the video decoding apparatus further includes: the difference image decoding module and the difference image reconstruction module; the difference image decoding module is used for respectively resolving difference image code streams from input combined code streams according to a code stream combining and packaging method of a coding end; the difference image reconstruction module is used for decoding the difference image code stream according to the coding method of the difference image to obtain a reconstructed image of the difference image; correspondingly, the image reconstruction module is configured to add the decoded reconstructed image and the reconstructed image of the difference image to obtain a decoded video image.
In order to implement the above method, the present invention provides a video encoding and decoding apparatus, including: a video encoding device and a video decoding device; the video coding device is used for dividing an input video image into different sub-bands by adopting wavelet transformation, coding the lowest-frequency sub-band in each frame of video image to form a lowest-frequency sub-band code stream, independently coding the high-frequency sub-bands to form a high-frequency sub-band code stream, merging and packaging the obtained code streams to form a merged code stream and outputting the merged code stream; the video decoding device is used for respectively analyzing the lowest-frequency sub-band code stream and the high-frequency sub-band code stream from the input combined code stream, and respectively carrying out corresponding decoding operation on the lowest-frequency sub-band code stream and the high-frequency sub-band code stream to obtain a reconstructed image of the lowest-frequency sub-band and a reconstructed image of each wavelet domain high-frequency sub-band; and obtaining a decoding reconstruction image of the whole video image through wavelet inverse transformation.
Wherein the video encoding device comprises: the sub-band dividing module is used for dividing the input video image into different sub-bands by adopting wavelet transformation; the subband coding module is used for coding the lowest frequency subband in each frame of video image to form a lowest frequency subband code stream and independently coding the high frequency subbands to form a high frequency subband code stream; the code stream merging module is used for merging and packaging the code streams obtained by the sub-band coding module to form merged code streams and outputting the merged code streams; and/or, the encoding side reconstruction module is used for respectively reconstructing wavelet domain images of the low-frequency sub-band code stream and the high-frequency sub-band code stream obtained by the sub-band encoding module, and performing wavelet inverse transformation on the obtained wavelet domain images to obtain reconstructed images of the video images; and the difference image processing module is used for subtracting the video image from the reconstructed image to obtain a difference image, and encoding the difference image to form a difference image code stream.
In the above apparatus, the video decoding apparatus includes: the device comprises a code stream splitting module, a code stream decoding module, an image reconstruction module and/or a difference image decoding module and a difference image reconstruction module; the code stream splitting module is used for respectively resolving a lowest-frequency sub-band code stream and a high-frequency sub-band code stream from the input combined code stream; the code stream decoding module is used for carrying out corresponding decoding operation on the lowest-frequency sub-band code stream and the high-frequency sub-band code stream to respectively obtain a reconstructed image of the lowest-frequency sub-band and a reconstructed image of each wavelet domain high-frequency sub-band; the image reconstruction module is used for obtaining a decoding reconstruction image of the whole video image through wavelet inverse transformation; and/or, the difference image decoding module is used for respectively resolving difference image code streams from the input combined code streams according to a code stream combining and packaging method of the encoding end; the difference image reconstruction module is used for decoding the difference image code stream according to the coding method of the difference image to obtain a reconstructed image of the difference image; correspondingly, the image reconstruction module is configured to add the decoded reconstructed image and the reconstructed image of the difference image to obtain a decoded video image.
According to the technical scheme above, the video encoding and decoding method and apparatus of the present invention divide an input video image into different sub-bands by wavelet transformation, encode the divided sub-bands with a DCT-based hybrid coding framework, and decode with methods corresponding to the encoding. Applying the DCT-based hybrid coding framework to the lowest-frequency sub-band improves the overall coding efficiency of the scheme, since the hybrid framework removes both temporal and spatial correlation. The invention thus effectively integrates the DCT-based hybrid coding framework with the wavelet-transform-based coding framework, fully exploiting the high compression efficiency and low complexity of DCT-based hybrid coding together with the multi-resolution analysis and fault-tolerance characteristics of the wavelet transform, and realizes a real-time video encoding and decoding method with high video image quality, scalability, and good fault tolerance. It has the following outstanding advantages:
1) The invention has natural spatial, quality, and temporal scalability, and can easily realize various scalability levels to meet applications with different quality, spatial resolution, and temporal resolution requirements; different scalability levels are obtained from a single code stream, whereas prior systems need a separate code stream for each requirement. This is because the coding architecture of the invention naturally partitions a video image after the wavelet transform into sub-bands of different properties, such as different spatial resolutions, and the combinations of these sub-bands naturally form a hierarchy. Some existing scalable schemes, such as H.264 SVC, must instead design additional enhancement layers to achieve scalability.
2) Since motion estimation is performed only on the lowest-frequency LL sub-band, whose size is 1/4^n of the original video image, and the temporal prediction of the high-frequency sub-bands and the difference image derives its motion vectors from those of the lowest-frequency sub-band, the overall computational complexity and the encoding time of the invention are low.
3) The invention has high compression efficiency, because the lowest-frequency LL sub-band is compressed efficiently and the high-frequency sub-bands and the difference image are also effectively compressed. Moreover, lossless video compression is supported, for example by applying a lossless coding mode to the difference image.
In addition, the invention independently encodes the lowest-frequency sub-band, the high-frequency sub-bands of each level, and the difference image, and applies fault-tolerance techniques during merging and packing, such as ordering by data importance and inserting resynchronization markers. It therefore has strong fault tolerance and preserves video image quality under a certain amount of packet loss and bit errors. Furthermore, the encoding and decoding method has low computational complexity and introduces no extra delay, so it meets real-time requirements.
Therefore, the invention is suitable for various video services, provides real-time video encoding and decoding with multiple scalability levels and good fault tolerance, and has great value for application and popularization.
Detailed Description
The basic idea of the invention is as follows: an input video image is divided into different sub-bands by wavelet transformation, the divided sub-bands are encoded with a DCT-based hybrid coding framework, and decoding uses methods corresponding to the encoding. In this way, the high compression efficiency and low complexity of DCT-based hybrid coding and the multi-resolution analysis and fault-tolerance characteristics of the wavelet transform are fully exploited, realizing a real-time video encoding and decoding method with high video image quality, scalability, and good fault tolerance.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below. The invention includes two parts of encoding method and decoding method, first describe the technical scheme of encoding end, the video encoding method of the invention is shown in figure 1, including the following steps:
Step 101, dividing an input video image into different sub-bands by wavelet transformation;
Here, n levels of wavelet transform are performed on the video image frame F, where n is greater than or equal to 1 and n is an integer. Each level of wavelet transform produces four sub-bands, namely: a sub-band LL that is low-pass in both the row and column directions, a sub-band LH that is low-pass in the row direction and high-pass in the column direction, a sub-band HL that is high-pass in the row direction and low-pass in the column direction, and a sub-band HH that is high-pass in both directions.
After n levels of wavelet transform, each frame of video image is divided into 3n + 1 sub-bands. For example, a two-level discrete wavelet transform (DWT) of the input video image (n = 2), as shown in fig. 2, forms the 7 sub-bands LL2, LH2, HL2, HH2, LH1, HL1, and HH1.
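The sub-band division above can be sketched with one level of a 2-D decomposition using the simple Haar filter (pairwise averages and differences). This is only an illustrative choice; the patent leaves the wavelet family open, and the function names here are invented for the sketch.

```python
def haar_1d(row):
    """Split a 1-D signal into low-pass (averages) and high-pass (differences)."""
    lo = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    hi = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return lo, hi

def haar_2d(image):
    """One DWT level: returns the four sub-bands LL, LH, HL, HH."""
    # Transform rows: L = row low-pass half, H = row high-pass half.
    rows = [haar_1d(r) for r in image]
    L = [lo for lo, _ in rows]
    H = [hi for _, hi in rows]

    def cols(mat):
        # Transform each column, then transpose back to row-major order.
        lo_rows, hi_rows = [], []
        for c in zip(*mat):
            lo, hi = haar_1d(list(c))
            lo_rows.append(lo)
            hi_rows.append(hi)
        return ([list(r) for r in zip(*lo_rows)],
                [list(r) for r in zip(*hi_rows)])

    LL, LH = cols(L)   # low rows, low/high columns
    HL, HH = cols(H)   # high rows, low/high columns
    return LL, LH, HL, HH

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
LL, LH, HL, HH = haar_2d(img)
```

Each of the four resulting sub-bands is 2 x 2, i.e. 1/4 the area of the 4 x 4 input, matching the 3n + 1 sub-band count for n = 1.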
The wavelet transform is a discrete two-dimensional orthogonal wavelet transform, a biorthogonal wavelet transform or an integer wavelet transform, which is selected according to practical applications, such as applications of video calls, video conferences, video storage and the like.
Step 102, encoding the lowest-frequency sub-band in each frame of video image to form a lowest-frequency sub-band code stream;
Here, the lowest-frequency sub-band of the video image is the LL sub-band of the lowest frequency in each frame, whose size is 1/4^n of the original video image. It is encoded with a DCT-based hybrid coding framework to form the lowest-frequency LL sub-band code stream, i.e., the base layer code stream. The DCT-based hybrid coding framework comprises prediction, DCT, quantization, and entropy coding.
It should be noted that the use of the DCT-based hybrid coding algorithm framework for the lowest frequency sub-bands can improve the overall coding efficiency of the scheme, for example: the time domain correlation and the space correlation can be removed by adopting a hybrid coding algorithm frame based on DCT, and the coding effect is improved.
Here, the frame types supported by the coding scheme of this embodiment include I frames, P frames, and B frames. A P frame uses forward prediction and references the previous reconstructed image; a B frame uses bidirectional (forward and backward) prediction and references reconstructed images in both directions; an I frame references no other reconstructed image and uses only spatial (intra) prediction. Therefore, if the frame to be encoded is an I frame, the coding mode of intra-frame prediction, DCT, quantization, and entropy coding is used; if it is a P frame or a B frame, the coding mode of inter-frame prediction, DCT, quantization, and entropy coding is used.
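The frame-type dispatch above can be summarized in a few lines. This is a schematic sketch, not an actual codec API; the stage names simply mirror the text.

```python
def coding_stages(frame_type):
    """Return the coding pipeline for a frame type, per the scheme above."""
    if frame_type == "I":
        prediction = "intra-frame prediction"      # no reference images
    elif frame_type in ("P", "B"):
        prediction = "inter-frame prediction"      # references reconstructed images
    else:
        raise ValueError("unknown frame type: %r" % frame_type)
    return [prediction, "DCT", "quantization", "entropy coding"]
```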
Step 103, independently encoding the high-frequency sub-bands to form high-frequency sub-band code streams;
in order to improve the fault-tolerant performance of the code stream, the high-frequency sub-band of each level is independently coded, and the coefficient correlation of the high-frequency sub-band of the previous level or the high-frequency sub-band of the next level is not used. And uniformly packing the high-frequency sub-bands transformed at each level to form n high-frequency sub-band code streams. The n code streams can be used as n spatial enhancement layers and also can be used as n quality enhancement layers.
The high-frequency sub-bands comprise all sub-bands other than the lowest-frequency LL sub-band of each frame; there are 3n of them, with sizes of 1/4^n, 1/4^(n-1), ..., 1/4 of the original video image for levels n down to 1. Reference is made to fig. 3, a schematic diagram of the implementation framework of the encoding method of the present invention. The high-frequency sub-bands can be encoded in various ways, including:
1) coding by adopting a direct quantization and entropy coding mode;
2) coding by adopting DCT, quantization and entropy coding modes;
3) when the frame type is P frame or B frame, deducing the motion vector of the corresponding position of the high frequency sub-band by using the motion vector of the LL sub-band with the lowest frequency, and coding by adopting a prediction, DCT, quantization and entropy coding mode;
4) coding block by block with zero-block coding, according to the importance of the high-frequency sub-band coefficients.
Of course, the above methods can be fused to encode the high frequency subbands.
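Option 3 above derives a high-frequency sub-band motion vector from the lowest-frequency LL sub-band motion vector. One plausible rule, sketched below, scales the LL_n vector by the resolution ratio 2^(n - k) for a level-k sub-band; the patent only states that the vector is "deduced", so the exact scaling rule here is an assumption for illustration.

```python
def derive_mv(mv_ll, n, k):
    """Scale an LL_n motion vector (mvx, mvy) to a level-k high-frequency sub-band.

    A level-k sub-band has 2**(n - k) times the spatial resolution of LL_n,
    so the motion vector is scaled by that factor (assumed rule).
    """
    scale = 2 ** (n - k)
    return (mv_ll[0] * scale, mv_ll[1] * scale)
```

For example, with a two-level transform (n = 2), a level-1 sub-band has twice the resolution of LL2, so its derived vector is the LL2 vector doubled, while a level-2 sub-band reuses the LL2 vector unchanged.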
Step 104, reconstructing a wavelet-domain image from the lowest-frequency sub-band code stream and the high-frequency sub-band code streams to form an entire wavelet-domain image, and performing inverse wavelet transformation on the wavelet-domain image to obtain a reconstructed image of the video image;
According to the coding modes used in steps 102 and 103, the lowest-frequency sub-band code stream and the high-frequency sub-band code streams are decoded with the corresponding operations, the decoded data are placed at the corresponding positions of the wavelet domain to form an entire wavelet-domain image, and an inverse wavelet transformation then yields a reconstructed image F' of the video image frame F.
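The inverse transform in this step can be sketched with the Haar wavelet as an illustrative choice (the patent leaves the wavelet family open). The four 2 x 2 sub-bands below are the single-level Haar decomposition, by pairwise averages and differences, of the 4 x 4 ramp image with values 1 through 16, so the inverse should recover exactly that image.

```python
def ihaar_1d(lo, hi):
    """Invert the Haar averages/differences: x0 = a + d, x1 = a - d."""
    out = []
    for a, d in zip(lo, hi):
        out.extend([a + d, a - d])
    return out

def ihaar_2d(LL, LH, HL, HH):
    """One level of inverse 2-D Haar DWT from the four sub-bands."""
    def icols(lo_mat, hi_mat):
        # Invert each column pair, then transpose back to row-major order.
        cols = [ihaar_1d(list(cl), list(ch))
                for cl, ch in zip(zip(*lo_mat), zip(*hi_mat))]
        return [list(r) for r in zip(*cols)]
    L = icols(LL, LH)    # undo the column transform of the low half
    H = icols(HL, HH)    # undo the column transform of the high half
    return [ihaar_1d(lr, hr) for lr, hr in zip(L, H)]  # undo the row transform

# Sub-bands of the 4 x 4 ramp image [[1..4], [5..8], [9..12], [13..16]].
LL = [[3.5, 5.5], [11.5, 13.5]]
LH = [[-2.0, -2.0], [-2.0, -2.0]]
HL = [[-0.5, -0.5], [-0.5, -0.5]]
HH = [[0.0, 0.0], [0.0, 0.0]]
rec = ihaar_2d(LL, LH, HL, HH)
```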
Step 105, subtracting the reconstructed image from the video image to obtain a difference image, and encoding the difference image to form a difference image code stream;
in this step, the difference image of the video image frame F and the reconstructed image F' thereof is encoded to form a difference image code stream, i.e., another quality enhancement layer code stream. Reference is made to fig. 3, which is a schematic diagram of a framework for implementing the encoding method according to the present invention. There are various encoding methods for the difference image, such as:
1) coding by adopting a direct quantization and entropy coding mode;
2) DCT, quantization and entropy coding methods are adopted;
3) when the frame type is P frame or B frame, deriving the motion vector at the corresponding position of the difference image from the motion vector of the lowest-frequency LL sub-band, and coding with prediction, DCT, quantization, and entropy coding;
4) coding with a lossless coding mode.
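The difference-image step (and its decoder-side counterpart in step 406) in miniature: the encoder computes D = F - F', and the decoder adds D back. Plain Python lists stand in for image planes here; the clipping of the sum to the valid pixel range is an added assumption, not stated in the text.

```python
def diff_image(F, F_rec):
    """Encoder side: difference image D = F - F' (pixel-wise)."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(F, F_rec)]

def add_clip(F_rec, D, lo=0, hi=255):
    """Decoder side: F' + D, clipped to the valid pixel range (assumed)."""
    return [[min(hi, max(lo, a + d)) for a, d in zip(ra, rd)]
            for ra, rd in zip(F_rec, D)]

F     = [[10, 20], [30, 255]]   # original frame (toy 2 x 2 plane)
F_rec = [[12, 18], [30, 250]]   # encoder-side reconstruction
D     = diff_image(F, F_rec)
```

Round-tripping D through `add_clip` recovers the original frame, which is why a losslessly coded difference image yields lossless overall compression.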
Step 106, merging and packing the lowest-frequency sub-band code stream, the high-frequency sub-band code streams, and the difference image code stream to form a merged code stream, and outputting the merged code stream.
To further improve the fault tolerance of the code stream, operations such as sorting by data importance and inserting resynchronization markers are added during merging and packing. Since the coding scheme of this embodiment needs no additional enhancement layers and only requires truncating the code stream according to the specific scalability requirement, spatial scalability of up to n + 1 layers and quality scalability of up to n + 2 layers are easily realized from the code streams obtained in steps 102, 103, and 105. When B-frame coding is used, temporal scalability can also be achieved, because B frames are bidirectionally predicted and discarding them changes the frame rate. Note that spatial scalability refers to layers with different spatial resolutions, such as 4CIF and QCIF; quality scalability refers to layers with different image qualities, such as code rates of 256 kbps, 384 kbps, and 768 kbps; and temporal scalability refers to layers with different temporal resolutions, such as frame rates of 15 or 30 frames per second.
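The ordering-and-resynchronization idea of step 106 can be sketched as follows: per-sub-band code streams are sorted by importance (base layer first) and a marker is inserted before each unit so a decoder can resynchronize after packet loss. The marker byte pattern and the importance ranks are invented for this sketch and are not the patent's actual syntax.

```python
RESYNC = b"\x00\x00\x01"   # illustrative marker, not a real syntax element

def merge_pack(streams):
    """streams: list of (importance, payload) pairs; lower rank = more important."""
    packed = bytearray()
    for rank, payload in sorted(streams, key=lambda s: s[0]):
        packed += RESYNC + payload   # marker delimits each unit for resync
    return bytes(packed)

streams = [(2, b"HF-level1"), (0, b"LL-base"), (1, b"HF-level2"), (3, b"diff")]
bitstream = merge_pack(streams)
```

Because the most important data (the base layer) comes first, truncating the merged stream at any marker still leaves a decodable prefix, which is what makes the layered truncation described above work.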
Now that the encoding process of the input video image has been explained, the following describes the decoding process of the encoded code stream, as shown in fig. 4, including the following steps:
step 401, respectively resolving a lowest frequency sub-band code stream, a high frequency sub-band code stream and a difference image code stream from the input combined code stream according to a code stream combining and packaging method of a coding end;
and the decoding end acquires the code stream merging and packing method adopted by the encoding end according to the identification bit indicating the merging and packing method adopted by the encoding end in the merging code stream.
Step 402, according to the encoding method of the lowest frequency sub-band, performing corresponding decoding operation on the lowest frequency sub-band code stream;
For example: from the identification bit indicating the coding mode in the lowest-frequency LL sub-band code stream, the decoding end learns that the lowest-frequency LL sub-band was encoded with inter-frame prediction, DCT, quantization, and entropy coding; it therefore decodes with entropy decoding, inverse quantization, inverse DCT, and image reconstruction to obtain the reconstruction of the lowest-frequency LL sub-band in the wavelet domain.
Step 403, performing corresponding decoding operation on the high-frequency subband code stream according to the coding method of the high-frequency subband;
for example: the decoding end can know the high-frequency sub-band code stream according to the identification bit indicating the coding mode, the high-frequency sub-band adopts quantization and entropy coding methods, and the decoding method adopts entropy decoding and inverse quantization operations to obtain the reconstruction of each wavelet domain high-frequency sub-band.
The high-frequency sub-band code streams differ with the number of wavelet transform levels, and may include the first-level, second-level, and up to the nth-level high-frequency sub-bands.
Step 404, according to the wavelet transform adopted by the encoding end, the decoding end carries out wavelet domain reconstruction on the reconstructed image of the lowest frequency sub-band of the wavelet domain and the reconstructed image of each high frequency sub-band of the wavelet domain, synthesizes to obtain an entire wavelet domain image, and carries out wavelet inverse transform on the wavelet domain image to obtain a decoded reconstructed image of the entire video image;
step 405, decoding the difference image code stream according to the coding method of the difference image to obtain a difference image;
for example: the decoding end can know the difference image according to the identification bit indicating the coding mode in the difference image code stream, the difference image adopts the methods of DCT, quantization and entropy coding, and then the operations of entropy decoding, inverse quantization and inverse DCT are adopted for decoding to obtain the reconstructed image of the difference image.
And 406, adding the decoded reconstructed image and the reconstructed image of the difference image to obtain a decoded video image.
The addition of the decoded reconstructed image and the reconstructed image of the difference image mainly refers to the addition of pixel values at corresponding positions of the decoded reconstructed image and the reconstructed image of the difference image.
Here, the decoding process can refer to fig. 5, which shows the implementation framework of the video decoding method of the present invention. As can be seen from the above, this decoding scheme easily obtains reconstructed images for the various scalable code streams, for example: video images of different spatial resolutions, formed from the reconstructed image of the lowest-frequency LL sub-band together with the reconstructed images of the high-frequency sub-bands; or video images of different qualities, where adding the difference image further enhances the video image quality.
The above is the implementation flow of the video encoding and decoding method of the present invention. The method is further detailed below, taking as an example the 4CIF image format, a video image size of 704 × 576 pixels, with a two-level discrete wavelet transform (DWT).
For convenience of description, in this embodiment the sub-bands LL, LH, HL, and HH of the wavelet-transformed video image are suffixed with numbers indicating the decomposition level. As shown in fig. 2, the two-level wavelet transform of the video image forms the 7 sub-bands LL2, LH2, HL2, HH2, LH1, HL1, and HH1, where the LL2 sub-band is the lowest-frequency LL sub-band, LH2, HL2, and HH2 are the second-level high-frequency sub-bands, and LH1, HL1, and HH1 are the first-level high-frequency sub-bands.
Step 501, performing a two-level DWT on the input video image to form a total of 7 sub-bands: LL2, LH2, HL2, HH2, LH1, HL1, and HH1;
The lowest-frequency LL2 sub-band obtained by the decomposition has a size of 176 × 144, the secondary high-frequency sub-bands LH2, HL2 and HH2 each have a size of 176 × 144, and the primary high-frequency sub-bands LH1, HL1 and HH1 each have a size of 352 × 288.
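As a sketch of step 501, the two-level decomposition can be illustrated with the simple Haar wavelet (a practical codec would use longer filters such as the 9/7 wavelet, and the LH/HL naming convention varies between texts; the function names are hypothetical):

```python
def haar_dwt2(img):
    """One level of a 2-D Haar DWT on a list-of-lists image.

    Each 2x2 block yields one average (LL) and three detail
    coefficients; the sub-band structure is the same as with
    longer wavelet filters.
    """
    h, w = len(img), len(img[0])
    LL = [[0.0] * (w // 2) for _ in range(h // 2)]
    LH = [[0.0] * (w // 2) for _ in range(h // 2)]
    HL = [[0.0] * (w // 2) for _ in range(h // 2)]
    HH = [[0.0] * (w // 2) for _ in range(h // 2)]
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            LL[y // 2][x // 2] = (a + b + c + d) / 4.0  # local average
            LH[y // 2][x // 2] = (a + b - c - d) / 4.0  # vertical detail
            HL[y // 2][x // 2] = (a - b + c - d) / 4.0  # horizontal detail
            HH[y // 2][x // 2] = (a - b - c + d) / 4.0  # diagonal detail
    return LL, LH, HL, HH

def two_level_dwt(img):
    """Split the frame, then split its LL band again: 7 sub-bands."""
    LL1, LH1, HL1, HH1 = haar_dwt2(img)
    LL2, LH2, HL2, HH2 = haar_dwt2(LL1)
    return {"LL2": LL2, "LH2": LH2, "HL2": HL2, "HH2": HH2,
            "LH1": LH1, "HL1": HL1, "HH1": HH1}
```

Applied to a 704 × 576 frame, this produces exactly the sub-band sizes listed above.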
Step 502, encoding the lowest-frequency LL2 sub-band by a DCT-based hybrid encoding mode;
Here, the encoding adopts a hybrid encoding mode based on the discrete cosine transform (DCT), such as an encoding tool conforming to the H.264 baseline profile, so as to obtain the lowest-frequency sub-band code stream, namely the base layer code stream.
Step 503, the secondary high-frequency subbands LH2, HL2, and HH2 are encoded by direct quantization and entropy encoding to obtain a secondary high-frequency subband code stream, i.e. the first quality or spatial scalable enhancement layer code stream.
In step 504, the first-level high-frequency subbands LH1, HL1, and HH1 are encoded by direct quantization and entropy encoding to obtain a first-level high-frequency subband code stream, i.e., a second quality or spatial scalable enhancement layer code stream.
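Steps 503 and 504 both apply direct quantization followed by entropy coding to the high-frequency sub-bands. The sketch below (hypothetical helper names; run-length pairs stand in for the Huffman or arithmetic coder a real codec would use) shows why this is effective: after quantization most high-band coefficients become zero, so the symbol stream collapses into a few long runs:

```python
def quantize(band, step):
    """Uniform scalar quantization of a sub-band (flattened).

    With a sensible step size, most high-frequency coefficients
    round to zero.
    """
    return [int(round(v / step)) for row in band for v in row]

def run_length_encode(symbols):
    """Toy stand-in for the entropy-coding stage: collapse runs of
    equal symbols into [value, count] pairs."""
    out = []
    for s in symbols:
        if out and out[-1][0] == s:
            out[-1][1] += 1
        else:
            out.append([s, 1])
    return out
```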
Step 505, performing encoding-side reconstruction of the lowest-frequency sub-band, and performing inverse-quantization reconstruction of the secondary and primary high-frequency sub-bands, to obtain a whole wavelet domain image;
the step mainly carries out the reconstruction of the wavelet domain image, which means that: and respectively placing the data of the lowest-frequency sub-band reconstructed image and the data of the second-level high-frequency sub-band reconstructed image and the first-level high-frequency sub-band reconstructed image at corresponding positions of a wavelet domain to obtain a wavelet domain image. Wherein the size of the wavelet domain image is 704 × 576.
Step 506, performing two-level wavelet inverse transformation on the wavelet domain image to obtain a reconstructed image of the original video image, and subtracting the reconstructed image from the original video image to obtain a difference image;
the reconstructed image size of the original video image is the same as that of the difference image, and is 704 × 576.
Step 507, encoding the difference image by DCT, quantization and entropy coding to obtain a difference image code stream, namely a third quality enhancement layer code stream;
Step 508, uniformly merging and packaging the lowest-frequency sub-band code stream, the secondary high-frequency sub-band code stream, the primary high-frequency sub-band code stream and the difference image code stream, and outputting the merged code stream.
By adopting the two-level wavelet transform, the output code stream realizes three-layer spatial grading, namely 704 × 576, 352 × 288 and 176 × 144, and four-layer quality grading, namely: LL2; LL2 + LH2 + HL2 + HH2; LL2 + LH2 + HL2 + HH2 + LH1 + HL1 + HH1; and LL2 + LH2 + HL2 + HH2 + LH1 + HL1 + HH1 + difference image. Of course, mixed spatial and quality grading can also be achieved.
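The merging and packaging of step 508, with layers ordered by importance, can be sketched as follows (the layer names and the 4-byte length prefix are illustrative assumptions, not the claimed packet format; the length prefix plays the role of a resynchronization point, so a decoder or network node can simply drop trailing, less important layers):

```python
LAYERS = ["base_LL2", "enh1_level2", "enh2_level1", "enh3_diff"]

def pack(streams):
    """Merge per-layer streams, most important first, each prefixed
    with its byte length so later layers can be dropped cleanly."""
    packet = bytearray()
    for name in LAYERS:
        payload = streams[name]
        packet += len(payload).to_bytes(4, "big") + payload
    return bytes(packet)

def unpack(packet, keep=len(LAYERS)):
    """Parse back only the first `keep` layers; keep=1 yields the
    base layer alone, keep=4 the full-quality stream."""
    streams, pos = {}, 0
    for name in LAYERS[:keep]:
        n = int.from_bytes(packet[pos:pos + 4], "big")
        streams[name] = bytes(packet[pos + 4:pos + 4 + n])
        pos += 4 + n
    return streams
```

Dropping tail layers at `unpack` time is exactly the quality grading enumerated above.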
Correspondingly, the decoding method is realized as follows:
step 601, the decoding end analyzes the input code stream to respectively obtain a lowest frequency sub-band code stream, a secondary high frequency sub-band code stream, a primary high frequency sub-band code stream and a difference image code stream;
step 602, decoding the lowest-frequency sub-band code stream by using a decoding tool set conforming to the H.264 baseline profile to obtain a reconstructed image of the lowest-frequency LL2 sub-band;
step 603, decoding the secondary high-frequency sub-band code stream, and obtaining reconstructed images of secondary high-frequency sub-bands LH2, HL2 and HH2 by adopting entropy decoding and inverse quantization methods;
step 604, decoding the primary high-frequency sub-band code stream, and obtaining reconstructed images of the primary high-frequency sub-bands LH1, HL1 and HH1 by adopting entropy decoding and inverse quantization methods;
step 605, performing a two-level inverse wavelet transform on the reconstructed image of the lowest-frequency LL2 sub-band, the reconstructed images of the secondary high-frequency sub-bands LH2, HL2 and HH2, and the reconstructed images of the primary high-frequency sub-bands LH1, HL1 and HH1, to obtain a decoded reconstructed image;
step 606, decoding the difference image code stream, and obtaining a reconstructed image of the difference image by adopting methods of entropy decoding, inverse quantization and inverse DCT;
step 607, adding the decoded reconstructed image and the reconstructed image of the difference image to obtain an output decoded video image.
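The inverse transform of step 605 can be illustrated, again with the simple Haar wavelet normalized so that LL holds 2 × 2 averages (hypothetical function names; a real codec would invert whatever filter bank the encoder used):

```python
def haar_idwt2(LL, LH, HL, HH):
    """One level of an inverse 2-D Haar DWT: rebuild an image twice
    the sub-band size from one average and three detail bands."""
    h, w = len(LL) * 2, len(LL[0]) * 2
    img = [[0.0] * w for _ in range(h)]
    for y in range(len(LL)):
        for x in range(len(LL[0])):
            ll, lh, hl, hh = LL[y][x], LH[y][x], HL[y][x], HH[y][x]
            img[2 * y][2 * x] = ll + lh + hl + hh
            img[2 * y][2 * x + 1] = ll + lh - hl - hh
            img[2 * y + 1][2 * x] = ll - lh + hl - hh
            img[2 * y + 1][2 * x + 1] = ll - lh - hl + hh
    return img

def two_level_idwt(sub):
    """Step 605: undo the level-2 split to recover LL1, then the
    level-1 split to recover the full-size decoded frame."""
    LL1 = haar_idwt2(sub["LL2"], sub["LH2"], sub["HL2"], sub["HH2"])
    return haar_idwt2(LL1, sub["LH1"], sub["HL1"], sub["HH1"])
```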
It should be noted that, since the code stream from the encoding end implements three-layer spatial grading and four-layer quality grading, the decoding end only needs to appropriately clip the above decoding steps to obtain a reconstructed image of the corresponding grade as required. For example, to obtain a good-quality reconstructed image of size 352 × 288, only step 601, step 602, step 603 and step 605 are performed, with only a one-level inverse wavelet transform in step 605.
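The relation between the number of spatial layers decoded and the output resolution follows directly from the dyadic decomposition; a minimal sketch (hypothetical helper) for the 4CIF, two-level case:

```python
def output_size(n_spatial_layers, full=(704, 576), levels=2):
    """Resolution reachable from the first n spatial layers.

    1 layer  -> LL2 only             -> 176 x 144 (no inverse DWT)
    2 layers -> + level-2 high bands -> 352 x 288 (one inverse DWT)
    3 layers -> + level-1 high bands -> 704 x 576 (two inverse DWTs)
    """
    shift = levels + 1 - n_spatial_layers
    return (full[0] >> shift, full[1] >> shift)
```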
For simplicity of explanation, the foregoing embodiments are described as a series of acts, but it will be appreciated by those skilled in the art that the invention is not limited by the order of the acts, as some steps may, in accordance with the invention, occur in other orders or concurrently with other steps.
To implement the above methods, the present invention further provides a video encoding device, a video decoding device and a video encoding and decoding device, respectively, as shown in fig. 6:
the video encoding device includes: the device comprises a sub-band dividing module, a sub-band coding module and a code stream merging module; wherein,
the sub-band dividing module is used for dividing the input video image into different sub-bands by adopting wavelet transformation;
the subband coding module is used for coding the lowest frequency subband in each frame of video image to form a lowest frequency subband code stream and independently coding the high frequency subbands to form a high frequency subband code stream;
and the code stream merging module is used for merging and packaging the code streams obtained by the sub-band coding module to form merged code streams and outputting the merged code streams.
Wherein the video encoding device further comprises: the device comprises a coding side reconstruction module and a difference image processing module; wherein,
the encoding side reconstruction module is used for respectively reconstructing wavelet domain images of the low-frequency sub-band code stream and the high-frequency sub-band code stream obtained by the sub-band encoding module and performing wavelet inverse transformation on the obtained wavelet domain images to obtain reconstructed images of the video images;
and the difference image processing module is used for subtracting the video image from the reconstructed image to obtain a difference image, and encoding the difference image to form a difference image code stream.
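The module structure of the encoding device can be sketched as a pipeline whose stages are injected as callables (all names and the toy wiring below are hypothetical illustrations of the data flow, not the claimed device; the real modules would hold the DWT, the H.264-style base-layer coder, the quantizer/entropy coder and the packer):

```python
class VideoEncoder:
    """Sketch of the module wiring: sub-band dividing module,
    sub-band coding module and code stream merging module."""

    def __init__(self, divide, encode_low, encode_high, merge):
        self.divide = divide            # sub-band dividing module
        self.encode_low = encode_low    # lowest-frequency sub-band coder
        self.encode_high = encode_high  # independent high-band coder
        self.merge = merge              # code stream merging module

    def encode(self, frame):
        low, highs = self.divide(frame)
        streams = [self.encode_low(low)]
        # Each high-frequency sub-band is coded independently.
        streams += [self.encode_high(h) for h in highs]
        return self.merge(streams)

# Toy wiring just to make the flow concrete.
enc = VideoEncoder(
    divide=lambda f: (f[0], f[1:]),
    encode_low=lambda b: b"L" + bytes(b),
    encode_high=lambda b: b"H" + bytes(b),
    merge=lambda ss: b"|".join(ss),
)
```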
The invention also provides a video decoding device, which is used for respectively resolving the lowest-frequency sub-band code stream and the high-frequency sub-band code stream from the input combined code stream, and respectively carrying out corresponding decoding operation on the lowest-frequency sub-band code stream and the high-frequency sub-band code stream to obtain a reconstructed image of the lowest-frequency sub-band and a reconstructed image of each wavelet domain high-frequency sub-band; and obtaining a decoding reconstruction image of the whole video image through wavelet inverse transformation.
Wherein the video decoding apparatus comprises: the device comprises a code stream splitting module, a code stream decoding module and an image reconstruction module; wherein,
the code stream splitting module is used for respectively resolving the lowest frequency sub-band code stream and the high frequency sub-band code stream from the input combined code stream;
the code stream decoding module is used for carrying out corresponding decoding operation on the lowest-frequency sub-band code stream and the high-frequency sub-band code stream to respectively obtain a reconstructed image of the lowest-frequency sub-band and a reconstructed image of each wavelet domain high-frequency sub-band;
and the image reconstruction module is used for obtaining a decoding reconstruction image of the whole video image through wavelet inverse transformation.
Wherein the video decoding apparatus further comprises:
the difference image decoding module is used for respectively resolving difference image code streams from the input combined code streams according to a code stream combining and packaging method of the encoding end;
the difference image reconstruction module is used for decoding the difference image code stream according to the coding method of the difference image to obtain a reconstructed image of the difference image;
correspondingly, the image reconstruction module is configured to add the decoded reconstructed image and the reconstructed image of the difference image to obtain a decoded video image.
Here, the video encoding apparatus and the video decoding apparatus together constitute a video encoding and decoding apparatus.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments. The above description is only for the purpose of illustrating the present invention, and is not intended to limit the scope of the present invention. Any modification and equivalent substitution made to the present invention within the spirit and the scope of the claims of the present invention fall within the scope of the present invention.