CN1674677A - Improved FGS coding method and coder/decoder thereof - Google Patents
- Publication number
- CN1674677A (application CN200510025263A)
- Authority
- CN
- China
- Prior art keywords
- fgs
- spatial resolution
- frames
- image
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention relates to an improved FGS video coding method and a corresponding codec. The method starts from the FGS structure recommended by MPEG-4, in which only the base layer is motion-compensated, and applies motion compensation to the enhancement layer as well, forming an MC+FGS structure. This structure greatly improves the efficiency of FGS video coding. On the basis of the MC+FGS structure, the structure is extended to the spatial-temporal domain, yielding an MC+FGSST video coding scheme that achieves fine granularity scalability in a hybrid temporal + spatial + SNR sense while retaining high coding efficiency. The invention designs a codec for the MC+FGSST structure and develops a new algorithm for determining, in the encoder, the number of bit planes used for motion compensation. Experimental results show that the proposed MC+FGSST structure, which performs motion compensation from the enhancement layer, clearly outperforms in coding performance the original FGSST structure, which performs motion compensation from the base layer.
Description
Technical Field
The invention relates to a video coding method and codec for network streaming media, in particular to an improved FGS video coding method and its codec.
Background Art
In recent years, streaming video communication over the Internet has become increasingly popular. Scalable coding is an effective way to cope with the constantly fluctuating network bandwidth of Internet streaming video applications. The traditional SNR-scalable coding scheme is also layered, but the bit rate of each layer is fixed at encoding time, so it provides only fixed, coarse scalability: it cannot finely match the instantaneous network bandwidth, and the image quality does not transition smoothly at the points where the enhancement-layer bit rate jumps. To obtain a finely scalable coded stream, the MPEG-4 standard adopts the FGS coding technique. MPEG-4 FGS is quality-scalable video coding: it compresses the original video sequence into two streams, a base-layer stream and an enhancement-layer stream. The base layer provides the lowest image quality acceptable to the user. The enhancement layer is coded with bit-plane techniques to provide embedded SNR scalability: the enhancement-layer stream may be truncated and transmitted at any bit rate, and the more of it the user receives, the better the image quality, so it offers very fine quality scalability.
In terms of robustness, since the base-layer bit rate is below the minimum available network bandwidth, the probability of packet loss in the base-layer stream is greatly reduced, and strong error-resilience protection of the base-layer stream keeps the base-layer image quality well guaranteed, while the enhancement-layer stream can use different levels of error-recovery mechanisms depending on the requirements of the application. However, the fine SNR scalability and robustness of FGS come at the cost of degraded decoded image quality, that is, reduced video coding efficiency. This is mainly because the FGS structure uses the image reconstructed from the base layer as the motion reference frame and therefore fails to remove the temporal correlation of the enhancement layer.
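The bit-plane principle behind the enhancement layer can be illustrated with a minimal sketch. All helper names here are hypothetical, and MPEG-4's actual scheme entropy-codes each plane with (run, end-of-plane) symbols, which is omitted: absolute DCT residual values are split into planes from most to least significant, and a truncated stream still decodes to a coarser but valid approximation.

```python
# Hedged sketch of FGS bit-plane coding (names illustrative, entropy
# coding of each plane omitted).

def to_bit_planes(coeffs, n_planes):
    """Split absolute coefficient values into bit planes, MSB plane first."""
    return [[(abs(c) >> p) & 1 for c in coeffs]
            for p in range(n_planes - 1, -1, -1)]

def from_bit_planes(planes, signs, n_planes):
    """Reconstruct coefficients from however many planes were received."""
    vals = [0] * len(signs)
    for i, plane in enumerate(planes):
        weight = n_planes - 1 - i            # significance of this plane
        vals = [v | (b << weight) for v, b in zip(vals, plane)]
    return [s * v for s, v in zip(signs, vals)]

coeffs = [13, -6, 3, 0, -1]                  # toy DCT residual coefficients
signs = [1 if c >= 0 else -1 for c in coeffs]
planes = to_bit_planes(coeffs, 4)

full = from_bit_planes(planes, signs, 4)         # all planes received
partial = from_bit_planes(planes[:2], signs, 4)  # truncated after 2 planes
print(full)     # [13, -6, 3, 0, -1]
print(partial)  # [12, -4, 0, 0, 0] (coarser, but still a valid approximation)
```

Every additional plane the user receives halves the quantization step of the residual, which is why truncation at any point yields a usable image.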
To improve the coding efficiency of FGS, Wu Feng et al. proposed the PFGS (Progressive FGS) technique. PFGS compresses the original sequence into one base layer and multiple (3-4) enhancement layers. The reference frame for base-layer coding is the image reconstructed from the base layer, while the reference frames for enhancement-layer coding are images reconstructed from one or more enhancement layers. Since an image reconstructed from an enhancement layer has higher quality than one reconstructed from the base layer, using enhancement-layer reconstructions as reference frames improves prediction accuracy and hence coding efficiency. The advantage of PFGS is that it improves coding efficiency while retaining error resilience in the stream. Its drawbacks are a complex implementation and high computational complexity, requiring more frame buffers and more computation time.
The stream produced by FGS video coding provides only SNR fine granularity scalability. Different applications, however, may require different kinds of scalability, such as temporal scalability, spatial scalability, or hybrid spatial-temporal scalability. It is therefore necessary to extend the SNR-only fine scalability of the FGS structure to the spatial-temporal domain, obtaining an FGSST (FGS spatial temporal) structure that combines SNR with spatial-temporal scalability, so as to achieve fine scalability in a hybrid SNR + spatial + temporal sense. Since the FGSST structure is extended from the FGS structure, it inherits the defect of FGS, namely low coding efficiency. What is needed is an improved FGS structure that not only raises coding efficiency but is also simple to implement, so that the improved FGSST structure obtained by extending it to the spatial-temporal domain does not become impractical through excessive complexity.
Summary of the Invention
The object of the present invention is to provide an improved FGS video coding method and its codec that offer multiple selectable bit streams for network streaming video, all with fine SNR scalability, so that users can obtain the best video quality the network allows according to the condition of their own network.
To achieve the above object, the present invention is conceived as follows:
First, the present invention proposes an improved single-loop MC+FGS structure that performs motion compensation from the enhancement layer. Compared with the original FGS structure, the single-loop MC+FGS structure has higher coding efficiency; compared with PFGS, it is simpler, with smaller memory overhead and shorter computation time. The original FGS structure uses the image reconstructed from the base layer as the reference frame; since the base-layer image is of poorer quality than the enhancement-layer image, motion prediction is not accurate enough, which lowers coding efficiency. Clearly, if the reconstructed enhancement-layer image is used as the motion reference, coding efficiency improves considerably; the single-loop MC+FGS structure proposed here is based on this idea. MC+FGS compresses the original video into one base-layer stream and one enhancement-layer stream, and both are coded using as motion reference the image reconstructed from the extended base layer (base layer + enhancement layer), i.e. the base image reconstructed from the base-layer stream plus the residual image decoded from the enhancement-layer stream. The coding efficiency of MC+FGS is higher than that of PFGS.
One defect of the MC+FGS structure, however, is that at decoding time, if the user's bandwidth is insufficient to receive all the enhancement-layer data, the MC+FGS structure degrades image quality more readily than PFGS, and this degradation propagates through the prediction references of subsequent frames until the end of the current GOP, producing prediction drift. The present invention takes several measures to suppress this defect.
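As a rough illustration of the extended-base-layer reference described above (function and variable names are hypothetical, and a real codec forms the reference per macroblock after motion compensation), the reference image is simply the base-layer reconstruction plus the decoded enhancement-layer residual, clipped back to the pixel range:

```python
# Illustrative sketch, not the patented implementation: build the MC+FGS
# motion reference from base reconstruction + enhancement residual.

def extended_base_reference(base_recon, enh_residual):
    """Per-pixel sum of base reconstruction and enhancement residual,
    clipped to the 8-bit range [0, 255]."""
    return [[max(0, min(255, b + r)) for b, r in zip(rb, rr)]
            for rb, rr in zip(base_recon, enh_residual)]

base = [[100, 128], [255, 0]]   # toy 2x2 base-layer reconstruction
resid = [[5, -10], [8, -3]]     # decoded enhancement-layer residual
ref = extended_base_reference(base, resid)
print(ref)  # [[105, 118], [255, 0]]  (clipped at 255 and at 0)
```

The better this reference approximates the original frame, the smaller the prediction residual, which is the source of the coding-efficiency gain over referencing the base layer alone.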
Second, on the basis of the improved MC+FGS structure, the present invention extends it to the spatial-temporal domain, obtaining a hybrid MC+FGSST structure. The MC+FGSST structure not only achieves higher coding efficiency but also provides fine scalability in a hybrid temporal + spatial + SNR sense; users can configure it according to their own network bandwidth to obtain, for example, better image quality, higher image resolution (spatial), or higher video frame rate (temporal).
Finally, the present invention designs a codec for the proposed MC+FGSST structure and develops a new algorithm for determining, in the encoder, how many bit planes are needed for motion compensation; the algorithm works very well in applications such as real-time encoding.
According to the above inventive concept, the present invention adopts the following technical scheme:
An improved FGS video coding method, based on the FGS structure recommended by MPEG-4 in which the base layer is motion-compensated, characterized in that motion compensation is also applied to the enhancement layer, forming a single-loop MC+FGS structure, and in that the following three measures are adopted to suppress the defects of the single-loop MC+FGS structure:
1) To control the prediction drift produced at the decoder at low bit rates, for I and P frames, which affect prediction accuracy, the number of bit planes used for the motion reference at encoding time is 2 to 3; for B frames, which do not affect the accuracy of subsequent motion references, as many bit planes as possible are used for the motion reference at encoding time, to improve coding efficiency;
2) To reduce the impact on base-layer image quality when packets of the enhancement layer are lost or corrupted, a frame memory is added to the single-loop MC+FGS structure to store the image reconstructed from the base layer, so as to obtain a conservative image quality equivalent to that of the original FGS structure;
3) Depending on the circumstances, the base layer selectively uses the enhancement-layer reconstructed image or the base-layer reconstructed image as the motion reference, to improve the error resilience of the coded stream. For a relatively reliable network, such as a local area network, the base layer uses the enhancement-layer reconstructed image as the motion reference, to improve coding efficiency; for a general network, such as the Internet, the base layer uses a mixture of base-layer and enhancement-layer reconstructed images as the motion reference.
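Measure 1) amounts to a simple per-frame-type rule for the number of reference bit planes; the sketch below is illustrative only (the names and the cap value are assumptions, and the invention's actual algorithm for choosing the number of planes is more elaborate):

```python
# Hedged sketch of measure 1): cap reference bit planes for I/P frames,
# which later frames predict from, and let B frames use everything.

def planes_for_motion_reference(frame_type, available_planes, ip_cap=3):
    if frame_type in ("I", "P"):
        return min(ip_cap, available_planes)   # limit drift at low bit rates
    if frame_type == "B":
        return available_planes                # B frames are never referenced
    raise ValueError(f"unknown frame type: {frame_type!r}")

print(planes_for_motion_reference("P", 6))  # 3
print(planes_for_motion_reference("B", 6))  # 6
```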
On the basis of the above single-loop MC+FGS structure, the structure is extended to the spatial-temporal domain to form the MC+FGSST structure, which achieves fine scalability in temporal + spatial + SNR simultaneously. Temporal scalability is obtained in either of two ways: ① introduce a new independent temporal FGST enhancement layer that provides the bit stream of temporally enhancing FGST frames; ② modify the original FGS layer into a shared FGST layer, extending it from a single bit stream providing only SNR enhancement to a mixed bit stream containing two bit streams: one providing SNR enhancement for FGS frames and one providing SNR enhancement for FGST frames. For spatial scalability, a new FGSS spatial enhancement layer is introduced.
A codec using the above improved video coding method comprises an encoder and a decoder, characterized in that the encoder consists of three sub-encoders that share most modules. The three sub-encoders are:
1. Base-layer encoder: uses conventional motion-compensated DCT coding; it encodes the residual between the original low-spatial-resolution frame and the reconstructed low-spatial-resolution video frame into the base-layer bit stream;
2. Temporal FGST enhancement-layer sub-encoder: uses bit-plane techniques to encode the motion-compensated DCT residual coefficients, which consist of two parts: the DCT residual coefficients of FGS frames used for SNR enhancement, and the DCT residual coefficients of FGST frames;
3. Spatial FGSS enhancement-layer encoder: uses bit-plane techniques to encode the DCT residual coefficients of motion-compensated FGSS frames;
The symbols in the encoder are defined as follows:
LSPi: low-spatial-resolution motion-predicted image, including the predicted images of FGS frames and of FGST frames
LSd: residual image between the original low-spatial-resolution image and LSPi
LSD: DCT residual coefficients obtained by DCT-transforming LSd
LSTD: DCT residual coefficients of low-spatial-resolution FGST frames
LSBD: DCT residual coefficients of low-spatial-resolution FGS frames
LSBDR: low-spatial-resolution residual image reconstructed from the base-layer coded stream
LSBR: reconstructed prediction reference image for the next low-spatial-resolution FGS frame
LSTR: reconstructed prediction reference image for the next low-spatial-resolution FGST frame
HSPi: high-spatial-resolution motion-predicted image
HSPd: residual image between the original high-spatial-resolution image and HSPi
HSPD: DCT residual coefficients of the high-spatial-resolution image obtained by DCT-transforming HSPd
HSBR: reconstructed prediction reference image for the next high-spatial-resolution FGSS frame
BMVs: base-layer motion vectors of FGS frames
TMVs: enhancement-layer motion vectors of FGST frames
SMVs: enhancement-layer motion vectors of FGSS frames
The design of the decoder is based on the structure of the encoder and is used in correspondence with it;
The symbols in the decoder are defined as follows:
HSRD: residual image of the decoded high-spatial-resolution FGSS frame used for display
LSRD: SNR-enhanced residual image of the decoded low-spatial-resolution FGS frame used for display
LSTRD: residual image of the decoded low-spatial-resolution FGST frame used for display
HSRPD: residual image of the decoded high-spatial-resolution FGSS frame used as motion reference
LSPRD: SNR-enhanced residual image of the decoded low-spatial-resolution FGS frame used as motion reference
LSTPRD: residual image of the decoded low-spatial-resolution FGST frame used as motion reference
LSBR: reconstructed prediction reference image for the next low-spatial-resolution FGS frame
LSTR: reconstructed prediction reference image for the next low-spatial-resolution FGST frame
HSBR: reconstructed prediction reference image for the next high-spatial-resolution FGSS frame
BMVs: base-layer motion vectors of FGS frames
TMVs: enhancement-layer motion vectors of FGST frames
SMVs: enhancement-layer motion vectors of FGSS frames.
Compared with the prior art, the present invention has the following evident substantive features and notable advantages. The improved FGS video coding method provided by the invention is based on the FGS structure and also applies motion compensation to the enhancement layer, forming a single-loop MC+FGS structure that improves the accuracy of the motion reference and the coding efficiency. On the basis of the single-loop MC+FGS structure, the structure is extended to the spatial-temporal domain to obtain the MC+FGSST structure, which achieves fine scalability in hybrid temporal + spatial + SNR while also achieving higher coding efficiency. The invention further designs an encoder for the MC+FGSST structure and develops a new algorithm for determining, in the encoder, the number of bit planes used for motion compensation. Users can obtain the best video quality the network allows according to the condition of their own network.
Brief Description of the Drawings
Fig. 1 compares the single-loop MC+FGS structure with the original FGS structure: (a) the original FGS structure; (b) the single-loop MC+FGS structure.
Fig. 2 is a block diagram of the encoder of the single-loop MC+FGS structure.
Fig. 3 is a block diagram of the improved single-loop MC+FGS structure.
Fig. 4 is a schematic diagram of the MC+FGSST structure.
Fig. 5 is a schematic diagram of the CODEC implementation of the MC+FGSST structure: (a) encoder block diagram; (b) decoder block diagram.
Fig. 6 is the coding structure of the simulation experiment.
Fig. 7 compares the PSNR values of the MC+FGSST structure and the FGSST structure: (a) PSNR of FGS frames for the two structures; (b) PSNR of FGST frames for the two structures; (c) PSNR of FGSS frames for the two structures.
Detailed Description of the Embodiments
The MC+FGSST video coding method of the present invention, which performs motion compensation from the enhancement layer, is analyzed below in three sections with reference to the drawings. In the following, an FGS frame denotes a non-temporally-enhancing frame enhanced in SNR, an FGST frame denotes a frame providing temporal-resolution enhancement, an FGSS frame denotes a frame providing spatial-resolution enhancement, and an FGSST frame denotes a frame providing both temporal- and spatial-resolution enhancement.
1. Single-loop MC+FGS structure
Fig. 1(a) shows the original FGS structure recommended by MPEG-4; the shaded parts represent the portions actually transmitted and received for enhancement. As the figure shows, the original FGS applies motion compensation only to the base layer, removing the base layer's temporal correlation, while the enhancement layer receives no processing to remove temporal correlation, which inevitably reduces compression efficiency. We therefore propose a single-loop MC+FGS structure that also applies motion compensation to the enhancement layer. The idea is to use a more accurate motion reference to obtain a more accurate prediction and hence more accurate motion compensation, thereby improving coding efficiency. Fig. 1(b) is the schematic of the MC+FGS structure: both the base layer and the enhancement layer (together called the extended base layer) are used for motion prediction and reference for subsequent base-layer frames. The advantages of this structure are: 1) using the extended base layer as the motion reference improves the accuracy of the motion reference and the coding efficiency, remedying the reduced coding efficiency of the original FGS; 2) complexity is low, and the encoder structure is simple and easy to implement.
Fig. 2 is a block diagram of the encoder of the single-loop MC+FGS structure. Since both the base-layer stream and the enhancement-layer stream use the extended base layer for motion reference and motion compensation, this single-loop structure also has drawbacks: 1) at decoding time, packet loss in the enhancement layer affects the image quality of base-layer frames; 2) at decoding time, when the bit rate allowed by the user's bandwidth is below the extended-base-layer bit rate, image quality degrades, and this degradation propagates through the prediction references of subsequent frames until the end of the current GOP, producing prediction drift.
The defects of the single-loop structure can be suppressed by the following measures:
● To control the prediction drift produced at the decoder at low bit rates, for I and P frames, which affect prediction accuracy, the number of bit planes used for the motion reference at encoding time should always be kept small (e.g. 2-3 bit planes), sacrificing some coding efficiency in exchange for better error resilience. For B frames, which do not affect the accuracy of subsequent motion references, as many bit planes as possible are used for the motion reference at encoding time, to improve coding efficiency.
● To reduce the impact on base-layer image quality when packets of the enhancement layer are lost or corrupted, a frame memory is added to the single-loop structure to store the image reconstructed from the base layer, obtaining a conservative image quality equivalent to that of the original FGS structure and reducing the error accumulation this would otherwise cause in subsequent prediction references. In this way, even at very low allowed bit rates, at least the image quality of the original FGS is obtained.
● With the improved single-loop MC+FGS structure shown in Fig. 3, the base layer selectively uses the enhancement-layer reconstructed image or, as shown by the dashed line in the figure, the base-layer reconstructed image as the motion reference, depending on the circumstances. For error-prone networks such as wireless networks, the base layer uses the image reconstructed from the base layer as the motion reference, to improve the error resilience of the coded stream; for more reliable networks such as local area networks, the base layer uses the image reconstructed from the enhancement layer, to improve coding efficiency; for general networks such as the Internet, the base layer uses a mixture of images reconstructed from the base layer and from the enhancement layer as the motion reference; the specific mixing strategy is not explored further in the present invention.
The above three measures for suppressing the weak error resilience of the single-loop MC+FGS structure come at the cost of some coding efficiency and computational complexity; they can be selected according to the specific application environment to meet the user's needs.
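The third measure's choice of motion reference by network reliability can be summarized as a small policy table. This is purely illustrative (the names are assumptions, and the description deliberately leaves the exact base/enhancement mixing strategy for the Internet case open):

```python
# Illustrative policy table for measure 3: which reconstruction the base
# layer references, keyed by network reliability.

REFERENCE_POLICY = {
    "wireless": "base_layer",        # error-prone: best error resilience
    "lan": "enhancement_layer",      # reliable: best coding efficiency
    "internet": "mixed",             # general: mix of both references
}

def base_layer_reference_for(network):
    return REFERENCE_POLICY[network]

print(base_layer_reference_for("lan"))  # enhancement_layer
```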
To verify the coding performance of single-loop MC+FGS, the following simulation experiment was carried out and compared with the MPEG-4 FGS scheme (the original FGS in the table). The test sequences are Foreman, Coastguard and Akiyo (CIF format), with TM5 rate control and a frame rate of 10 f/s. The GOP structure is NI=1, NP=4, NB=12. The base-layer bit rate is set to RBL=128 kbit/s, using MPEG-4 video compression coding. The enhancement-layer stream can be truncated at any bit rate according to the available network bandwidth; in this experiment the enhancement-layer rates 64 kbit/s, 172 kbit/s, ... up to 872 kbit/s were used. Table 1 lists, for the original FGS structure and the single-loop MC+FGS structure, the PSNR values of the Y luminance component of the 5th frame of the three sequences as the stream bit rate varies. Note that these PSNR values are for the best image quality (not the reconstructed base-layer quality), i.e. the PSNR of the image obtained when the entire enhancement-layer stream is received and correctly decoded. The experimental results in the table show that, compared with the MPEG-4 FGS scheme, the PSNR of the proposed scheme is 1.23-5.83 dB higher for Foreman, 0.66-3.05 dB higher for Coastguard, and 4.43-9.64 dB higher for Akiyo.
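The PSNR figures of Table 1 are computed on the Y (luminance) component in the standard way; a self-contained helper, assuming 8-bit samples (peak value 255), might look like:

```python
import math

def psnr_y(orig, recon, peak=255.0):
    """PSNR of the Y component: 10*log10(peak^2 / MSE) over all pixels."""
    sse = 0.0
    n = 0
    for row_o, row_r in zip(orig, recon):
        for o, r in zip(row_o, row_r):
            sse += (o - r) ** 2
            n += 1
    if sse == 0.0:
        return float("inf")          # identical images
    return 10.0 * math.log10(peak * peak / (sse / n))

print(round(psnr_y([[255, 255]], [[250, 250]]), 2))  # 34.15
```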
Table 1. PSNR (dB) values of the Y luminance component of the Foreman, Coastguard and Akiyo sequences at different bit rates
2. MC+FGSST structure
Like FGS, the MC+FGS structure provides only SNR fine scalability. For the MC+FGS structure to simultaneously achieve fine scalability in temporal + spatial + SNR, more enhancement layers must be introduced. Temporal scalability can be obtained in two ways: 1) introduce a new independent temporal FGST enhancement layer that provides the bit stream of temporally enhancing FGST frames; 2) modify the original FGS layer into a shared FGST layer, extending it from a single bit stream providing only SNR enhancement to a mixed bit stream containing two bit streams: one providing SNR enhancement for FGS frames and one providing SNR enhancement for FGST frames. Introducing an independent enhancement layer gives the server and the decoder very good flexibility: FGST frames can be decoded without decoding all the FGS frames as prediction references, so the image quality of the FGS enhancement layer and the temporal FGST enhancing frames are mutually independent and do not affect each other.
But it requires transmitting one more enhancement layer, which increases the complexity of the implementation, and the server must offer the client an additional temporal enhancement-layer bit stream service, increasing the server's burden. For spatial scalability, a new FGSS spatial enhancement layer must be introduced, because this layer has a spatial resolution different from the other layers. Considering the complexity of implementing the entire MC+FGSST structure, we realize temporal scalability by modifying the FGS layer into a shared FGST layer. The resulting MC+FGSST structure, shown in Fig. 4, has three layers. The base layer provides images of low SNR, low frame rate and low spatial resolution; the first enhancement layer provides both SNR enhancement and FGST frames with fine SNR scalability; the second enhancement layer provides FGSS frames with fine SNR scalability and, as shown by the dashed part of Fig. 4, can be further configured so that it also provides FGSST frames with fine SNR scalability. The MC+FGSST structure not only achieves fine scalability in hybrid temporal + spatial + SNR but also overcomes the low coding efficiency of the original FGS structure, achieving higher coding efficiency. Moreover, because the MC+FGSST structure adopts a streamlined design, it is not complicated to implement and is highly practical.
3. CODEC design for the MC+FGSST structure
Based on the MC+FGSST structure diagram in Figure 4, we designed the CODEC shown in Figure 5 for MC+FGSST. The symbols in the encoder are defined as follows:
LSPi: motion-predicted image at low spatial resolution, including the prediction images of both FGS frames and FGST frames
LSd: residual image between the original low spatial resolution image and LSPi
LSD: DCT residual coefficients obtained by applying the DCT to LSd
LSTD: DCT residual coefficients of low spatial resolution FGST frames
LSBD: DCT residual coefficients of low spatial resolution FGS frames
LSBDR: low spatial resolution residual image reconstructed from the base layer bitstream
LSBR: reconstructed prediction reference image for the next low spatial resolution FGS frame
LSTR: reconstructed prediction reference image for the next low spatial resolution FGST frame
HSPi: motion-predicted image at high spatial resolution
HSPd: residual image between the original high spatial resolution image and HSPi
HSPD: DCT residual coefficients of the high spatial resolution image, obtained by applying the DCT to HSPd
HSBR: reconstructed prediction reference image for the next high spatial resolution FGSS frame
BMVs: base layer motion vectors of FGS frames
TMVs: enhancement layer motion vectors of FGST frames
SMVs: enhancement layer motion vectors of FGSS frames
The design of the decoder mirrors the structure of the encoder, and the two are used in correspondence. The symbols in the decoder are defined as follows:
HSRD: decoded residual image of high spatial resolution FGSS frames, used for display
LSRD: decoded SNR-enhanced residual image of low spatial resolution FGS frames, used for display
LSTRD: decoded residual image of low spatial resolution FGST frames, used for display
HSRPD: decoded residual image of high spatial resolution FGSS frames, used as a motion reference
LSPRD: decoded SNR-enhanced residual image of low spatial resolution FGS frames, used as a motion reference
LSTPRD: decoded residual image of low spatial resolution FGST frames, used as a motion reference
LSBR: reconstructed prediction reference image for the next low spatial resolution FGS frame
LSTR: reconstructed prediction reference image for the next low spatial resolution FGST frame
HSBR: reconstructed prediction reference image for the next high spatial resolution FGSS frame
BMVs: base layer motion vectors of FGS frames
TMVs: enhancement layer motion vectors of FGST frames
SMVs: enhancement layer motion vectors of FGSS frames
The encoder comprises three sub-encoders, which, as Figure 5(a) shows, share most of their modules. The base layer encoder uses conventional motion-compensated DCT coding: it encodes the residual between the original low spatial resolution video frame and the reconstructed low spatial resolution frame into the base layer bitstream. Practice has shown that bit-plane coding is more efficient than run-level coding and embeds fine scalability into the coded bitstream. The temporal FGST enhancement layer sub-encoder therefore uses bit-plane coding for the motion-compensated DCT residual coefficients, which consist of two parts: the DCT residual coefficients of FGS frames used for SNR enhancement, and the DCT residual coefficients of FGST frames. Bit-plane coding gives the FGST frames high coding efficiency and fine SNR scalability. The spatial FGSS enhancement layer encoder likewise uses bit-plane coding for the motion-compensated DCT residual coefficients of FGSS frames, again providing high coding efficiency and fine SNR scalability. Because the three sub-encoders share most modules, the encoder saves considerable physical overhead and reduces cost. The two enhancement layer bitstreams produced by the MC+FGSST structure provide spatial scalability and fine temporal + SNR scalability, respectively; depending on the available network bandwidth, a user can be configured to receive one enhancement layer bitstream or both, combining them with the mandatory base layer bitstream for decoding to obtain the desired service.
In the encoder of Figure 5, a key question is how many bit planes to use for motion compensation. If too many are used, then whenever a user cannot receive all of the enhancement layer bits used for motion compensation, the decoded image quality degrades, and the degraded images in turn impair the motion prediction accuracy of subsequent frames, causing prediction drift. Therefore, when the target bit rate of the clients served is low, I and P frames, whose reconstruction affects later prediction references, should not use too many bit planes (we recommend 2-3); B frames can use the normal number of bit planes for motion compensation to improve coding efficiency. When the target bit rate is high, all I, P, and B frames use the normal number of bit planes for motion compensation. The key question is then how to determine this normal number of bit planes. We developed an algorithm to solve this problem; its precondition is that the available network bandwidth of every user currently connected to the server is known, which the encoder obtains through the "bandwidth estimation and available bandwidth calculation" module. A further advantage of the algorithm is that it determines not only the number of bit planes Nbitplaneused used for motion compensation, but also the total number of blocks Mblockused in the (Nbitplaneused+1)-th plane that can be used for motion compensation. Because more enhancement layer bits are thereby used for motion compensation, its accuracy improves and the coding efficiency rises further.
Suppose N users are currently connected to the streaming server; the "bandwidth estimation and available bandwidth calculation" module gives their current available bandwidths as {B1, B2, B3, ... BN}. The minimum bandwidth Bmin over all N users is taken as the target bit rate for motion compensation.
Algorithm:
Nbits = 0;                          // Nbits: bits spent so far on bit-plane entropy coding
Nbitplaneused = Nmax;               // Nbitplaneused: number of bit planes used for motion compensation
Mblockused = Nblock;                // Mblockused: number of blocks of the lowest used bit plane available for motion compensation
T = Min{B1, B2, B3, ... BN};        // T: target bit rate for motion compensation
for (i = 0; i < Nmax; i++) {        // Nmax: maximum number of bit planes
    for (j = 0; j < Nblock; j++) {  // Nblock: total number of blocks in a frame
        if (Nbits < (T - RBL) / f)  // RBL: base layer bit rate; f: frame rate
            Nbits += BlockBitplaneEncoder();  // BlockBitplaneEncoder(): returns the number of bits
                                              // spent coding the current block on the current bit plane
        else {
            Mblockused = j;
            break;
        }
    }
    if (Nbits >= (T - RBL) / f) {
        Nbitplaneused = i;
        break;
    }
}
This algorithm is particularly suited to applications that require real-time encoding, such as live video streaming. For the common case of encoding pre-stored video, the minimum currently available network bandwidth over all users can instead be initialized to a predetermined value; how that value is chosen depends on the specific application and network conditions and is not discussed further here.
To verify the coding performance of the MC+FGSST structure, we ran the following simulation on the first 24 frames of the Foreman and Coastguard video sequences, under these test conditions:
High temporal resolution: 30 frames/s; low temporal resolution: 15 frames/s
High spatial resolution: CIF (352×288); low spatial resolution: QCIF (176×144)
Base layer bit rate: RBL = 75 kbit/s
MC+FGST enhancement layer bit rate: RFGST = 150 kbit/s, of which 90 kbit/s is used for SNR enhancement of FGS frames and 60 kbit/s for SNR enhancement of FGST frames
MC+FGSS enhancement layer bit rate: RFGSS = 90 kbit/s, used for SNR enhancement of FGSS frames
The sequences were encoded with the proposed encoder, with the coding structure shown in Figure 6, yielding a three-layer bitstream. This bitstream was then decoded with the proposed matching decoder to obtain the PSNR values of the FGS, FGST, and FGSS frames, which were compared with the PSNR values obtained by decoding the original FGSST structure (the FGS structure extended to the spatio-temporal domain). The results are shown in Figure 7; from Figures 7(a)-(c) it can be seen that the coding performance of the MC+FGSST structure is clearly better than that of the FGSST structure overall. For frame 14 of the Coastguard sequence in Figure 7(b), where MC+FGSST performs worse than FGSST, the cause may be error inherent in the motion estimation; for frame 17 of the Foreman sequence in Figure 7(c), the cause may be prediction distortion introduced by upsampling the low spatial resolution image.
Claims (3)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN 200510025263 CN1674677A (en) | 2005-04-21 | 2005-04-21 | Improved FGS coding method and coder/decoder thereof |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN1674677A true CN1674677A (en) | 2005-09-28 |
Family
ID=35046894
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN 200510025263 Pending CN1674677A (en) | 2005-04-21 | 2005-04-21 | Improved FGS coding method and coder/decoder thereof |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN1674677A (en) |
Cited By (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101455083B (en) * | 2006-03-24 | 2012-04-11 | 韩国电子通信研究院 | Coding method and device for reducing inter-layer redundancy by using motion data of fine-grained scalability layer |
| WO2007109993A1 (en) * | 2006-03-24 | 2007-10-04 | Huawei Technologies Co., Ltd. | Error control system, method, encoder and decoder for video coding |
| US8345776B2 (en) | 2006-03-24 | 2013-01-01 | Huawei Technologies Co., Ltd. | System and method of error control for video coding |
| CN101690223B (en) * | 2007-06-28 | 2012-01-25 | 三星电子株式会社 | Method, medium, and apparatus for encoding and/or decoding video |
| CN101998119B (en) * | 2009-08-14 | 2013-12-11 | 北京中星微电子有限公司 | Method and device for acquiring reconstructed image |
| CN101998119A (en) * | 2009-08-14 | 2011-03-30 | 北京中星微电子有限公司 | Method and device for acquiring reconstructed image |
| CN102098518A (en) * | 2011-03-17 | 2011-06-15 | 上海大学 | Fine granularity scalability (FGS) method in combination with airspace scalability for wireless network |
| CN104685881A (en) * | 2012-09-28 | 2015-06-03 | 夏普株式会社 | Image decoding device and image encoding device |
| CN104685881B (en) * | 2012-09-28 | 2018-05-15 | 夏普株式会社 | Picture decoding apparatus, picture coding device and picture decoding method |
| CN103546749A (en) * | 2013-10-14 | 2014-01-29 | 上海大学 | Method of Optimizing HEVC Residual Coding Using Residual Coefficient Distribution Characteristics and Bayes Theorem |
| CN103546749B (en) * | 2013-10-14 | 2017-05-10 | 上海大学 | Method for optimizing HEVC (high efficiency video coding) residual coding by using residual coefficient distribution features and bayes theorem |
| CN112528780A (en) * | 2019-12-06 | 2021-03-19 | 百度(美国)有限责任公司 | Video motion segmentation by mixed temporal adaptation |
| CN112528780B (en) * | 2019-12-06 | 2023-11-21 | 百度(美国)有限责任公司 | Video action segmentation via hybrid temporal adaptation |
| CN114051137A (en) * | 2021-10-13 | 2022-02-15 | 上海工程技术大学 | Spatial scalable video coding method and decoding method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
| WD01 | Invention patent application deemed withdrawn after publication |