CN1204757C

CN1204757C - Stereo video stream coder/decoder and stereo video coding/decoding system

Info

Publication number: CN1204757C
Application number: CN 03116541
Authority: CN
Inventors: 张兆杨; 安平; 骆艳; 戏昌满
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2003-04-22
Filing date: 2003-04-22
Publication date: 2005-06-01
Anticipated expiration: 2023-04-22
Also published as: CN1450816A

Abstract

The invention relates to stereoscopic video data compression. When encoding the binocular video streams captured by a stereo camera system, one stream is encoded in a manner compatible with the MPEG family of standards, while the other is coded for transmission using disparity-compensated prediction, joint disparity- and motion-compensated prediction, and frame estimation with interpolation-based recovery at the decoding end. Disparity estimation uses multi-level segmented block estimation based on a Markov model. Frame estimation and interpolation use the reference frames restored at the decoding end, together with the corresponding disparity and motion vectors, and are carried out with a method based on a probabilistic frame-estimation model. Decoding is organized in two levels: the first decodes only the main video stream, yielding a single-view video for display on an ordinary display device; the second decodes both video streams, and the restored stereoscopic video signal is composed and shown as a stereoscopic image on an autostereoscopic display.

Description

A Stereoscopic Video Stream Encoder/Decoder and Its Stereoscopic Video Encoding/Decoding System

Technical field

The invention relates to moving image processing, and in particular to a method and apparatus for encoding and decoding stereoscopic video data.

Background art

When people look at the world around them, they perceive not only the width and height of objects but also their depth, and can judge the distance between objects or between the viewer and an object. This three-dimensional perception arises mainly because people normally view objects with both eyes at once. Because the visual axes of the two eyes are separated (by about 65 mm), the left and right eyes receive different images of an object at a given distance, and the brain, through movement and adjustment of the eyeballs, combines the information in the two images to produce a sense of depth. The apparent displacement of the image when an object is viewed with the left eye alone and then with the right eye alone is called parallax (disparity).

The three-dimensional visual characteristics that derive from binocular structure give us a direct and simple way to obtain a sense of the relative depth of the real world from a left and a right image. This relative depth information is essential in applications such as telecommunications (telemedicine, teleconferencing), telerobotics (remote control, autonomous aviation, surveillance), entertainment (interactive HDTV, stereoscopic film), and virtual reality. The obvious cost of introducing relative depth information for added realism is that the amount of data to be transmitted and stored is more than double that of a monoscopic system. The ways to accommodate the larger data volume come down to increasing channel bandwidth, improving channel utilization with efficient protocols, and reducing the source bit rate with efficient compression. Because increasing storage capacity and network bandwidth is uneconomical, effective image compression is required.

Prior-art stereoscopic video coding methods all essentially exploit the correlation between the two binocular video streams to raise the overall coding efficiency of the two video signals. Broadly, there are two classes of methods. The first class consists of stereoscopic video stream coding methods based on the MPEG video coding standards. The basic principle is to encode one video stream on its own and to encode the other using disparity estimation and compensation. Most of these methods use hybrid coding, for example standard-compatible mixed-resolution coding (in which one stream has lower resolution), bit allocation based on psychovisual characteristics, multi-resolution stereo coding, and reconstruction of right-view B frames by frame estimation and interpolation (the B frames of the right stream are not transmitted but are recovered by interpolation at the decoding end). These methods have the following problems: the efficiency of disparity estimation and compensation needs improvement; the motion information of the right stream is not used effectively while binocular disparity is exploited, so there is considerable room to improve the overall coding efficiency; stereo coding with frame estimation and interpolation achieves a high compression ratio, but existing frame estimation and interpolation techniques are rather simple and the quality of the reconstructed image is unsatisfactory; and, overall, a mature and complete stereo coding system is still lacking.

The second class consists of object-based stereo coding methods. The basic principle is to segment and extract the objects in a scene and to encode them using motion and binocular depth information together. When several objects appear in the scene, however, these methods do not code well, and because of their computational complexity their real-time performance is poor, still far from the requirements of real-time system applications.

Summary of the invention

The object of the present invention is to provide a stereoscopic video stream encoder/decoder that offers a high compression ratio, fast decoding, and compatibility with single-view video coding.

This object is achieved by the following technical solution:

A stereoscopic video stream encoder, comprising:

a main video stream encoding unit, configured to encode one video stream of the stereoscopic video stream according to the MPEG protocol to generate a main video code stream;

a secondary video stream encoding unit, comprising:

a disparity/motion compensation estimation unit, configured to perform disparity compensation estimation on the intra-coded frames and inter-frame predicted frames of the other video stream of the stereoscopic video stream (the secondary video stream) using the corresponding intra-coded frames and inter-frame predicted frames of the main video stream, respectively, and to perform motion compensation estimation on the current inter-frame predicted frame of the secondary video stream using previous intra-coded frames and/or inter-frame predicted frames of the secondary video stream, wherein the initial value for disparity compensation estimation is obtained as follows: the motion vectors obtained by motion compensation estimation on the inter-frame predicted frame of the secondary video stream are used to motion-compensate the disparity field of the previous intra-coded frame or inter-frame predicted frame, and the resulting new disparity field is taken as the initial value for disparity compensation estimation;

a compensated prediction coding unit, configured to encode the disparity compensation estimation information of the intra-coded frames of the secondary video stream, and the disparity compensation estimation information and motion compensation estimation information of its inter-frame predicted frames, to generate a secondary video code stream;

a multiplexer, configured to combine the main video code stream and the secondary video code stream by time-division multiplexing into a stereoscopic video code stream.

Preferably, in the above stereoscopic video stream encoder, the disparity/motion estimation unit performs disparity estimation using a hierarchical Markov probability model and multi-level block matching. More preferably, with the hierarchical Markov probability model and overlapped block matching, the number of hierarchy levels is set to two and the partition block sizes are 8×8 and 16×16.

Preferably, in the above stereoscopic video stream encoder, the transmission channel bandwidth occupied by the secondary video code stream is adjusted by changing the DCT quantization coefficients of the residual image in the disparity compensation estimation information.

A stereoscopic video stream decoder, comprising:

a demultiplexer, configured to split the stereoscopic video code stream into a main video code stream and a secondary video code stream;

a main video code stream decoding unit, configured to decode the main video code stream according to the MPEG protocol to generate a main video stream;

a secondary video code stream decoding unit, comprising:

a disparity/motion compensated prediction unit, configured to reconstruct the intra-coded frames and inter-frame predicted frames of the secondary video stream from the intra-coded frames and inter-frame predicted frames of the main video stream and from the disparity compensation estimation information and motion compensation estimation information contained in the secondary video code stream;

a frame estimation and interpolation unit, configured to reconstruct the bidirectionally predicted/interpolated frames of the secondary video stream from the corresponding bidirectionally predicted/interpolated frames of the main video stream, the intra-coded frames and inter-frame predicted frames of the secondary video stream, and the disparity compensation estimation information and motion compensation estimation information contained in the secondary video code stream;

a secondary video stream reconstruction unit, configured to arrange in temporal order the intra-coded frames and inter-frame predicted frames reconstructed by the disparity/motion compensated prediction unit and the bidirectionally predicted/interpolated frames reconstructed by the frame estimation and interpolation unit, to generate the secondary video stream.

Preferably, in the above stereoscopic video stream decoder, the frame estimation and interpolation unit reconstructs the bidirectionally predicted/interpolated frames using a stereo frame estimation and interpolation method based on a Bayesian least-cost function.

In the stereoscopic video stream encoder of the present invention, only one of the video streams is encoded at high quality according to the MPEG standard. In the other video stream only a small number of frames are encoded; the remaining frames are skipped entirely and recovered at the decoding end by frame estimation and interpolation. This greatly improves coding efficiency and saves transmission bandwidth.

A further object of the present invention is to provide a stereoscopic video processing system that offers a high compression ratio, fast decoding, and compatibility with single-view video coding.

A video processing system, comprising cameras that capture the left and right video streams, a time base corrector that synchronizes in time the video streams output by the two cameras, a frame-sequential multiplexer that multiplexes the two time-synchronized video streams into a stereoscopic video stream, a computer system containing the stereoscopic video stream encoder of claim 1 and the stereoscopic video stream decoder of claim 5, an ordinary display, and a stereoscopic display,

wherein, when only a single video stream needs to be transmitted, the main video stream encoding unit of the stereoscopic video stream encoder encodes the video images and sends the signal to the transmission channel; when two video streams need to be transmitted, the main video stream encoding unit and the secondary video stream encoding unit encode the left and right video images respectively and send the signals to the transmission channel; when the received video code stream contains only one video stream, the main video code stream decoding unit of the stereoscopic video stream decoder decodes the code stream and sends the decoded signal to the ordinary display; and when the received code stream contains both the left and right video stream signals, the main video code stream decoding unit and the secondary video code stream decoding unit decode the two code streams respectively and send the decoded signals to the stereoscopic display.
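
The receiving-side routing described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the patented implementation; the stream tags `"M"` and `"A"`, the tuple layout, and the function name `route_decoded_output` are assumptions introduced for the example.

```python
def route_decoded_output(units):
    """Pick the display path for a received code stream.

    units: list of (stream_id, payload) tuples, where stream_id is
    "M" for the main code stream and "A" for the secondary one
    (hypothetical tags, not defined by the patent).
    """
    main = [payload for stream_id, payload in units if stream_id == "M"]
    aux = [payload for stream_id, payload in units if stream_id == "A"]
    if not aux:
        # Only the main code stream arrived: decode it alone for a 2-D display.
        return "ordinary display", main, None
    # Both code streams arrived: decode both for the stereoscopic display.
    return "stereoscopic display", main, aux
```

Because the main code stream is decodable on its own, a receiver without stereo support can simply ignore the secondary units.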

In addition to high coding efficiency and low transmission bandwidth requirements, the video system of the present invention is compatible with the existing single-view MPEG family of coding standards. This lowers the cost of upgrading a system while preserving coding compatibility, and it provides flexible control over stereoscopic display quality.

Description of drawings

The objects, features, and advantages of the present invention can be further understood from the following description of preferred embodiments in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic diagram of a stereoscopic video stream encoder/decoder according to the present invention.

FIG. 2 is a schematic diagram of a video processing system that uses the stereoscopic video stream encoder/decoder according to the present invention.

Detailed description

Preferred embodiments of the present invention are described below with reference to the accompanying drawings.

FIG. 1 is a schematic diagram of a stereoscopic video stream encoder/decoder according to the present invention. As shown in FIG. 1, the stereoscopic video stream encoder 1 encodes the input left and right video streams. For convenience, the following description assumes that the left video stream is the main video stream and the right video stream is the secondary video stream. This assumption should not be understood as limiting the invention; the opposite assignment is equally possible. The video code stream generated by the stereoscopic video stream encoder 1 is transmitted over channel 2 to the stereoscopic video stream decoder 3.

Referring to FIG. 1, the stereoscopic video stream encoder 1 comprises an MPEG encoder 4 serving as the main video stream encoding unit, a multiplexer 7, and a secondary video stream encoding unit made up of a disparity/motion compensation estimation unit 5 and a compensated prediction coding unit 6.

MPEG digital video coding is essentially an image compression method that exploits the statistical redundancy of a video sequence in the temporal and spatial directions. It relies on the correlation between pixels (inter-pel correlation) and assumes that consecutive frames are related by simple translational motion. The value of a pixel in a given picture can therefore be predicted from nearby pixels in the same frame using intra-frame coding, or from pixels in nearby frames using inter-frame coding.

When the shot changes in a video sequence, the temporal correlation between pixels in nearby frames becomes small or disappears altogether, and intra-frame coding should then be used to exploit spatial correlation for effective data compression. The MPEG compression algorithms use discrete cosine transform (DCT) coding on picture blocks of 8×8 pixels to exploit the spatial correlation between neighboring pixels of the same picture. In the following, image frames compressed by intra-frame coding are called intra-coded frames and are denoted I^M or I^A, where the superscripts M and A stand for the main video stream and the secondary (auxiliary) video stream, respectively.
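
As an illustration of the 8×8 block transform mentioned above, the following Python sketch computes an orthonormal 2-D DCT-II directly from its definition. It is not taken from the patent or from any MPEG reference implementation; the function name `dct_2d` and the flat test block are assumptions chosen for the example, and a practical encoder would use a fast factorized transform instead of the four nested loops.

```python
import math

N = 8  # block size used by the MPEG DCT

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    def c(k):
        # Normalization factor for frequency index k.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

# A perfectly flat block: all energy ends up in the DC coefficient.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

For the flat block, every AC coefficient is zero up to rounding error and only `coeffs[0][0]` is non-zero, which is why smooth picture regions compress well after quantization.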

If the pixels in nearby frames are strongly correlated, that is, if the contents of two consecutive frames are very similar or identical, inter-frame DPCM coding based on temporal prediction (motion-compensated prediction between frames) can be used. In the following, image frames compressed by inter-frame coding are called inter-frame predicted frames and are denoted P^M or P^A, where the superscripts M and A stand for the main video stream and the secondary video stream, respectively.

The MPEG standards also introduce a type of image frame that is reconstructed using a past frame and a future frame as references but cannot itself serve as a reference frame. In the following, such frames are called bidirectionally predicted frames and are denoted B^M or B^A, where the superscripts M and A stand for the main video stream and the secondary video stream, respectively.

In the present invention, the MPEG encoder 4 encodes the left video stream according to the MPEG standard to generate the main video code stream, which consists of a sequence of encoded I^M, P^M, and B^M frames arranged in a defined order.

As shown in the figure, both the left and right video streams are input to the disparity/motion estimation unit 5 of the secondary video stream encoding unit, where disparity and motion estimation are performed. Specifically, the synchronized (corresponding) intra-coded frames I^M and I^A, and the inter-frame predicted frames P^M and P^A, of the main and secondary video streams are compared to obtain a disparity estimate for the secondary-stream frame I^A or P^A. The previous intra-coded frame I^A or inter-frame predicted frame P^A of the secondary video stream is compared with the current inter-frame predicted frame P^A to obtain a motion estimate for the current inter-frame predicted frame. Both motion estimation information and disparity estimation information are provided for each P^A frame because, in general, combining motion and disparity compensation gives the best prediction. In the present invention, so that the decoding end can recover image frames of higher quality, the disparity/motion estimation unit 5 therefore provides for each P^A frame both motion estimation information (obtained by comparing the previous reference frame, I^A or P^A, of the same video stream with the current P^A frame) and disparity estimation information (obtained from the corresponding P^A and P^M frames). This effectively addresses the loss of coding efficiency caused by temporal occlusion and disparity occlusion.
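
One way to picture the joint use of motion and disparity prediction for a P^A block is a per-block mode decision. The Python sketch below is a hypothetical illustration: the source does not state how the two predictions are combined, so the three candidate modes, the SAD criterion, and the name `choose_prediction` are all assumptions.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized pixel lists."""
    return sum(abs(p - q) for p, q in zip(a, b))

def choose_prediction(target, motion_pred, disparity_pred):
    """Pick the best predictor for one block of a P^A frame.

    target         : pixels of the current P^A block (flattened list)
    motion_pred    : motion-compensated block from the previous I^A/P^A frame
    disparity_pred : disparity-compensated block from the matching P^M frame
    Returns (mode, residual), where residual is what would be DCT-coded.
    """
    # Assumed blending rule: rounded average of the two predictions.
    blended = [(m + d + 1) // 2 for m, d in zip(motion_pred, disparity_pred)]
    candidates = {"motion": motion_pred,
                  "disparity": disparity_pred,
                  "joint": blended}
    mode = min(candidates, key=lambda k: sad(target, candidates[k]))
    residual = [t - p for t, p in zip(target, candidates[mode])]
    return mode, residual
```

When a region is occluded in the other view, the motion candidate tends to win; when it is newly uncovered in time, the disparity candidate tends to win, which is the intuition behind providing both kinds of information.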

There are many methods of disparity estimation. In the present invention, the disparity/motion estimation unit 5 performs disparity estimation using a hierarchical Markov probability model and multi-level block matching. The advantage of this method is that it yields a smooth and relatively accurate disparity field, which greatly reduces the entropy of the disparity-compensated residual image and so further raises the compression ratio. For compatibility with the block sizes of the MPEG standards, when the hierarchical Markov probability model and overlapped block matching are used, the number of hierarchy levels is set to two and the partition block sizes are 8×8 and 16×16.
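
The two block sizes can be illustrated with a simplified two-level search. The Python sketch below is an assumption-laden illustration, not the patented estimator: it omits the Markov smoothness model entirely, searches horizontal disparities only, and uses a hypothetical SAD threshold (`split_threshold`) to decide when a 16×16 block is split into four 8×8 blocks.

```python
def sad_h(aux, main, bx, by, d, size):
    """SAD between an auxiliary-view block and the main-view block shifted d pixels horizontally."""
    return sum(abs(aux[by + y][bx + x] - main[by + y][bx + x + d])
               for y in range(size) for x in range(size))

def search(aux, main, bx, by, size, centre, radius):
    """Best horizontal disparity in [centre - radius, centre + radius]; returns (cost, d)."""
    width = len(main[0])
    candidates = [d for d in range(centre - radius, centre + radius + 1)
                  if 0 <= bx + d and bx + d + size <= width]
    # Ties are broken in favour of the disparity closest to the search centre.
    cost, _, d = min((sad_h(aux, main, bx, by, d, size), abs(d - centre), d)
                     for d in candidates)
    return cost, d

def two_level_disparity(aux, main, bx, by, radius=8, split_threshold=256):
    """Level 1: one disparity per 16x16 block. Level 2: refine four 8x8 sub-blocks."""
    cost, d = search(aux, main, bx, by, 16, 0, radius)
    if cost <= split_threshold:
        return {"size": 16, "disparity": d}
    children = []
    for oy in (0, 8):
        for ox in (0, 8):
            # Refine each quadrant in a small window around the parent disparity.
            _, dc = search(aux, main, bx + ox, by + oy, 8, d, 2)
            children.append(dc)
    return {"size": 8, "disparity": children}
```

The split decision plays the role of the quadtree structure carried in the bitstream: a block that straddles a depth edge is described by four 8×8 disparities instead of one 16×16 disparity.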

Motion compensation estimation is a temporal DPCM predictive coding technique that is widely used in the MPEG-1 and MPEG-2 video coding standards. The concept of motion compensation is based on estimating the motion between video frames: if all objects in a video shot undergo a spatial displacement, the inter-frame motion can be described with a limited number of motion parameters (for example, motion vectors for translational motion of pixels). Because the spatial correlation between neighboring motion vectors is usually high, a single motion vector can often be taken to represent the motion of a whole block of adjacent pixels. A picture can therefore be divided into pixel blocks (16×16 pixels in the MPEG-1 and MPEG-2 standards), and only one motion vector per block is estimated, encoded, and transmitted. Since only the prediction error picture (the difference between the original picture and the motion-compensated predicted picture) is encoded, the temporal redundancy between frames is reduced.
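
Block-based motion estimation of this kind is commonly implemented as a full search that minimizes the sum of absolute differences (SAD). The following Python sketch shows that generic technique under stated assumptions: the ±4 pixel search range, the SAD criterion, and the function names are choices made for the example and are not specified by the source.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """SAD between the block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total

def best_motion_vector(cur, ref, bx, by, size=16, search=4):
    """Full-search block matching; returns ((dx, dy), cost) for one block."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block would leave the picture.
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

In this convention the vector `(dx, dy)` points from the block in the current frame to its best match in the reference frame; only that vector and the residual block would then be coded.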

Observation shows that, for temporally consecutive stereoscopic video images, the disparity fields are themselves highly redundant in time. In the present invention, the initial value for disparity compensation estimation is therefore preferably obtained as follows: motion-compensated prediction is first performed on the P^A frame to obtain motion vectors; these motion vectors are then used to motion-compensate the disparity field of the previous reference frame I^A (or P^A) of the same video stream, and the resulting new disparity field serves as the initial value for disparity estimation. This greatly reduces the time needed to encode the secondary video stream and so increases the coding speed.
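
The initialization described above can be sketched at block level as follows. This is a minimal Python illustration, assuming one disparity value and one motion vector per 16×16 block and a nearest-block lookup; the data layout and the name `init_disparity_field` are hypothetical.

```python
def init_disparity_field(prev_disp, motion_vectors, block=16):
    """Predict the disparity field of the current P^A frame.

    prev_disp      : 2-D list, one disparity per block of the previous I^A/P^A frame
    motion_vectors : 2-D list of (dx, dy) pixel vectors, one per block of the
                     current P^A frame, pointing into the previous reference frame
    Returns a 2-D list of initial disparities for the current frame.
    """
    rows, cols = len(prev_disp), len(prev_disp[0])
    init = [[0] * cols for _ in range(rows)]
    for by in range(rows):
        for bx in range(cols):
            dx, dy = motion_vectors[by][bx]
            # Centre of the matched block in the previous reference frame.
            cx = bx * block + dx + block // 2
            cy = by * block + dy + block // 2
            # Nearest block in the previous field, clamped to the picture.
            rbx = min(max(cx // block, 0), cols - 1)
            rby = min(max(cy // block, 0), rows - 1)
            init[by][bx] = prev_disp[rby][rbx]
    return init
```

The disparity search for each block can then be restricted to a small window around its initial value, which is where the time saving comes from.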

The compensated prediction coding unit 6 is connected to the disparity/motion estimation unit 5. It encodes the disparity estimation compensation information of the I^A frames obtained by the disparity/motion estimation unit 5, and the disparity estimation compensation information and motion estimation compensation information of the P^A frames, to generate the secondary video code stream. The encoded bitstream of I^A frame disparity estimation compensation information has three parts: the disparity vector stream, the disparity-compensated residual image, and the quadtree structure. The disparity vector stream is encoded with differential pulse code modulation (DPCM), and the residual image is encoded with the discrete cosine transform (DCT) and scalar quantization.
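
DPCM coding of a vector stream can be illustrated with a first-order predictor that transmits each value as its difference from the previous one. The Python sketch below shows this generic scheme for one component of the disparity vectors; the choice of predictor (previous value, starting from zero) and the function names are assumptions, since the source does not specify them.

```python
def dpcm_encode(values):
    """Replace each value by its difference from the previous value (predictor starts at 0)."""
    prev = 0
    diffs = []
    for v in values:
        diffs.append(v - prev)
        prev = v
    return diffs

def dpcm_decode(diffs):
    """Invert dpcm_encode by accumulating the differences."""
    prev = 0
    values = []
    for d in diffs:
        prev += d
        values.append(prev)
    return values
```

Because a smooth disparity field changes slowly from block to block, most differences are small, so they cost fewer bits under a subsequent entropy coder than the raw disparities would.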

The multiplexer 7 is connected to the MPEG encoder 4 and the compensated prediction coding unit 6. It combines the main video code stream and the secondary video code stream by time-division multiplexing into the stereoscopic video code stream. In the present invention, to improve coding efficiency, none of the bidirectionally predicted/interpolated frames (B^A frames) of the secondary video stream is encoded at all, and none is passed to the multiplexer 7 as part of the secondary video code stream for transmission over channel 2.
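
The multiplexing rule, including the omission of B^A frames, can be sketched as below. This Python illustration assumes a simple per-frame interleaving and a hypothetical unit layout `(frame_index, frame_type, payload)`; the actual time-division format and the coded frame order used by the patent are not given in the source.

```python
def multiplex(main_units, aux_units):
    """Interleave main and auxiliary units, dropping auxiliary B frames.

    Each unit is (frame_index, frame_type, payload) with frame_type in
    {"I", "P", "B"}. Output units are (stream_id, frame_index, frame_type,
    payload) with stream_id "M" (main) or "A" (auxiliary).
    """
    # B^A frames are never transmitted; they are rebuilt at the decoder.
    aux_by_index = {idx: (ftype, data)
                    for idx, ftype, data in aux_units if ftype != "B"}
    stream = []
    for idx, ftype, data in main_units:
        stream.append(("M", idx, ftype, data))
        if idx in aux_by_index:
            atype, adata = aux_by_index[idx]
            stream.append(("A", idx, atype, adata))
    return stream

def demultiplex(stream):
    """Split the interleaved stream back into main and auxiliary unit lists."""
    main = [(i, t, d) for s, i, t, d in stream if s == "M"]
    aux = [(i, t, d) for s, i, t, d in stream if s == "A"]
    return main, aux
```

With a typical I B B P pattern, the auxiliary stream contributes units only at the I and P positions, so the extra channel load is a fraction of the main stream's.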

In the above stereoscopic video stream encoder, the additional bandwidth required on the transmission channel can be adjusted flexibly by changing the DCT quantization coefficients of the disparity-compensated residual image, so that stereoscopic display can be supported under a range of bandwidth constraints.
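
The effect of the quantization step on the amount of residual data can be seen with a uniform scalar quantizer. The Python sketch below is illustrative only: the single step size, the round-half-away-from-zero rule, and the use of the non-zero coefficient count as a rough bit-rate proxy are assumptions, not details from the source.

```python
def quantize(coeffs, step):
    """Uniform scalar quantization with rounding half away from zero."""
    def q(c):
        magnitude = int(abs(c) / step + 0.5)
        return magnitude if c >= 0 else -magnitude
    return [q(c) for c in coeffs]

def dequantize(levels, step):
    """Reconstruct approximate coefficient values from quantization levels."""
    return [level * step for level in levels]

def nonzero_count(levels):
    """Rough proxy for bit cost: the number of coefficients that must be coded."""
    return sum(1 for level in levels if level != 0)

residual_dct = [120.0, -31.0, 14.0, -6.0, 3.0, 1.0]
fine = quantize(residual_dct, 4)     # small step: more levels survive
coarse = quantize(residual_dct, 16)  # large step: fewer levels, lower rate
```

A coarser step zeroes more coefficients and lowers the secondary stream's bit rate at the cost of a larger reconstruction error in the auxiliary view.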

Referring again to FIG. 1, the stereoscopic video stream decoder 3 comprises an MPEG decoder 9 serving as the main video stream decoding unit, a demultiplexer 8, and a secondary video stream decoding unit made up of a disparity/motion compensated prediction unit 10, a frame estimation and interpolation unit 11, and a secondary video stream reconstruction unit 12.

As shown in FIG. 1, the demultiplexer 8 splits the stereoscopic video code stream transmitted over channel 2 into the main video code stream and the secondary video code stream. It supplies the main video code stream to the MPEG decoder 9 and the secondary video code stream to the disparity/motion compensated prediction unit 10 and the frame estimation and interpolation unit 11.

The MPEG decoder 9 decodes the main video code stream according to the MPEG protocol to generate the main video stream, which consists of a sequence of restored I^M, P^M, and B^M frames arranged in a defined order.

The disparity/motion compensated prediction unit 10 is also connected to the MPEG decoder 9, the frame estimation and interpolation unit 11, and the secondary video stream reconstruction unit 12. From the intra-coded frames I^M and inter-frame predicted frames P^M of the main video stream output by the MPEG decoder 9, and from the disparity estimation compensation information and motion estimation compensation information contained in the secondary video code stream output by the demultiplexer 8, it reconstructs the corresponding intra-coded frames I^A and inter-frame predicted frames P^A of the secondary video stream. The reconstructed I^A and P^A frames are output to the frame estimation and interpolation unit 11 and to the secondary video stream reconstruction unit 12.

The frame estimation and interpolation unit 11 is also connected to the MPEG decoder 9 and the secondary video stream reconstruction unit 12. It reconstructs the bidirectionally predicted/interpolated frames of the secondary video stream from the corresponding bidirectionally predicted frames B^M of the main video stream output by the MPEG decoder 9, the corresponding intra-coded frames I^A and inter-frame predicted frames P^A of the secondary video stream (for example, the I^A and P^A frames adjacent to the B^A frame before and after it), and the disparity estimation compensation information and motion estimation compensation information contained in the secondary video code stream. The reconstructed B^A frames are output to the secondary video stream reconstruction unit 12.

In the secondary video stream reconstruction unit 12, the intra-coded frames I^A and inter-frame predicted frames P^A reconstructed by the disparity/motion compensated prediction unit 10 and the bidirectionally predicted/interpolated frames B^A reconstructed by the frame estimation and interpolation unit 11 are arranged in order of capture time to generate the secondary video stream.

Because the vast majority of frames in the secondary video stream are B^A frames, the reconstruction speed and image quality of the B^A frames are very important in the stereoscopic codec structure. For this purpose, the present invention adopts a frame estimation method: a stereoscopic frame estimation and interpolation method based on the Bayesian minimum cost equation (SFEI_BLCF). This method uses the motion, disparity, and image information obtained at the decoding end (indicated by the dashed arrows in Fig. 2), together with the characteristics of the stereoscopic video sequence itself, to synthesize B^A frames quickly, and the reconstructed images are of acceptable quality in the stereoscopic-vision sense. The specific reconstruction steps are as follows:

(1) Because the B^A frame is interpolated between the I^A and P^A frames, the motion vectors between the I^A and P^A frames are scaled according to the distance from the B^A frame to the I^A frame, which determines where each pixel of the I^A frame falls in the B^A frame.

(2) For a given pixel, if the difference between its pixel values in the corresponding B^M, I^A, and P^A frames is less than a set threshold, the pixel is treated as belonging to a visible region. The weighted average of these pixel values is taken as the value of the corresponding pixel in the B^A frame, and the motion vectors pointing from that pixel in the B^A frame to the I^A and P^A frames, as well as the disparity vector pointing to the B^M frame, are recorded.

(3) For a given pixel, if the difference between its pixel values in the corresponding B^M, I^A, and P^A frames is greater than or equal to the set threshold, the pixel is treated as an occluded point. One of the motion vectors associated with the pixels in the visible region of its neighborhood is selected as the matching motion vector, and the pixel is mapped to the corresponding image frame according to this motion vector to obtain its final pixel value.
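Steps (1) to (3) can be sketched per pixel in Python as follows. This is an illustration under stated assumptions, not the patented SFEI_BLCF procedure: the text does not specify how the "difference" between the three pixel values is measured, which weights are used, or how the matching vector is chosen among the neighbors. The sketch assumes, respectively, the max-minus-min spread, equal weights, and the most frequent neighboring vector. All function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Vector = Tuple[float, float]


def scale_motion_vector(mv_i_to_p: Vector, dist_i_to_b: int,
                        dist_i_to_p: int) -> Vector:
    """Step (1): scale an I^A -> P^A motion vector by the temporal
    position of the B^A frame between the two reference frames."""
    if not 0 < dist_i_to_b < dist_i_to_p:
        raise ValueError("B^A must lie strictly between I^A and P^A")
    return (mv_i_to_p[0] * dist_i_to_b / dist_i_to_p,
            mv_i_to_p[1] * dist_i_to_b / dist_i_to_p)


def blend_if_visible(b_m: float, i_a: float, p_a: float, threshold: float,
                     weights: Sequence[float] = (1.0, 1.0, 1.0)
                     ) -> Optional[float]:
    """Steps (2)/(3): return the weighted average when the pixel is visible,
    or None when it is classified as occluded.
    Assumption: 'difference' is taken as the max-min spread of the samples."""
    samples = (b_m, i_a, p_a)
    if max(samples) - min(samples) >= threshold:
        return None  # occluded point, handled by step (3)
    return sum(w * s for w, s in zip(weights, samples)) / sum(weights)


def pick_neighbor_vector(visible_neighbor_vectors: Sequence[Vector]) -> Vector:
    """Step (3): choose one motion vector from the visible neighborhood.
    Assumption: the most frequent vector is used as the matching vector."""
    if not visible_neighbor_vectors:
        raise ValueError("no visible neighbors available")
    return Counter(visible_neighbor_vectors).most_common(1)[0][0]


if __name__ == "__main__":
    print(scale_motion_vector((6.0, -3.0), 1, 2))
    print(blend_if_visible(100.0, 102.0, 104.0, threshold=10.0))
    print(blend_if_visible(100.0, 102.0, 180.0, threshold=10.0))
    print(pick_neighbor_vector([(1.0, 0.0), (1.0, 0.0), (0.0, 2.0)]))
```

None of these functions performs a matching search; each one only reuses vectors and pixel values that are already available at the decoding end.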

As can be seen from the above, in the stereoscopic video stream encoder/decoder of the present invention, the B^M frames of the main video stream and the I^A and P^A frames of the secondary video stream serve as reference frames for inter-frame compensated prediction and must all be encoded and transmitted. During decoding, however, the B^A frames of the secondary video stream can be recovered and reconstructed directly from the motion and disparity vector values available at the decoding end, without any matching search. The present invention therefore offers a high compression ratio and fast decoding.

Fig. 2 is a schematic diagram of the video processing system of the present invention. As shown in Fig. 2, the video processing system comprises two cameras 21a and 21b that capture the left and right video streams respectively, a time base corrector 22 connected to the cameras, a frame-sequential multiplexer 23 connected to the time base corrector 22, a computer system 24, an ordinary display 25, and a stereoscopic display 26. The computer system 24 contains the stereoscopic video stream encoder and the stereoscopic video stream decoder described above.

In the above video processing system, during encoding, the left and right video streams output by the two cameras 21a and 21b are time-synchronized by the time base corrector 22 and output to the frame-sequential multiplexer 23, where they are multiplexed into a stereoscopic video stream that is fed to the computer system 24. When only one video stream is input to the computer system 24, or only a single video stream needs to be transmitted, the main video stream encoding unit of the stereoscopic video stream encoder encodes the video images and sends the MPEG-standard code stream to the transmission channel. When two video streams need to be transmitted, the main video stream encoding unit and the secondary video stream encoding unit of the stereoscopic video stream encoder encode the left and right video images respectively and send a signal containing the main video code stream and the secondary video code stream to the transmission channel.

Decoding is performed by the stereoscopic video stream decoder of the computer system 24. When the received code stream contains only one video stream, the main video code stream decoding unit of the stereoscopic video stream decoder decodes the encoded code stream and sends the decoded signal to the ordinary display. When the received code stream contains both the left and right video streams, the main video code stream decoding unit and the secondary video code stream decoding unit of the stereoscopic video stream decoder decode the two encoded code streams respectively and send the decoded signals to the autostereoscopic display.
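The two decoding paths can be summarized in a short Python sketch. The function name, the use of `None` to signal a missing secondary code stream, and the string labels are all hypothetical; the sketch shows only the routing decision, not the decoding itself.

```python
from typing import Optional, Tuple


def route_decoded_output(main_code_stream: bytes,
                         secondary_code_stream: Optional[bytes]
                         ) -> Tuple[str, Tuple[str, ...]]:
    """Return (target display, decoding units used) for a received code stream.
    Assumption: a missing secondary code stream is represented by None."""
    if secondary_code_stream is None:
        # Single stream received: only the main decoding unit runs.
        return ("ordinary display", ("main",))
    # Both streams received: both decoding units run.
    return ("autostereoscopic display", ("main", "secondary"))


if __name__ == "__main__":
    print(route_decoded_output(b"\x00", None))
    print(route_decoded_output(b"\x00", b"\x01"))
```

Because the main code stream is MPEG-compatible on its own, the single-stream path needs none of the secondary stream's disparity or motion information.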

The effect of the present invention is illustrated below with a specific application example. Assume the image frames are in CIF format (352×288). The main video stream is encoded according to the MPEG coding syntax; its image quality is relatively high (average peak signal-to-noise ratio, PSNR, of about 35 dB) and its bit rate is 0.14 Mb/s to 2.55 Mb/s. In the secondary video stream, only a few frames are predictively coded and transmitted; the remaining frames are "skipped" entirely, and the frames skipped at the encoding end are recovered in real time at the decoding end by frame estimation and interpolation. The average bit rate of this stream is 14.8 kb/s to 108 kb/s. The comparison shows that the additional bandwidth needed to transmit the secondary video stream is very low, so that the total bit stream of stereoscopic digital television is only about 1.15 to 1.3 times that of ordinary monoscopic digital television. Although the image quality of the secondary video stream is slightly lower than that of the main video stream (average PSNR of about 30 dB), such mixed-resolution left and right images can be fused at the decoding end, by exploiting the characteristics of the human visual system (HVS) and a suitable stereoscopic display, into a stereoscopic image with high visual clarity and sufficient depth perception.

Claims

1. A stereoscopic video stream encoder, characterized in that it comprises:

a main video stream encoding unit, configured to encode one of the video streams in the stereoscopic video stream according to the MPEG protocol to generate a main video code stream;

a secondary video stream encoding unit, which includes:

a disparity/motion compensation estimation unit, configured to perform disparity compensation estimation on the corresponding intra-coded frames and inter-predicted frames of the other video stream in the stereoscopic video stream using the intra-coded frames and inter-predicted frames of the main video stream, and to perform motion compensation estimation on the current inter-predicted frame of the secondary video stream using the preceding intra-coded frame and/or inter-predicted frame of the secondary video stream, wherein the initial value of the disparity compensation estimation is obtained as follows: the disparity field of the preceding intra-coded frame or inter-predicted frame is motion-compensated using the motion vectors obtained by performing motion compensation estimation on the inter-predicted frame of the secondary video stream, and the resulting new disparity field is used as the initial value of the disparity compensation estimation;

a compensated prediction coding unit, configured to encode the disparity compensation estimation information of the intra-coded frames of the secondary video stream, and the disparity compensation estimation information and motion compensation estimation information of the inter-predicted frames, to generate a secondary video code stream; and

a multiplexer, configured to time-division multiplex the main video code stream and the secondary video code stream to generate a stereoscopic video code stream.

2. The stereoscopic video stream encoder according to claim 1, wherein the disparity/motion compensation estimation unit performs disparity estimation based on a hierarchical Markov probability model and a multi-level block matching method.

3. The stereoscopic video stream encoder according to claim 2, characterized in that, in said hierarchical Markov probability model and overlapping block matching method, the number of hierarchy levels is set to two and the segmentation blocks have two sizes, 8×8 and 16×16.

4. The stereoscopic video stream encoder according to any one of claims 1-3, wherein the transmission channel bandwidth occupied by the secondary video code stream is adjusted by changing the DCT quantization coefficients of the residual image in the disparity compensation estimation information.

5. A stereoscopic video stream decoder, characterized in that it comprises:

a demultiplexer, configured to decompose the stereoscopic video code stream into a main video code stream and a secondary video code stream;

a main video code stream decoding unit, configured to decode the main video code stream according to the MPEG protocol to generate the main video stream;

a secondary video code stream decoding unit, which includes:

a disparity/motion compensation prediction unit, configured to reconstruct the intra-coded frames and inter-predicted frames of the secondary video stream from the intra-coded frames and inter-predicted frames of the main video stream and the disparity compensation estimation information and motion compensation estimation information contained in the secondary video code stream;

a frame estimation and interpolation unit, configured to reconstruct the bidirectionally predicted/interpolated frames of the secondary video stream from the corresponding bidirectionally predicted frames of the main video stream, the intra-coded frames and inter-predicted frames of the secondary video stream, and the disparity compensation estimation information and motion compensation estimation information contained in the secondary video code stream; and

a secondary video stream reconstruction unit, configured to sort, in order of time, the intra-coded frames and inter-predicted frames reconstructed by the disparity/motion compensation prediction unit and the bidirectionally predicted/interpolated frames reconstructed by the frame estimation and interpolation unit, to generate the secondary video stream.

6. The stereoscopic video stream decoder according to claim 5, wherein the frame estimation and interpolation unit uses a stereoscopic frame estimation and interpolation method based on the Bayesian minimum cost equation to reconstruct bidirectionally predicted/interpolated frames.

7. A video processing system, characterized in that it comprises cameras that capture the left and right video streams, a time base corrector that time-synchronizes the video streams output by the two cameras, a frame-sequential multiplexer that multiplexes the two video streams processed by the time base corrector to form a stereoscopic video stream, a computer system containing the stereoscopic video stream encoder according to claim 1 and the stereoscopic video stream decoder according to claim 5, and an ordinary display and a stereoscopic display,

wherein, when only a single video stream needs to be transmitted, the main video stream encoding unit of the stereoscopic video stream encoder encodes the video images and sends the signal to the transmission channel; when two video streams need to be transmitted, the main video stream encoding unit and the secondary video stream encoding unit encode the left and right video images respectively and send the signal to the transmission channel; when the received code stream contains only one video stream, the main video code stream decoding unit of the stereoscopic video stream decoder decodes the encoded code stream and sends the decoded signal to the ordinary display; and when the received code stream contains both left and right video streams, the main video code stream decoding unit and the secondary video code stream decoding unit decode the two encoded code streams respectively and send the decoded signals to the stereoscopic display.