
CN101867810B - Method for pre-processing deep video sequence - Google Patents

Method for pre-processing deep video sequence

Info

Publication number
CN101867810B
Authority
CN
China
Prior art keywords
depth, pixel, video sequence, depth video, current
Prior art date
Legal status
Expired - Fee Related
Application number
CN 201010144951
Other languages
Chinese (zh)
Other versions
CN101867810A (en)
Inventor
彭宗举
蒋刚毅
郁梅
卢小明
陈芬
Current Assignee
Ningbo University
Original Assignee
Ningbo University
Priority date
Filing date
Publication date
Application filed by Ningbo University
Priority to CN201010144951A
Publication of CN101867810A
Application granted
Publication of CN101867810B

Landscapes

  • Color Television Systems (AREA)

Abstract

The invention discloses a preprocessing method for depth video sequences. The depth video sequence to be processed is first transformed, then temporally smoothed using information from the corresponding color video sequence, and finally inverse-transformed to obtain the preprocessed depth video sequence. While maintaining virtual-viewpoint image rendering quality, the preprocessed depth video sequence has effectively higher temporal correlation, which greatly improves its compression coding efficiency; the bitrate savings reach 14.43% to 37.07%.

Description

A Preprocessing Method for Depth Video Sequences

Technical Field

The present invention relates to a method for processing video signals, and in particular to a method for preprocessing depth video sequences.

Background Art

In a multi-view video system, the multi-view video signal consists mainly of multi-view color video sequences and the corresponding multi-view depth video sequences. Depth video is essential auxiliary information in such a system. It can be captured with a depth camera or computed by depth-estimation methods; because depth cameras are very expensive, most multi-view depth video sequences currently used for testing are obtained by depth estimation. A depth video represents the distance from the scene of the corresponding color video to the camera's imaging plane, with the actual distance quantized to the range [0, 255]. To broaden deployment and reduce cost, the depth video used for virtual-viewpoint image rendering should not be generated by depth estimation at the receiver; instead it must be captured or estimated at the sender, compressed, and transmitted to the receiver. Compression coding of depth video is therefore critical to multi-view video systems.

A multi-view coded signal contains temporal, inter-view, and spatial correlations, and the encoder exploits all of them to achieve high compression efficiency. However, depth video sequences, whether captured by a depth camera or produced by depth estimation, are less consistent along the time axis than the corresponding color sequences. This inconsistency weakens the temporal correlation of the depth video and ultimately lowers its compression efficiency. Moreover, for most multi-view sequences the temporal correlation is stronger than the inter-view correlation, so multi-view video coding relies heavily on prediction structures dominated by temporal references. When the temporal correlation of the depth video falls and the relative importance of inter-view correlation rises, such temporally referenced prediction structures cannot fully exploit the correlations in the depth video signal and fail to reach the desired compression performance.

Summary of the Invention

The technical problem addressed by the present invention is to provide a preprocessing method for multi-view depth video sequences that, while maintaining virtual-viewpoint image rendering quality, effectively increases the temporal correlation of a depth video sequence and thereby greatly improves its compression coding efficiency.

The technical solution adopted by the present invention to solve the above problem is a preprocessing method for multi-view depth video sequences, where the multi-view video signal consists mainly of a color video sequence and the corresponding depth video sequence. The preprocessing method comprises the following steps:

① Denote the depth video sequence to be preprocessed as D(m, n, k), where m is the horizontal resolution of the depth frames, n is their vertical resolution, m×n is the frame resolution, and k is the number of depth frames in the sequence;

② Transform D(m, n, k) to obtain the transformed depth video sequence, denoted D′(m, k, n), whose depth frames have resolution m×k and whose frame count is n;

③ Apply temporal smoothing, in turn, to each column of pixels of every depth frame of D′(m, k, n) to obtain the smoothed depth video sequence, denoted D″(m, k, n), whose depth frames have resolution m×k and whose frame count is n;

④ Apply the inverse transform to D″(m, k, n) to obtain the preprocessed depth video sequence, denoted D″′(m, n, k), whose depth frames have resolution m×n and whose frame count is k.

The transform of the depth video sequence D(m, n, k) in step ② proceeds as follows:

②-1. For each i, 1 ≤ i ≤ n, take the luminance components Y of the m pixels of row i of every depth frame of D(m, n, k);

②-2. Stack the k extracted rows of luminance components Y, in frame order, as the luminance component Y of the i-th depth frame of D′(m, k, n); then set the first chrominance component U and the second chrominance component V of every depth frame of D′(m, k, n) to 128, where 1 ≤ i ≤ n.
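Steps ②-1 and ②-2 amount to swapping the frame axis and the row axis of the luminance data. A minimal Python sketch, assuming the sequence is held as a nested list indexed [frame][row][column] of luminance values (the function name and data layout are illustrative, not from the patent):

```python
def transform(depth_seq):
    """Swap the frame axis and the row axis of a video sequence.

    Input layout:  depth_seq[f][i][x], f in [0, k), i in [0, n), x in [0, m).
    Output layout: out[i][f][x] -- row i of every original frame, stacked in
    frame order, becomes frame i of the transformed sequence (resolution m x k).
    """
    k = len(depth_seq)       # number of original frames
    n = len(depth_seq[0])    # rows per original frame
    return [[depth_seq[f][i] for f in range(k)] for i in range(n)]

# Toy example: k = 2 frames of resolution 2x2 (n = 2 rows, m = 2 columns).
seq = [[[1, 2], [3, 4]],
       [[5, 6], [7, 8]]]
out = transform(seq)  # n = 2 frames, each with k = 2 rows of m = 2 pixels
```

Because the operation is a pure axis swap, applying the same function again restores the original ordering, which is exactly the inverse transform of step ④. Setting U and V to 128 (neutral chroma) simply makes the stacked luminance data a valid grey YUV frame.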

The temporal smoothing of step ③, applied in turn to each column of pixels of every depth frame of D′(m, k, n), proceeds as follows:

③-1. Denote the color video sequence corresponding to the depth video sequence D(m, n, k) as C(m, n, k), whose color frames have resolution m×n and whose frame count is k; then apply to C(m, n, k) the same transform used for D(m, n, k), obtaining the transformed color video sequence C′(m, k, n), whose color frames have resolution m×k and whose frame count is n;

③-2. Define the p-th depth frame of D′(m, k, n) as the current depth frame, where p has initial value 1 and 1 ≤ p ≤ n;

③-3. Process the current depth frame column by column, defining the q-th column of pixels currently being processed as the current column, where q has initial value 1 and 1 ≤ q ≤ m;

③-4. Classify the pixels of the current column by their depth values into a number of first pixel sets, as follows: a1. Place the 1st pixel of the current column into the s-th first pixel set and define the 2nd pixel of the current column as the current depth pixel. b1. Check whether the absolute difference between the depth value of the current depth pixel and the mean depth value of all pixels of the s-th first pixel set is smaller than a set first threshold; if so, place the current depth pixel into the s-th first pixel set and go to step d1; otherwise place it into the (s+1)-th first pixel set and continue. c1. Let s′ = s + 1, s = s′. d1. Take the next pixel of the current column as the current depth pixel and return to step b1, until every pixel of the current column has been processed; here s has initial value 1 and s′ has initial value 0;
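The grouping of step ③-4 is a single pass over the column that compares each pixel with the running mean of the group being built. A sketch in Python, assuming the column is a list of integer depth values (the function name is illustrative):

```python
def classify_column(values, threshold):
    """Assign each value a group index: a value joins the current group when
    it differs from that group's running mean by less than `threshold`;
    otherwise it opens a new group (steps a1-d1 of the patent, 0-based)."""
    labels = [0]
    members = [values[0]]            # values of the group being built
    for v in values[1:]:
        mean = sum(members) / len(members)
        if abs(v - mean) < threshold:
            members.append(v)
            labels.append(labels[-1])
        else:                        # sharp depth change: start a new group
            members = [v]
            labels.append(labels[-1] + 1)
    return labels

labels = classify_column([100, 102, 101, 150, 151], 10)  # -> [0, 0, 0, 1, 1]
```

Step ③-5 applies the same procedure to the luminance values of the corresponding color column, only with the second threshold in place of the first.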

③-5. Define the column of C′(m, k, n) corresponding to the current column as the corresponding column and classify its pixels by their luminance component values into a number of second pixel sets, as follows: a2. Place the 1st pixel of the corresponding column into the t-th second pixel set and define the 2nd pixel of the corresponding column as the current color pixel. b2. Check whether the absolute difference between the luminance component value of the current color pixel and the mean luminance component value of all pixels of the t-th second pixel set is smaller than a set second threshold; if so, place the current color pixel into the t-th second pixel set and go to step d2; otherwise place it into the (t+1)-th second pixel set and continue. c2. Let t′ = t + 1, t = t′. d2. Take the next pixel of the corresponding column as the current color pixel and return to step b2, until every pixel of the corresponding column has been processed; here t has initial value 1 and t′ has initial value 0;

③-6. Using the first and second pixel sets, reclassify the pixels of the current column into a number of third pixel sets, as follows: a3. Place the 1st pixel of the current column into the v-th third pixel set and define the 2nd pixel of the current column as the current depth pixel. b3. Check whether the current depth pixel and its preceding pixel belong to the same first pixel set and, in the corresponding column, the color pixel matching the current depth pixel and its preceding pixel belong to the same second pixel set; if so, place the current depth pixel into the v-th third pixel set and go to step d3; otherwise place it into the (v+1)-th third pixel set and continue. c3. Let v′ = v + 1, v = v′. d3. Take the next pixel of the current column as the current depth pixel and return to step b3, until every pixel of the current column has been processed; here v has initial value 1 and v′ has initial value 0;
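Step ③-6 in effect intersects the two segmentations: a new third pixel set starts wherever either the depth grouping or the color grouping starts one. A sketch, operating on group-index lists like those produced in steps ③-4 and ③-5 (names are illustrative):

```python
def refine_groups(depth_labels, color_labels):
    """Start a new (third) group at every position where the depth labels or
    the color labels change between neighbouring pixels (steps a3-d3)."""
    merged = [0]
    for a in range(1, len(depth_labels)):
        same = (depth_labels[a] == depth_labels[a - 1]
                and color_labels[a] == color_labels[a - 1])
        merged.append(merged[-1] if same else merged[-1] + 1)
    return merged

merged = refine_groups([0, 0, 0, 1, 1], [0, 0, 1, 1, 1])  # -> [0, 0, 1, 2, 2]
```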

③-7. For each third pixel set of the current column, take the mean of the depth values of all pixels it contains as the depth value of every pixel of that set, obtaining the smoothed column of pixels;
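Step ③-7 then flattens each third pixel set to its mean depth. A sketch (rounding the mean to the nearest integer depth level is an assumption for the [0, 255] range; the patent only specifies taking the average):

```python
def smooth_column(values, labels):
    """Replace every depth value by the mean of its group (third pixel set)."""
    sums, counts = {}, {}
    for v, g in zip(values, labels):
        sums[g] = sums.get(g, 0) + v
        counts[g] = counts.get(g, 0) + 1
    return [round(sums[g] / counts[g]) for g in labels]

smoothed = smooth_column([100, 102, 104, 200], [0, 0, 1, 1])
# each group collapses to its mean: [101, 101, 152, 152]
```

Within a transformed frame a column corresponds to one image row tracked over time, so averaging along it removes the frame-to-frame flicker while the group boundaries preserve genuine depth edges and motion.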

③-8. Take the next column of the current depth frame as the current column, then return to step ③-4, until every column of the current depth frame has been processed;

③-9. Take the next depth frame of D′(m, k, n) as the current depth frame, then return to step ③-3, until every depth frame of D′(m, k, n) has been processed.

The first threshold is 10 and the second threshold is 5.

The inverse transform of D″(m, k, n) in step ④ proceeds as follows:

④-1. For each j, 1 ≤ j ≤ k, take the luminance components Y of the m pixels of row j of every depth frame of D″(m, k, n);

④-2. Stack the n extracted rows of luminance components Y, in frame order, as the luminance component Y of the j-th depth frame of D″′(m, n, k); then set the first chrominance component U and the second chrominance component V of every depth frame of D″′(m, n, k) to 128, where 1 ≤ j ≤ k.
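Since the forward transform of step ② is a swap of the frame and row axes, the inverse transform of step ④ is the same permutation applied to the smoothed sequence. A sketch under the same nested-list layout assumed earlier (illustrative names):

```python
def inverse_transform(seq):
    """Undo the row/frame axis swap: row j of every frame of the smoothed
    sequence, stacked in frame order, becomes frame j of the output."""
    n = len(seq)     # frames of the transformed sequence = original rows
    k = len(seq[0])  # rows per transformed frame = original frame count
    return [[seq[i][j] for i in range(n)] for j in range(k)]

restored = inverse_transform([[[1, 2], [5, 6]],
                              [[3, 4], [7, 8]]])
# -> [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
```

The operation is an involution: applying it twice returns the input, so the forward and inverse transforms can share one implementation.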

Compared with the prior art, the advantage of the present invention is that the depth video sequence to be processed is transformed, temporally smoothed using information from the corresponding color video sequence, and then inverse-transformed to obtain the preprocessed depth video sequence. While maintaining virtual-viewpoint image rendering quality, the preprocessed depth video sequence has effectively higher temporal correlation, which greatly improves its compression coding efficiency; the bitrate savings reach 14.43% to 37.07%.

Brief Description of the Drawings

Fig. 1 is a flowchart of the present invention;

Fig. 2a is the 1st depth frame of the depth video sequence of viewpoint 7 of the "Door Flowers" test video sequence;

Fig. 2b is the depth frame obtained by transforming the depth frame of Fig. 2a;

Fig. 2c is the 1st color frame of the color video sequence of viewpoint 7 of the "Door Flowers" test video sequence;

Fig. 2d is the color frame obtained by transforming the color frame of Fig. 2c;

Fig. 3a illustrates the classification result for the current column;

Fig. 3b illustrates the classification result for the corresponding column;

Fig. 3c illustrates the result of reclassifying the current column of Fig. 3a;

Fig. 4 is the smoothed depth frame obtained by temporally smoothing the transformed depth frame of Fig. 2b;

Fig. 5 is the depth frame obtained by inverse-transforming the smoothed depth frame of Fig. 4;

Fig. 6a compares the coding rate-distortion performance of the original and the preprocessed depth video sequences of the Book Arrival test sequence;

Fig. 6b compares the coding rate-distortion performance of the original and the preprocessed depth video sequences of the Door Flowers test sequence;

Fig. 6c compares the coding rate-distortion performance of the original and the preprocessed depth video sequences of the Leave Laptop test sequence;

Fig. 6d compares the coding rate-distortion performance of the original and the preprocessed depth video sequences of the Alt Moabit test sequence;

Fig. 7a is the original frame S8T0 of the Door Flowers test sequence;

Fig. 7b is the frame S8T0 rendered from the original color videos of viewpoints 7 and 10 and the corresponding original depth videos;

Fig. 7c is the frame S8T0 rendered from the original color videos of viewpoints 7 and 10 and the corresponding preprocessed depth videos.

Detailed Description of Embodiments

The present invention is described in further detail below with reference to the drawings and embodiments.

Because the content of the depth video sequences in a multi-view video signal is inconsistent over time, their temporal correlation is weak. To improve the coding efficiency of depth video sequences, the present invention therefore proposes a preprocessing method for multi-view depth video sequences that preprocesses a depth video sequence before encoding so as to strengthen its temporal consistency.

The flow of the proposed preprocessing method for multi-view depth video sequences is shown in Fig. 1; it comprises the following steps:

① Denote the depth video sequence to be preprocessed as D(m, n, k), where m is the horizontal resolution of the depth frames, n is their vertical resolution, m×n is the frame resolution, and k is the number of depth frames in the sequence.

② Transform D(m, n, k) to obtain the transformed depth video sequence, denoted D′(m, k, n), whose depth frames have resolution m×k and whose frame count is n.

In this embodiment, the transform of the depth video sequence D(m, n, k) proceeds as follows:

②-1. For each i, 1 ≤ i ≤ n, take the luminance components Y of the m pixels of row i of every depth frame of D(m, n, k).

②-2. Stack the k extracted rows of luminance components Y, in frame order, as the luminance component Y of the i-th depth frame of D′(m, k, n); then set the first chrominance component U and the second chrominance component V of every depth frame of D′(m, k, n) to 128, where 1 ≤ i ≤ n.

Fig. 2a shows the 1st depth frame of the depth video sequence of viewpoint 7 of the "Door Flowers" test video sequence, and Fig. 2b shows the depth frame obtained by transforming it. The light and dark regions circled in Fig. 2b correspond to moving parts, while the region boxed in Fig. 2b shows that each column of pixels of the frames of D′(m, k, n) varies slightly; this is a manifestation of the temporal inconsistency of the depth video D(m, n, k).

③ The weak temporal correlation of a depth video sequence appears as small variations of the depth values along the columns of the m×k depth frames of the transformed sequence D′(m, k, n). The present invention therefore applies temporal smoothing, in turn, to each column of pixels of every depth frame of D′(m, k, n), yielding the smoothed depth video sequence D″(m, k, n), whose depth frames have resolution m×k and whose frame count is n. Smoothing each column of every frame of D′(m, k, n) in this way effectively strengthens the temporal consistency of D′(m, k, n) while preserving its moving parts.

In this embodiment, the temporal smoothing of each column of every depth frame of D′(m, k, n) proceeds as follows:

③-1. Denote the color video sequence corresponding to the depth video sequence D(m, n, k) as C(m, n, k), whose color frames have resolution m×n and whose frame count is k; C(m, n, k) corresponds to D(m, n, k) in that the two sequences have the same number of frames and the color and depth frames have the same resolution. Then apply to C(m, n, k) the same transform used for D(m, n, k), obtaining the transformed color video sequence C′(m, k, n), whose color frames have resolution m×k and whose frame count is n.

In this embodiment, the transform of the color video sequence C(m, n, k) proceeds as follows: for each i, 1 ≤ i ≤ n, take the luminance components Y of the m pixels of row i of every color frame of C(m, n, k); stack the k extracted rows of luminance components Y, in frame order, as the luminance component Y of the i-th color frame of C′(m, k, n); then set the first chrominance component U and the second chrominance component V of every color frame of C′(m, k, n) to 128.

Fig. 2c shows the 1st color frame of the color video sequence of viewpoint 7 of the "Door Flowers" test video sequence, and Fig. 2d shows the color frame obtained by transforming it. The light and dark regions circled in Fig. 2d correspond to moving parts. The region boxed in Fig. 2d differs clearly from the one boxed in Fig. 2b: it shows that the columns of pixels of the transformed color frame are essentially uniform.

③-2、将变换处理后的深度视频序列D′(m,k,n)中的第p帧深度视频帧定义为当前深度视频帧,其中,p的初始值为1,1≤p≤n。③-2. Define the pth depth video frame in the transformed depth video sequence D′(m, k, n) as the current depth video frame, where the initial value of p is 1, 1≤p≤n .

③-3、以列为单位对当前深度视频帧进行处理,将当前深度视频帧中当前正在处理的第q列像素定义为当前列,其中,q的初始值为1,1≤q≤m。③-3. The current depth video frame is processed in units of columns, and the qth column of pixels currently being processed in the current depth video frame is defined as the current column, where the initial value of q is 1, 1≤q≤m.

③-4. Classify the pixels of the current column according to their depth values to obtain several first pixel sets. The classification proceeds as follows: a1. Place the 1st pixel of the current column into the s-th first pixel set and define the 2nd pixel of the current column as the current depth pixel; b1. Determine whether the absolute difference between the depth value of the current depth pixel and the mean depth value of all pixels in the s-th first pixel set is less than a set first threshold. If it is, the depth value of the current depth pixel is considered not to have changed drastically and the pixel is considered to belong to the same class as the preceding pixels, so it is placed into the s-th first pixel set and step d1 is executed; otherwise, the depth value is considered to have changed drastically and the pixel is considered not to belong to the same class as the preceding pixels, so it is placed into the (s+1)-th first pixel set and execution continues; c1. Let s′=s+1, s=s′; d1. Take the next pixel of the current column as the current depth pixel and return to step b1, until all pixels of the current column have been processed. The initial value of s is 1 and the initial value of s′ is 0.

Suppose the current column is classified as shown in Figure 3a: the first 3 pixels belong to the 1st first pixel set, the middle 5 pixels belong to the 2nd first pixel set, and the last 2 pixels belong to the 3rd first pixel set.
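The column segmentation of step ③-4 is, in effect, a running-mean split. A minimal Python sketch (the function name `classify_column` and the sample column values are assumptions for illustration, not taken from the patent):

```python
def classify_column(values, threshold):
    """Segment a column of depth values into 'first pixel sets'.

    A pixel joins the current set if its value differs from the set's
    running mean by less than `threshold`; otherwise it opens a new set.
    """
    sets = [[values[0]]]              # step a1: the 1st pixel opens set 1
    for v in values[1:]:              # steps b1-d1: scan the rest of the column
        current = sets[-1]
        mean = sum(current) / len(current)
        if abs(v - mean) < threshold: # no drastic change: same set
            current.append(v)
        else:                         # drastic change: open a new set
            sets.append([v])
    return sets

# A column shaped like Figure 3a: three depth plateaus of 3, 5 and 2 pixels
column = [10, 11, 10, 50, 51, 52, 50, 51, 90, 91]
print([len(s) for s in classify_column(column, threshold=10)])  # → [3, 5, 2]
```

Applied to the luminance components of the corresponding color column with the smaller second threshold, the same routine produces the second pixel sets of step ③-5.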

③-5. Define the column of the transformed color video sequence C′(m,k,n) that corresponds to the current column as the corresponding column, and classify the pixels of the corresponding column according to their luminance component values to obtain several second pixel sets. The classification proceeds as follows: a2. Place the 1st pixel of the corresponding column into the t-th second pixel set and define the 2nd pixel of the corresponding column as the current color pixel; b2. Determine whether the absolute difference between the luminance component value of the current color pixel and the mean luminance component value of all pixels in the t-th second pixel set is less than a set second threshold. If it is, place the current color pixel into the t-th second pixel set and execute step d2; otherwise, place the current color pixel into the (t+1)-th second pixel set and continue; c2. Let t′=t+1, t=t′; d2. Take the next pixel of the corresponding column as the current color pixel and return to step b2, until all pixels of the corresponding column have been processed. The initial value of t is 1 and the initial value of t′ is 0.

Suppose the corresponding column is classified as shown in Figure 3b: the first 3 pixels, marked a, belong to the 1st second pixel set; the next 3 pixels, marked b, belong to the 2nd second pixel set; the next 3 pixels, marked c, belong to the 3rd second pixel set; and the last pixel, marked d, belongs to the 4th second pixel set.

③-6. Reclassify the pixels of the current column according to the first pixel sets and the second pixel sets to obtain several third pixel sets. The reclassification proceeds as follows: a3. Place the 1st pixel of the current column into the v-th third pixel set and define the 2nd pixel of the current column as the current depth pixel; b3. Determine whether the current depth pixel and its preceding pixel belong to the same first pixel set, and whether the color pixel of the corresponding column that corresponds to the current depth pixel and its preceding pixel belong to the same second pixel set. If both hold, place the current depth pixel into the v-th third pixel set and execute step d3; otherwise, place the current depth pixel into the (v+1)-th third pixel set and continue; c3. Let v′=v+1, v=v′; d3. Take the next pixel of the current column as the current depth pixel and return to step b3, until all pixels of the current column have been processed. The initial value of v is 1 and the initial value of v′ is 0.

Figure 3c shows the result of reclassifying the current column of Figure 3a according to the pixel sets of Figures 3a and 3b. As can be seen, the current column is re-divided into 5 third pixel sets: pixels 1-3 form the 1st third pixel set, pixels 4-6 the 2nd, pixels 7-8 the 3rd, pixel 9 the 4th, and pixel 10 the 5th.

③-7. Take the mean of the depth values of all pixels contained in each third pixel set of the current column as the depth value of every pixel in that set, obtaining the smoothed column of pixels.
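Steps ③-6 and ③-7 can be sketched together: a pixel opens a new third set whenever either its first-set label or its second-set label differs from that of the preceding pixel, and each third set is then flattened to its mean depth value. A Python sketch under assumed names, with label lists chosen to reproduce the Figure 3a-3c example:

```python
def reclassify_and_smooth(depth, depth_labels, color_labels):
    """Steps ③-6/③-7: split the column wherever either segmentation
    changes, then replace every resulting set by its mean depth value."""
    # step ③-6: a new third set starts where depth OR color label changes
    third_labels = [1]
    for i in range(1, len(depth)):
        same = (depth_labels[i] == depth_labels[i - 1]
                and color_labels[i] == color_labels[i - 1])
        third_labels.append(third_labels[-1] if same else third_labels[-1] + 1)
    # step ③-7: average the depth values inside each third set
    smoothed = list(depth)
    for label in set(third_labels):
        idx = [i for i, lab in enumerate(third_labels) if lab == label]
        mean = sum(depth[i] for i in idx) / len(idx)
        for i in idx:
            smoothed[i] = mean
    return third_labels, smoothed

# Labels reproducing Figures 3a/3b: 3 depth sets, 4 color sets -> 5 third sets
depth_vals   = [10, 11, 10, 50, 51, 52, 50, 51, 90, 91]
depth_labels = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3]
color_labels = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4]
labels, smoothed = reclassify_and_smooth(depth_vals, depth_labels, color_labels)
print(labels)  # → [1, 1, 1, 2, 2, 2, 3, 3, 4, 5]
```

The resulting five sets of sizes 3, 3, 2, 1 and 1 match the partition shown in Figure 3c.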

③-8. Take the next column of the current depth video frame as the current column and return to step ③-4, until all columns of the current depth video frame have been processed.

③-9. Take the next depth video frame of the transformed depth video sequence D′(m,k,n) as the current depth video frame and return to step ③-3, until all depth video frames of the transformed depth video sequence D′(m,k,n) have been processed.

In the present invention, when classifying each column of pixels of the transformed depth video sequence D′(m,k,n), the luminance components of the corresponding column of the transformed color video sequence C′(m,k,n) are used mainly to preserve the information of those regions whose depth varies only subtly over time while the luminance of the color video varies strongly. In this way the depth video sequence is smoothed over time while the depth information of the important regions is maintained. Temporally smoothing every column of every depth video frame of the transformed depth video sequence D′(m,k,n) therefore benefits both the compression coding of the depth video sequence and the rendering of virtual viewpoint images.

In this embodiment, the first threshold may be set to 10 and the second threshold to 5. The first threshold is larger than the second because the color video sequence C(m,n,k) is more consistent over time than the depth video sequence D(m,n,k).

Figure 4 shows the smoothed depth video frame obtained by temporally smoothing the transformed depth video frame of Figure 2b. Compared with the transformed frame of Figure 2b, it preserves the information of the moving regions of the depth video sequence while its pixels are considerably more consistent in the vertical direction.

④ Apply the inverse transform to the smoothed depth video sequence D″(m,k,n) to obtain the preprocessed depth video sequence, denoted D″′(m,n,k), in which the resolution of the depth video frames is m×n and the number of depth video frames is k.

In this embodiment, the inverse transform of the smoothed depth video sequence D″(m,k,n) proceeds as follows:

④-1. Sequentially extract the luminance components Y of the m pixels of the j-th row of every depth video frame of the smoothed depth video sequence D″(m,k,n), where 1≤j≤k.

④-2. Take the extracted n rows of luminance components Y, in order, as the luminance component Y of the j-th depth video frame of the preprocessed depth video sequence D″′(m,n,k), then set the first chrominance component U and the second chrominance component V of every depth video frame of the preprocessed depth video sequence D″′(m,n,k) to 128, where 1≤j≤k.
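Both the transform of step ② and the inverse transform of step ④ amount to exchanging the time axis with the vertical axis of the luminance plane, after which the chrominance planes are reset to the neutral value 128. Because the exchange is its own inverse, one routine covers both directions. A minimal sketch over nested lists (the function name and the toy frame sizes are assumptions):

```python
def transform(frames):
    """Axis swap of steps 2 and 4, on the luminance plane only.

    `frames` is a list of k frames, each a list of n rows of m pixels.
    Output frame i collects row i of every input frame, so the result
    is n frames of k rows: the time and vertical axes are exchanged.
    Applying the function twice restores the original sequence.
    """
    k = len(frames)
    n = len(frames[0])
    return [[frames[t][i] for t in range(k)] for i in range(n)]

# Two 3-row frames of 2 pixels each (n=3, m=2, k=2) -> three 2-row frames
f0 = [[1, 2], [3, 4], [5, 6]]
f1 = [[7, 8], [9, 10], [11, 12]]
transformed = transform([f0, f1])
print(transformed[0])                      # → [[1, 2], [7, 8]]
assert transform(transformed) == [f0, f1]  # round trip restores the input
```

Reusing one routine for both directions reflects the symmetry between steps ②-1/②-2 and ④-1/④-2 in the text; the chrominance reset to 128 would be applied separately after each call.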

Figure 5 shows the depth video frame obtained by applying the inverse transform to the smoothed depth video frame of Figure 4.

To verify the validity and feasibility of the preprocessing method of the present invention, the BookArrival, Door Flowers, Leave Laptop and Alt Moabit test sequences provided by HHI were selected. The depth videos of the 7th viewpoint of these sequences were computed with the existing depth estimation method DERS and taken as the original depth video sequences, which were then processed with the proposed preprocessing method to obtain the preprocessed depth video sequences.

The performance of the proposed preprocessing method is measured here in terms of the compression efficiency of the depth video sequences and the rendering of virtual viewpoint images.

Regarding compression efficiency, the proposed method saves 14.43% to 37.07% of the bit rate. Table 1 compares the coding bit rates of the original and preprocessed depth video sequences of the above sequences under identical conditions (BasisQP=22). Because the preprocessing method mainly strengthens the temporal correlation of the depth video sequence, it is most effective for B frames, which save 19.16% to 41.28% of the bit rate. The benefit of the method is mainly determined by the degree of temporal inconsistency of each original depth video sequence; since the Alt Moabit depth video sequence contains large temporally inconsistent regions in both the foreground and the background, the method greatly improves the coding and compression efficiency of that sequence. Figures 6a, 6b, 6c and 6d compare the rate-distortion performance of the original and preprocessed depth video sequences of the BookArrival, Door Flowers, Leave Laptop and Alt Moabit test sequences respectively, from which it can be seen that the proposed preprocessing method substantially improves the rate-distortion performance.

Table 1. Comparison of coding bit rates of the original and preprocessed depth video sequences


Regarding virtual viewpoint rendering, the proposed preprocessing method has little effect on the rendered virtual viewpoint images. Figure 7a shows the original frame S8T0 of the Door Flowers test sequence; Figure 7b shows S8T0 rendered from the original color videos of viewpoints 7 and 10 and the corresponding original depth videos; Figure 7c shows S8T0 rendered from the original color videos of viewpoints 7 and 10 and the corresponding preprocessed depth videos. Subjectively, the images of Figures 7b and 7c are of almost identical quality. To show the difference more clearly, Table 2 lists the peak signal-to-noise ratio (PSNR) of the virtual viewpoint images rendered with the original depth videos and with the preprocessed depth videos, relative to the original viewpoint image, for both cases. As Table 2 shows, the PSNR of the luminance component Y is almost unchanged, and the PSNRs of the first chrominance component U and the second chrominance component V are almost the same, with some even rising slightly.

Table 2. Quality comparison of rendered virtual viewpoint images (dB)


Claims (2)

1. A preprocessing method for a multi-view depth video sequence, the multi-view video signal consisting of a color video sequence and a depth video sequence corresponding to the color video sequence, characterized in that the preprocessing method comprises the following steps:

① Denote the depth video sequence to be preprocessed as D(m,n,k), where m is the horizontal resolution of the depth video frames in the depth video sequence D(m,n,k), n is their vertical resolution, m×n is the resolution of the depth video frames in D(m,n,k), and k is the number of depth video frames contained in D(m,n,k);

② Transform the depth video sequence D(m,n,k) to obtain the transformed depth video sequence, denoted D′(m,k,n), in which the resolution of the depth video frames is m×k and the number of depth video frames is n;

Here, the transform of the depth video sequence D(m,n,k) proceeds as follows:

②-1. Sequentially extract the luminance components Y of the m pixels of the i-th row of every depth video frame of the depth video sequence D(m,n,k), where 1≤i≤n;

②-2. Take the extracted k rows of luminance components Y, in order, as the luminance component Y of the i-th depth video frame of the transformed depth video sequence D′(m,k,n), then set the first chrominance component U and the second chrominance component V of every depth video frame of D′(m,k,n) to 128, where 1≤i≤n;

③ Temporally smooth each column of pixels of each depth video frame of the transformed depth video sequence D′(m,k,n) in turn to obtain the smoothed depth video sequence, denoted D″(m,k,n), in which the resolution of the depth video frames is m×k and the number of depth video frames is n;

Here, the temporal smoothing of each column of pixels of each depth video frame of D′(m,k,n) proceeds as follows:

③-1. Denote the color video sequence corresponding to the depth video sequence D(m,n,k) to be preprocessed as C(m,n,k), in which the resolution of the color video frames is m×n and the number of color video frames is k; then apply the same transform as applied to the depth video sequence D(m,n,k) to the color video sequence C(m,n,k) to obtain the transformed color video sequence, denoted C′(m,k,n), in which the resolution of the color video frames is m×k and the number of color video frames is n;

③-2. Define the p-th depth video frame of the transformed depth video sequence D′(m,k,n) as the current depth video frame, where the initial value of p is 1 and 1≤p≤n;

③-3. Process the current depth video frame column by column, and define the q-th column of pixels currently being processed in the current depth video frame as the current column, where the initial value of q is 1 and 1≤q≤m;

③-4. Classify the pixels of the current column according to their depth values to obtain several first pixel sets, the classification proceeding as follows: a1. Place the 1st pixel of the current column into the s-th first pixel set and define the 2nd pixel of the current column as the current depth pixel; b1. Determine whether the absolute difference between the depth value of the current depth pixel and the mean depth value of all pixels in the s-th first pixel set is less than a set first threshold; if it is, place the current depth pixel into the s-th first pixel set and execute step d1, otherwise place the current depth pixel into the (s+1)-th first pixel set and continue; c1. Let s′=s+1, s=s′; d1. Take the next pixel of the current column as the current depth pixel and return to step b1, until all pixels of the current column have been processed, the initial value of s being 1 and the initial value of s′ being 0;

③-5. Define the column of the transformed color video sequence C′(m,k,n) corresponding to the current column as the corresponding column, and classify the pixels of the corresponding column according to their luminance component values to obtain several second pixel sets, the classification proceeding as follows: a2. Place the 1st pixel of the corresponding column into the t-th second pixel set and define the 2nd pixel of the corresponding column as the current color pixel; b2. Determine whether the absolute difference between the luminance component value of the current color pixel and the mean luminance component value of all pixels in the t-th second pixel set is less than a set second threshold; if it is, place the current color pixel into the t-th second pixel set and execute step d2, otherwise place the current color pixel into the (t+1)-th second pixel set and continue; c2. Let t′=t+1, t=t′; d2. Take the next pixel of the corresponding column as the current color pixel and return to step b2, until all pixels of the corresponding column have been processed, the initial value of t being 1 and the initial value of t′ being 0;

③-6. Reclassify the pixels of the current column according to the first pixel sets and the second pixel sets to obtain several third pixel sets, the reclassification proceeding as follows: a3. Place the 1st pixel of the current column into the v-th third pixel set and define the 2nd pixel of the current column as the current depth pixel; b3. Determine whether the current depth pixel and its preceding pixel belong to the same first pixel set, and whether the color pixel of the corresponding column corresponding to the current depth pixel and its preceding pixel belong to the same second pixel set; if both hold, place the current depth pixel into the v-th third pixel set and execute step d3, otherwise place the current depth pixel into the (v+1)-th third pixel set and continue; c3. Let v′=v+1, v=v′; d3. Take the next pixel of the current column as the current depth pixel and return to step b3, until all pixels of the current column have been processed, the initial value of v being 1 and the initial value of v′ being 0;

③-7. Take the mean of the depth values of all pixels contained in each third pixel set of the current column as the depth value of every pixel in that set, obtaining the smoothed column of pixels;

③-8. Take the next column of the current depth video frame as the current column and return to step ③-4, until all columns of the current depth video frame have been processed;

③-9. Take the next depth video frame of the transformed depth video sequence D′(m,k,n) as the current depth video frame and return to step ③-3, until all depth video frames of D′(m,k,n) have been processed;

④ Apply the inverse transform to the smoothed depth video sequence D″(m,k,n) to obtain the preprocessed depth video sequence, denoted D′″(m,n,k), in which the resolution of the depth video frames is m×n and the number of depth video frames is k;

Here, the inverse transform of the smoothed depth video sequence D″(m,k,n) proceeds as follows:

④-1. Sequentially extract the luminance components Y of the m pixels of the j-th row of every depth video frame of the smoothed depth video sequence D″(m,k,n), where 1≤j≤k;

④-2. Take the extracted n rows of luminance components Y, in order, as the luminance component Y of the j-th depth video frame of the preprocessed depth video sequence D′″(m,n,k), then set the first chrominance component U and the second chrominance component V of every depth video frame of D′″(m,n,k) to 128, where 1≤j≤k.

2. The preprocessing method for a multi-view depth video sequence according to claim 1, characterized in that the first threshold is 10 and the second threshold is 5.
CN 201010144951 2010-04-07 2010-04-07 Method for pre-processing deep video sequence Expired - Fee Related CN101867810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010144951 CN101867810B (en) 2010-04-07 2010-04-07 Method for pre-processing deep video sequence

Publications (2)

Publication Number Publication Date
CN101867810A CN101867810A (en) 2010-10-20
CN101867810B true CN101867810B (en) 2011-12-14

Family

ID=42959336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010144951 Expired - Fee Related CN101867810B (en) 2010-04-07 2010-04-07 Method for pre-processing deep video sequence

Country Status (1)

Country Link
CN (1) CN101867810B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102438167B (en) * 2011-10-21 2014-03-12 宁波大学 Three-dimensional video encoding method based on depth image rendering
CN102769746B (en) * 2012-06-27 2014-12-24 宁波大学 Method for processing multi-viewpoint depth video
CN106254887B (en) * 2016-08-31 2019-04-09 天津大学 A Fast Method for Deep Video Coding

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AUPS283602A0 (en) * 2002-06-07 2002-06-27 Dynamic Digital Depth Research Pty Ltd Improved conversion and encoding techniques
US8351685B2 (en) * 2007-11-16 2013-01-08 Gwangju Institute Of Science And Technology Device and method for estimating depth map, and method for generating intermediate image and method for encoding multi-view video using the same
CN101287143B (en) * 2008-05-16 2010-09-15 清华大学 Method of converting planar video to stereoscopic video based on real-time man-machine dialogue
KR100938194B1 (en) * 2008-07-28 2010-01-21 재단법인대구경북과학기술원 Method for detecting an object and apparatus for the same


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111214

Termination date: 20180407