CN1750659A

CN1750659A - Interpolation image memory organization method, fractional pixel generation method, and prediction error index calculation method

Info

Publication number: CN1750659A
Application number: CN 200410076759
Authority: CN
Inventors: 罗忠; 王静; 宋彬; 常义林
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2004-09-14
Filing date: 2004-09-14
Publication date: 2006-03-22
Anticipated expiration: 2024-09-14
Also published as: CN100502511C

Abstract

The present invention relates to fractional-pixel-precision motion prediction in video compression coding. To solve problems such as inconvenient reading and low efficiency, the invention proposes a memory organization method that stores images efficiently according to the characteristics of their pixels. When performing 1/2ⁿ-pixel-precision motion prediction, the pixels of the 2ⁿ-times interpolated image to be generated are divided into an integer-position subset, a 1/2-position subset, a 1/4-position subset, ..., and a 1/2ⁿ-position subset. This classification finally yields 2²ⁿ sub-images, each the same size as the original image, and the sub-images are then stored in memory as one contiguous memory area. Based on this memory organization, the invention also provides a method for generating the interpolated image by filtering and interpolation, and a method for calculating the prediction error index (SAD).

Description

Interpolation Image Memory Organization Method, Fractional Pixel Generation Method, and Prediction Error Index Calculation Method

Technical Field

The present invention relates to fractional-pixel-precision motion prediction algorithms in video compression coding. More specifically, it relates to a method for organizing the memory of interpolated images used for fractional-pixel-precision motion prediction, a method for generating each fractional pixel of a 4x interpolated image based on that memory organization, and a method for rapidly calculating the prediction error index SAD based on these two methods.

Background Art

The video compression coding standards that currently dominate industrial applications, whether the international H.263, H.263+, H.264 and MPEG-4 or China's AVS (Advanced Audio-Video coding System), are all based on a common framework: partitioning the image into blocks + motion prediction + DCT of the residual image (also referred to as an integer transform or Hadamard transform) + quantization + entropy coding. In this framework, the correlation between successive frames of a moving image is exploited: a region in the previous frame is used to predict the corresponding, displaced region in the following frame, which yields a residual image that is then quantized and entropy coded. Exploiting the statistical correlation between frames in this way removes redundancy and achieves data compression. Motion prediction is therefore the core of video compression coding standards built on this common framework and the most important factor affecting overall compression efficiency.

The general process of motion prediction is as follows: for a given region in the current video frame (for example, a macroblock, MB), the reference frame (the previous frame, or one of the previous k frames when multiple reference frames are used) is searched for the matching region that is optimal under some error criterion. The change in geometric position of the predicted region relative to the reference region can be represented by a two-dimensional vector called the motion vector, or displacement vector. The process of searching for the optimal motion vector under a specific error criterion is called motion estimation and is one part of motion prediction. The efficiency of motion prediction depends on the residual image between the predicted region and its prediction: the smaller the residual, the higher the efficiency. Moreover, the efficiency of motion prediction is in fact determined by the accuracy of motion estimation, and the prediction accuracy in turn depends directly on the precision of the motion vector.
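As an illustration of the search described above, the following Python sketch performs exhaustive (full-search) integer-pixel motion estimation with SAD as the error criterion. The function names, the square block size `n`, and the rule of skipping candidate vectors whose reference block would leave the frame are assumptions made for this example, not details taken from the patent.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by the motion vector (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustive integer-pixel motion estimation: try every displacement in
    [-search_range, search_range]^2 that keeps the block inside `ref` and
    return (best_mv, best_sad)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block would leave the frame.
            if not (0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Full search is only one of the integer-pixel strategies mentioned below; a three-step or four-step search would evaluate the same `sad` cost at fewer candidate vectors.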

The images in digital video result from sampling analog video discretely in time and space and digitizing it: sampling in time yields discrete frames, and sampling in space yields the pixels within each frame. Within a frame, pixels are obtained by sampling the spatially continuous analog image at a fixed sampling interval, so the distance between two adjacent pixels equals the sampling interval. To represent motion vectors more accurately, the concept of fractional sample position motion prediction is introduced. With fractional sample positions, one imagines fractional-position pixels between two adjacent integer pixels, such as 1/2 pixels (1/2 sampling interval from an integer pixel), 1/4 pixels (1/4 sampling interval from an integer pixel) and 1/8 pixels (1/8 sampling interval from an integer pixel). Practice has shown that fractional sample position motion prediction improves video compression efficiency considerably; for example, with 1/4-pixel-precision motion prediction, the PSNR (Peak Signal-to-Noise Ratio) of typical compressed video can improve by 2 dB. At present, H.263 and H.263+ use 1/2-pixel-precision motion prediction, while H.264 and China's AVS use 1/4-pixel-precision motion prediction.

The general process of fractional sample position motion prediction is as follows: first, obtain the optimal integer-pixel motion vector with an integer-pixel motion estimation algorithm such as full search, the three-step search, the new three-step search or the four-step search; then perform 1/2-pixel motion estimation around this integer-pixel motion vector to find the optimal 1/2-pixel motion vector position; if 1/4-pixel motion estimation is required, perform it around the optimal 1/2-pixel position, taken as the center; likewise, once the optimal 1/4-pixel motion vector position has been obtained, 1/8-pixel motion estimation can be performed.

Take the implementation of 1/4-pixel-precision motion prediction specified in H.264/AVC as an example, as shown in Figure 1: the shaded circle is the current macroblock position, the unshaded circles are integer pixel positions, the triangles are 1/2-pixel positions, the small black dots are 1/4-pixel positions, and the arrows show the direction of the search path during the motion search. The motion vector is the vector of differences between the positions of the current macroblock and its reference macroblock in the x and y directions, such as [2, -3]. In the H.264/AVC standard, the 1/4-pixel-precision motion prediction search can be divided into three steps:

1) Use a motion estimation method to find the best-matching integer-pixel position;

2) Among the best integer-pixel position and the 8 surrounding 1/2-pixel positions, find the best-matching 1/2-pixel position;

3) Among the best 1/2-pixel position and the 8 surrounding 1/4-pixel positions, find the best-matching 1/4-pixel position.
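Steps 2) and 3) can be sketched as follows, assuming motion vectors are kept in quarter-pixel units so that the 1/2-pixel and 1/4-pixel stages are the same 8-neighbour search with steps 2 and 1. The `cost` callable is a hypothetical stand-in for whatever matching error (for example SAD against the interpolated reference) an encoder uses, and resolving ties in favour of the current centre is a choice made for this sketch.

```python
def refine(cost, int_mv):
    """Half- then quarter-pixel refinement around the best integer motion vector.

    Vectors are in quarter-pixel units: the integer vector (x, y) becomes
    (4x, 4y), the 1/2-pixel search uses step 2 and the 1/4-pixel search step 1.
    `cost` is a hypothetical callable returning the matching error of a vector.
    """
    best = (4 * int_mv[0], 4 * int_mv[1])
    for step in (2, 1):
        center = best
        # The current centre followed by its 8 neighbours at distance `step`.
        candidates = [center] + [(center[0] + sx * step, center[1] + sy * step)
                                 for sy in (-1, 0, 1) for sx in (-1, 0, 1)
                                 if (sx, sy) != (0, 0)]
        # min() keeps the first minimum, so the centre wins ties.
        best = min(candidates, key=cost)
    return best
```

For instance, with a best integer vector of (2, -1) and a cost whose minimum lies at the quarter-pixel vector (9, -5), that is (2.25, -1.25) pixels, the half-pixel stage stays at (8, -4) and the quarter-pixel stage moves to (9, -5).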

In this process, the integer-pixel matching search uses the locally decoded and reconstructed image of a previous frame as the reference image, whereas the 1/2-pixel and 1/4-pixel matching searches use the interpolated version of that locally decoded reconstructed image as the reference image. This interpolated reference image is 4 times the width and 4 times the height of the original image.

The structure of the 4x reference image is shown in Figure 2. Its pixels fall into the following categories:

1) Integer pixels: pixels whose row and column coordinates are both integer multiples of the sampling interval, such as the shaded circles A, B, C, D, E, F, G, H, I, J, K, L in Figure 2.

2) 1/2 pixels: pixels for which at least one of the row and column coordinates has the form (k+1/2)d or (k-1/2)d, but neither coordinate has the form (k+1/4)d or (k-1/4)d, where k is an integer and d is the sampling interval. As shown in Figure 2, the 1/2 pixels are further divided into two subclasses:

A. Full 1/2 pixels: both the row and column coordinates have the form (k+1/2)d or (k-1/2)d, such as pixels j, gg and hh.

B. Half 1/2 pixels: exactly one of the row and column coordinates has the form (k+1/2)d or (k-1/2)d, such as pixels b, h, m, s, aa, bb, cc, dd, ee and ff.

3) 1/4 pixels: pixels for which at least one of the row and column coordinates has the form (k+1/4)d or (k-1/4)d, where k is an integer and d is the sampling interval, such as the unshaded circles a, c, d, e, f, g, i, k in Figure 2.
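The classification above depends only on each coordinate's offset within a sampling interval. In a 4x interpolated image indexed in units of d/4, that offset is the coordinate modulo 4: 0 for integer positions, 2 for (k±1/2)d, and 1 or 3 for (k±1/4)d. The hypothetical helper below sketches the rule:

```python
def classify(x, y):
    """Classify a pixel of the 4x interpolated image by its coordinates.

    (x, y) are coordinates in the interpolated image, i.e. in units of d/4,
    so the fractional phase of each coordinate is its value modulo 4.
    """
    px, py = x % 4, y % 4
    if px == 0 and py == 0:
        return "integer"
    if px % 2 == 1 or py % 2 == 1:   # some coordinate has the form (k +/- 1/4)d
        return "quarter"
    if px == 2 and py == 2:          # both coordinates have the form (k +/- 1/2)d
        return "full-half"
    return "half-half"               # exactly one coordinate has the form (k +/- 1/2)d
```

Python's `%` returns a non-negative result for a positive modulus, so the rule also holds for negative coordinates (for example `classify(-2, 6)` is a full 1/2 pixel).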

The 4x reference image is generated by a multi-stage interpolation process consisting of the following steps:

1) The 1/2 pixels are generated from the integer pixels by interpolation. The interpolation filter is a 6-tap FIR (Finite Impulse Response) filter with weight vector w = [1, -5, 20, 20, -5, 1]^T. The process is as follows:

A. The half 1/2 pixels are generated from the integer pixels by interpolation. Taking pixels b and h as examples:

b₁ = (E - 5*F + 20*G + 20*H - 5*I + J), which produces the intermediate value b₁;

b = Clip((b₁ + 16) >> 5), i.e. offset, normalization and clipping.

Here, an offset means adding a number (the offset, which may be positive or negative); normalization means dividing a variable by a positive number so that, over the variable's range, the absolute value of the quotient never exceeds 1; clipping means forcing a variable that falls outside a given range back into that range. For example, if the range of the variable x is [0, 18] and x = 20, then x lies outside the range and is clipped to x = 18.

Because the filter weights sum to 32 (1 - 5 + 20 + 20 - 5 + 1 = 32), normalization is a division by 32, implemented as a right shift by 5 bits; adding 16 (half of 32) before the shift rounds the result. The clipping function Clip moves values outside the range [0, 255] into [0, 255]. In the same way, h is obtained:

h₁ = (A - 5*C + 20*G + 20*M - 5*R + T)

h = Clip((h₁ + 16) >> 5)

B. The full 1/2 pixels are generated from the half 1/2 pixels by interpolation, again using the 6-tap FIR filter above. Taking pixel j as an example:

j₁ = (bb - 5*gg + 20*h₁ + 20*m₁ - 5*kk + cc), which produces the intermediate value j₁;

j = Clip((j₁ + 512) >> 10)

Because j₁ has passed through the filter twice, it is scaled by 32 × 32 = 1024; it is therefore normalized by a 10-bit right shift, with 512 (half of 1024) as the rounding offset. After these two sub-steps, all 1/2 pixels have been generated.

2) The 1/4 pixels are generated by interpolation from the integer pixels and 1/2 pixels. Every 1/4 pixel lies between two integer or 1/2 pixels (horizontally, vertically or diagonally), so it can be obtained as the rounded arithmetic mean of the two adjacent integer or 1/2 pixels. The formulas are as follows:

a=(G+b+1)>>1

c=(H+b+1)>>1

d=(G+h+1)>>1

n=(M+h+1)>>1

The above are averages in the horizontal direction (a, c) and the vertical direction (d, n). For averaging along the diagonal direction, the calculation is as follows:

e=(b+h+1)>>1

g=(b+m+1)>>1
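The multi-stage interpolation can be sketched in a few scalar Python functions that follow the formulas above: the 6-tap filter, offset, shift and clip for 1/2 pixels, then rounded averaging for 1/4 pixels. The function names are hypothetical, inputs are assumed to be 8-bit samples passed in filter-tap order, and the scalar form deliberately ignores the SIMD vectorization that the rest of this document is concerned with.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # weight vector w of the 6-tap FIR filter (sums to 32)


def clip255(v):
    """Clip: force v into the 8-bit sample range [0, 255]."""
    return max(0, min(255, v))


def six_tap(p):
    """Unnormalised filter output, e.g. b1 = E - 5F + 20G + 20H - 5I + J."""
    return sum(w * s for w, s in zip(TAPS, p))


def half_half(p):
    """Half 1/2 pixel such as b or h: offset 16, normalise by >> 5, clip."""
    return clip255((six_tap(p) + 16) >> 5)


def full_half(intermediates):
    """Full 1/2 pixel such as j, filtered from six unnormalised intermediate
    values (like h1 and m1); two passes scale by 32 * 32 = 1024, hence the
    offset 512 and the shift by 10."""
    return clip255((six_tap(intermediates) + 512) >> 10)


def avg(p, q):
    """Quarter pixel as the rounded mean of its two neighbours, e.g. e = (b + h + 1) >> 1."""
    return (p + q + 1) >> 1
```

For a linear ramp of samples 10, 20, ..., 60, `half_half` returns 35, the value midway between the two centre taps; a sharp edge such as 0, 0, 255, 255, 0, 0 overshoots before clipping, which is why Clip is needed.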

In the prior art, the 4x interpolated image is stored in memory contiguously in natural order, that is, in the layout shown in Figure 2. However, natural-order storage is not the most suitable layout for a 4x interpolated image. When the 4x image is generated by interpolation, the integer pixels already exist; the 1/2 pixels are generated first and the 1/4 pixels last. If the 1/2 pixels are generated on a DSP (Digital Signal Processor) with SIMD (Single Instruction, Multiple Data) acceleration, the SIMD instructions need to read the integer pixels as whole blocks; likewise, generating the 1/4 pixels requires reading the 1/2 pixels and integer pixels as whole blocks. With the image memory organized in natural order, pixels of each kind cannot be read as a block, because the current storage method follows the natural order and does not classify pixels: the 4x interpolated image is simply stored sequentially, starting from its top-left pixel. Along the first row the sequence is integer, 1/4, 1/2, 1/4, integer, 1/4, 1/2, 1/4, ...; after the first row ends, the second row follows, and so on until the last row. Consequently, no memory region contains pixels of one class stored consecutively: no two integer pixels are adjacent, and no two 1/2 pixels are adjacent, let alone a whole block of pixels of the same kind. Reading all the integer pixels therefore requires strided access in memory (one value out of every four), which is very inefficient.
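To make this access pattern concrete, the small Python helper below (a hypothetical name, assuming one byte per sample and a row-major natural-order buffer) lists where the integer pixels of a 4x interpolated image sit in memory:

```python
def integer_pixel_offsets(width, height, factor=4):
    """Flat offsets of the integer pixels in a natural-order (row-major)
    factor-times interpolated image built from a width x height original."""
    row_len = width * factor          # samples per row of the interpolated image
    return [(y * factor) * row_len + x * factor
            for y in range(height) for x in range(width)]
```

For a 3×2 original the offsets are 0, 4, 8, 48, 52, 56: integer pixels are 4 samples apart within a row and 4 full interpolated rows apart vertically, so a contiguous 16-byte load starting at an integer pixel contains at most 4 of them.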

In addition, during motion estimation after the 4x interpolated image has been generated, integer-pixel-precision motion prediction only needs to compare the predicted macroblock with the integer-position subset of the 4x image and compute the SAD (Sum of Absolute Differences). Likewise, 1/2-pixel-precision motion prediction only compares against the 1/2-position subset of the 4x interpolated image, and 1/4-pixel-precision motion prediction only against the 1/4-position subset. Hence, with SIMD-style DSP acceleration, each SAD computation only needs to read, as one block, pixels of a single class sharing a common attribute. In the natural-order memory organization, however, these pixels are not stored consecutively, so they cannot conveniently be read as a block.

Summary of the Invention

To address these shortcomings of the prior art, the present invention solves the problem in existing video compression coding that pixels of each class cannot be read as whole blocks because the interpolated image is stored in memory in natural order. It provides a new image memory organization method that allows 1/2-pixel and 1/4-pixel-precision motion prediction to take full advantage of SIMD-style DSP acceleration and thereby improve computational efficiency.

To solve the above technical problem, the present invention provides an interpolated-image memory organization method for fractional-pixel-precision motion prediction. When performing 1/2ⁿ-pixel-precision motion prediction (where n is a natural number), the memory of the interpolated image is organized by the following steps:

(1) Divide the pixels of the 2ⁿ-times interpolated image to be generated into an integer-position subset, a 1/2¹-position subset, a 1/2²-position subset, ..., and a 1/2ⁿ-position subset, which contain, respectively, all integer pixels, all 1/2 pixels, all 1/4 pixels, ..., and all 1/2ⁿ pixels;

(2) Form, from all the integer pixels in the integer-position subset, one integer-pixel sub-image of the same size as the original image;

Classify each 1/2 pixel by its vertical, horizontal or diagonal position relative to the adjacent integer pixels, and thereby split all the 1/2 pixels in the 1/2-position subset into 3 smaller subsets, which form, one-to-one, 3 1/2-pixel sub-images of the same size as the original image;

Classify each 1/4 pixel by its vertical, horizontal or diagonal position and its distance relative to the adjacent integer pixels and 1/2 pixels, and thereby split all the 1/4 pixels in the 1/4-position subset into 12 smaller subsets, which form, one-to-one, 12 1/4-pixel sub-images of the same size as the original image;

And so on: classify each 1/2ⁿ pixel by its vertical, horizontal or diagonal position and its distance relative to the adjacent integer pixels, 1/2 pixels, 1/4 pixels, ..., and 1/2^(n-1) pixels, and thereby split all the 1/2ⁿ pixels in the 1/2ⁿ-position subset into (2²ⁿ - 2^(2(n-1))) smaller subsets, which form, one-to-one, (2²ⁿ - 2^(2(n-1))) 1/2ⁿ-pixel sub-images of the same size as the original image;

(3) Store the sub-images in memory as one contiguous memory area;

(4) Zero-initialize all the stored sub-images except the integer-pixel sub-image.
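Steps (1) and (2) amount to grouping pixels by the phase of their coordinates modulo 2ⁿ. The Python sketch below, with hypothetical helper names and images represented as lists of rows, splits a natural-order 2ⁿ-times interpolated image into 2²ⁿ sub-images and checks how many fall into each precision subset. Keying the sub-images by phase pairs (py, px) is an assumption of this sketch; for n = 2 the patent numbers them SP₀ to SP₁₅ instead.

```python
def split_into_subimages(interp, n):
    """Split a natural-order 2**n-times interpolated image (a list of rows)
    into 2**(2n) sub-images, each the size of the original image.

    Sub-image (py, px) gathers every pixel whose row and column coordinates
    are congruent to py and px modulo 2**n; (0, 0) is the integer-pixel one.
    """
    f = 1 << n
    return {(py, px): [row[px::f] for row in interp[py::f]]
            for py in range(f) for px in range(f)}


def precision_level(py, px, n):
    """Return k such that sub-image (py, px) belongs to the 1/2**k-position
    subset (k = 0 for the integer-position subset)."""
    for k in range(n + 1):
        step = 1 << (n - k)          # phases of 1/2**k positions are multiples of this
        if py % step == 0 and px % step == 0:
            return k
```

Counting levels reproduces the figures in the text: 1 integer sub-image, 3 at 1/2, 12 at 1/4, and in general 2²ⁿ - 2^(2(n-1)) at 1/2ⁿ. The strided slices `row[px::f]` are exactly the inefficient reads described for natural-order storage; doing them once when the sub-images are built leaves every later access to a sub-image contiguous.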

In step (3) of the interpolated-image memory organization method of the present invention, the sub-images can be stored in memory as one contiguous memory area using any one of the following three splicing methods:

A. 2²ⁿ×1 splicing, i.e. vertical-strip splicing;

B. 2ⁿ×2ⁿ splicing, i.e. square splicing;

C. 1×2²ⁿ splicing, i.e. horizontal-strip splicing.
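The three splicing layouts differ only in how many sub-images sit side by side in each row of the combined image: 1 for the vertical strip, 2ⁿ for the square, and 2²ⁿ for the horizontal strip. The sketch below assumes the sub-images are supplied as a list in the order they should be placed (the placement order within a strip is not specified here) and uses hypothetical function names; it flattens them into one contiguous row-major buffer and computes where any sub-image pixel lands.

```python
def columns_for(layout, f):
    """Number of sub-images per row of the spliced image; f = 2**n."""
    return {"vertical": 1, "square": f, "horizontal": f * f}[layout]


def splice(subs, layout, f):
    """Concatenate equally sized sub-images (lists of rows) into one flat,
    row-major buffer. Returns (buffer, row_length)."""
    h, w = len(subs[0]), len(subs[0][0])
    cols = columns_for(layout, f)
    buf = []
    for grid_row in range(len(subs) // cols):
        for y in range(h):                      # pixel row within this band of sub-images
            for grid_col in range(cols):        # sub-images placed side by side
                buf.extend(subs[grid_row * cols + grid_col][y])
    return buf, cols * w


def offset(k, y, x, layout, f, h, w):
    """Flat index of pixel (y, x) of the k-th sub-image in the spliced buffer."""
    cols = columns_for(layout, f)
    grid_row, grid_col = divmod(k, cols)
    return (grid_row * h + y) * (cols * w) + grid_col * w + x
```

In the vertical strip each sub-image occupies one unbroken run of memory; in the other two layouts each row of a sub-image is still contiguous, which is what a row-by-row SIMD load needs.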

For 1/4-pixel-precision motion prediction, the present invention also provides a method for generating each fractional pixel of the 4x interpolated image according to the above memory organization method, using the SIMD-style acceleration provided by a DSP. It comprises the following steps:

(1) Generate the 1/2-pixel sub-images SP₄ and SP₈ from the integer-pixel sub-image SP₀ by filtering and interpolation.

(2) Generate the 1/2-pixel sub-image SP₁₂ from the 1/2-pixel sub-image SP₈ (using its unnormalized intermediate values) by filtering and interpolation.

(3) Generate the 1/4-pixel sub-images SP₁, SP₅, SP₉ and SP₁₃ from the integer-pixel sub-image SP₀ and the 1/2-pixel sub-images SP₄, SP₈ and SP₁₂ by filtering and interpolation along the horizontal direction.

(4) Generate the 1/4-pixel sub-images SP₂, SP₆, SP₁₀ and SP₁₄ from the integer-pixel sub-image SP₀ and the 1/2-pixel sub-images SP₄, SP₈ and SP₁₂ by filtering and interpolation along the vertical direction.

(5) Generate the 1/4-pixel sub-images SP₃, SP₇, SP₁₁ and SP₁₅ from the integer-pixel sub-image SP₀ and the 1/2-pixel sub-images SP₄, SP₈ and SP₁₂ by filtering and interpolation along the +45° and -45° diagonal directions.
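As a rough illustration of steps (1) and (3) operating on whole sub-images rather than on interleaved pixels, the Python sketch below builds a horizontal 1/2-pixel sub-image from SP₀ and then a 1/4-pixel sub-image by averaging two sub-images. It assumes, from the P_4×4 arrangement given later in this description, that SP₄ holds the 1/2 pixel to the right of each integer pixel and that SP₁ holds the 1/4 pixel between them; replicating border samples is also an assumption. The function names are hypothetical, and plain Python loops stand in for SIMD instructions.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # 6-tap FIR weights


def clip255(v):
    """Force v into the 8-bit range [0, 255]."""
    return max(0, min(255, v))


def horizontal_half(sp0):
    """SP4-style sub-image: the 1/2 pixel to the right of every integer pixel,
    filtered along each row of SP0 with border samples replicated."""
    out = []
    for row in sp0:
        w = len(row)
        padded = [row[0]] * 2 + row + [row[-1]] * 3   # taps span x-2 .. x+3
        out.append([clip255((sum(t * padded[x + i] for i, t in enumerate(TAPS)) + 16) >> 5)
                    for x in range(w)])
    return out


def average_subimages(p, q):
    """1/4-pixel sub-image as the rounded mean of two sub-images,
    e.g. SP1 from SP0 and SP4 (horizontal neighbours)."""
    return [[(a + b + 1) >> 1 for a, b in zip(ra, rb)] for ra, rb in zip(p, q)]
```

Each output row is produced from one input row, so the same loop body maps naturally onto SIMD loads and stores of consecutive samples.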

For 1/4-pixel-precision motion prediction, the present invention further provides a method that, building on the above method for generating the fractional pixels of the 4x interpolated image, uses the SIMD-style acceleration provided by a DSP to calculate the prediction error index SAD quickly. During motion estimation, the SAD between the current macroblock MB₀ and the reference macroblock MB_r at a given position in the reference frame is calculated as follows:

(1) For integer-pixel-precision motion prediction, read the data of MB_r as one block from sub-image SP₀, according to the position of MB_r, and then calculate the SAD.

(2) For 1/2-pixel-precision motion prediction, read the data of MB_r as one block from one of the sub-images SP₄, SP₈ or SP₁₂, according to the position of MB_r, and then calculate the SAD.

(3) For 1/4-pixel-precision motion prediction, read the data of MB_r as one block from one of the sub-images SP₁, SP₂, SP₃, SP₅, SP₆, SP₇, SP₉, SP₁₀, SP₁₁, SP₁₃, SP₁₄ or SP₁₅, according to the position of MB_r, and then calculate the SAD.
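The lookup in steps (1) to (3) can be sketched as follows, with motion vectors in quarter-pixel units. The mapping from a vector's fractional phase to an SP index is read off the P_4×4 matrix given later in this description (matrix row = vertical phase, matrix column = horizontal phase); that reading, like the helper names, is an assumption of this sketch.

```python
def subimage_index(py, px):
    """SP index for vertical/horizontal quarter-pixel phases py, px in 0..3,
    following the arrangement of the P_4x4 matrix (assumed)."""
    return 8 * (py >> 1) + 4 * (px >> 1) + 2 * (py & 1) + (px & 1)


def sad_from_subimages(cur, subs, bx, by, mvx, mvy, n):
    """SAD between the n x n current block at (bx, by) and the reference block
    displaced by the quarter-pixel motion vector (mvx, mvy).

    The phase (mv & 3) selects a single sub-image and the integer part
    (mv >> 2, i.e. floor division by 4) locates the block inside it, so each
    row of the reference block is one contiguous slice."""
    sp = subs[subimage_index(mvy & 3, mvx & 3)]
    ox, oy = bx + (mvx >> 2), by + (mvy >> 2)
    return sum(abs(c - r)
               for y in range(n)
               for c, r in zip(cur[by + y][bx:bx + n], sp[oy + y][ox:ox + n]))
```

Under this mapping, integer vectors always read SP₀, the three 1/2-pixel phases read SP₄, SP₈ or SP₁₂, and the remaining twelve phases read the 1/4-pixel sub-images, matching the three cases above.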

The method of the present invention overcomes the inconvenient reading caused by storing the 4x interpolated image in memory in natural order, allowing 1/2-pixel and 1/4-pixel-precision motion prediction to take full advantage of SIMD-style DSP acceleration and improve efficiency. The pixels of the 4x interpolated image are divided into subsets according to their attributes, and each subset is further divided into several sub-images, so that each sub-image can be read as a whole block of data by SIMD-style instructions. With the method of the present invention, both the interpolation that generates the 4x interpolated image and the motion estimation performed during 1/2-pixel and 1/4-pixel-precision motion prediction can be effectively accelerated in international standards such as H.263/H.263+, H.264 and MPEG-4 and in the national standard AVS 1.0, especially when a SIMD-style DSP acceleration mechanism is used. Consequently, with all other conditions unchanged, the frame rate of video encoding and decoding can be increased. This improves the performance of video communication equipment such as video conferencing systems or videophones, or alternatively lowers product cost by achieving the same performance with a less powerful DSP.

Description of Drawings

The present invention is further described below with reference to the drawings and embodiments. In the drawings:

Figure 1 shows the search process of 1/4-pixel-precision motion prediction in the existing H.264/AVC standard;

Figure 2 shows the relative geometric positions of integer pixels, 1/2 pixels and 1/4 pixels in the 4x interpolated image;

Figure 3 shows how the pixels of the 4x interpolated image are classified by attribute in the present invention, and the number assigned to each class;

Figures 4a, 4b and 4c show, respectively, the original image, the 4x interpolated image under the conventional storage method, and the 4x interpolated image under the storage method of the present invention;

Figures 5a, 5b and 5c show the splicing methods for storing the 16 sub-images of the 4x interpolated image P_4×4 obtained by the present invention.

Detailed Description of the Embodiments

The present invention is described below taking 1/4-pixel-precision motion prediction as an example.

For 1/4-pixel-precision motion prediction, the key to the method of the present invention is a way of reorganizing and splicing the content of the 4x interpolated reference image (an interpolated image whose width and height are both 4 times those of the original image) so that it is stored contiguously in memory. With this method, when the 4x interpolated reference image is formed by interpolation and when 1/2-pixel and 1/4-pixel motion estimation is performed, the proximity in memory of pixel data that are accessed in succession can be exploited to improve computational efficiency significantly. The gain is especially pronounced when SIMD acceleration is used, whether provided by a DSP or by features such as MMX and SSE on Intel CPUs, because this spatial locality of the data suits SIMD very well. The method is aimed mainly at an efficient implementation of the 1/4-pixel-precision motion prediction required by H.264/AVC, but the same principle applies equally to 1/2-pixel-precision motion prediction in H.263/H.263+, 1/4-pixel-precision motion prediction in MPEG-4, and 1/4-pixel-precision motion prediction in the AVS standard.

其中，根据将要生成的4倍插值图像，将其中的像素分成三个子集，如下：Among them, according to the 4 times interpolation image to be generated, the pixels in it are divided into three subsets, as follows:

整数位置子集S_IP＝{全体整数像素}，在图3中该子集的像素用实心大园表示；Integer position subset _SIP ={all integer pixels}, in Fig. 3, the pixels of this subset are represented by solid large circles;

1/2位置子集S_HP＝{全体1/2像素}，在图3中该子集的像素用实心正方形表示；1/2 position subset S _HP ={all 1/2 pixels}, the pixels of this subset are represented by solid squares in Fig. 3;

1/4位置子集S_QP＝{全体1/4像素}，在图3中该子集的像素用空心正方形表示。The 1/4 position subset S _QP ={all 1/4 pixels}, and the pixels of this subset are represented by hollow squares in FIG. 3 .

从图3中的虚线框内可以看出，针对右上角的整数像素A(编号为0)，相应会有3个1/2像素(编号为4、8、12)，并有12个1/4像素(编号为1、2、3、5、6、7、9、10、11、13、14、15)。因此，本发明中以整数位置子集S_IP中的全体像素作构成一个整像素子图像SP₀，该子图像与原图像尺寸相同。It can be seen from the dotted line box in Figure 3 that for the integer pixel A (numbered 0) in the upper right corner, there will be three 1/2 pixels (numbered 4, 8, 12) and 12 1/2 pixels 4 pixels (numbered 1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15). Therefore, in the present invention, all the pixels in the integer position subset _SIP are used to form an integer-pixel sub-image SP ₀ , and the size of the sub-image is the same as that of the original image.

再按各个1/2像素与相邻整数像素之间的垂直、水平、以及对角位置关系进行分类，将1/2位置子集S_HP中的全体1/2像素进一步分成3个子集，每一个子集构成一个与原始图像尺寸相同的1/2像素子图像，共构成3个1/2像素子图像SP₄、SP₈、SP₁₂；Then classify according to the vertical, horizontal, and diagonal positional relationship between each 1/2 pixel and adjacent integer pixels, and further divide all 1/2 pixels in the 1/2 position subset S _HP into 3 subsets, each A subset constitutes a 1/2 pixel sub-image with the same size as the original image, and constitutes three 1/2 pixel sub-images SP ₄ , SP ₈ , SP ₁₂ in total;

再按各个1/4像素与相邻整数像素及1/2像素之间的垂直、水平、以及对角位置关系和距离关系(对于1/4像素分类要考虑距离)进行分类，将1/4位置子集S_HP中的全体1/4像素进一步分成12个子集，每一个子集构成一个与原始图像尺寸相同的1/4像素子图像，共构成12个1/4像素子图像SP₁、SP₂、SP₃、SP₅、SP₆、SP₇、SP₉、SP₁₀、SP₁₁、SP₁₃、SP₁₄、SP₁₅；Then classify according to the vertical, horizontal, and diagonal positional relationship and distance relationship between each 1/4 pixel and adjacent integer pixels and 1/2 pixels (for 1/4 pixel classification, distance should be considered), and 1/4 All 1/4 pixels in the position subset S _HP are further divided into 12 subsets, and each subset constitutes a 1/4 pixel sub-image with the same size as the original image, and a total of 12 1/4-pixel sub-images SP ₁ , SP ₂ , SP ₃ , SP ₅ , SP ₆ , SP ₇ , SP ₉ , SP ₁₀ , SP ₁₁ , SP ₁₃ , SP ₁₄ , SP ₁₅ ;

The 4-times interpolated image can therefore be written as:

$$P_{4\times 4}=\begin{bmatrix}
SP_{0} & SP_{1} & SP_{4} & SP_{5}\\
SP_{2} & SP_{3} & SP_{6} & SP_{7}\\
SP_{8} & SP_{9} & SP_{12} & SP_{13}\\
SP_{10} & SP_{11} & SP_{14} & SP_{15}
\end{bmatrix}$$
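The matrix can be read as a map from quarter-pixel offset to sub-image number. Here the row is the vertical offset `dy` and the column is the horizontal offset `dx`, both in quarter pixels. Under that reading, SP₄, SP₈ and SP₁₂ sit at the half-pixel offsets (0, 2), (2, 0) and (2, 2). This matches the three 1/2 pixels listed for Fig. 3. The reading is our interpretation of the matrix, and the helper names below are hypothetical.

```python
# Sub-image numbers laid out as in the matrix P_{4x4}: row = dy, column = dx.
SP_INDEX = [
    [0, 1, 4, 5],
    [2, 3, 6, 7],
    [8, 9, 12, 13],
    [10, 11, 14, 15],
]

def subimage_index(dy, dx):
    """Closed form of SP_INDEX.

    Bit 1 of each offset selects the 2x2 quadrant (half-pixel level).
    Bit 0 selects the quarter-pixel cell inside that quadrant.
    """
    return 4 * (2 * (dy >> 1) + (dx >> 1)) + 2 * (dy & 1) + (dx & 1)

def precision_level(k):
    """Return 0 for the integer sub-image, 1 for half-pixel, 2 for quarter-pixel."""
    if k == 0:
        return 0
    if k in (4, 8, 12):
        return 1
    return 2
```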

The per-pixel reorganization of P₄ₓ₄ above yields 16 sub-images, each the same size as the original image. These sub-images can be laid out in memory in many ways. The simplest is to store the 16 images separately, but then data reads jump around when the SAD values of neighbouring positions are computed.

In the 1/2-pixel search, for example, the SAD of the 1/2-pixel position directly above the best integer position needs data from sub-image SP₈. The SADs of the upper-left and upper-right 1/2-pixel positions need data from sub-image SP₁₂. If the 16 images are stored independently, SP₈ and SP₁₂ lie far apart in memory, and alternating between the two locations slows memory access.

A better approach is spliced storage: the 16 images are stitched into one large image that is stored in memory as a whole. The present invention proposes three splicing methods for this purpose.

Memory splicing mainly improves read/write efficiency on DSPs that require memory accesses to be aligned to byte boundaries that are a power of two. If a data block being read or written is not aligned to such a boundary (for example 32 or 64 bytes), extra data must be read or written to reach alignment, usually zero padding. This reduces efficiency.
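Here is a minimal sketch of the padding arithmetic. It assumes a power-of-two alignment such as 32 or 64 bytes and one byte per pixel. `align_up` and `padding_bytes` are hypothetical helper names. A 176-pixel (QCIF-width) row is 11 × 16 bytes. It needs 16 padding bytes to reach a 64-byte multiple (192) and none at 16-byte alignment.

```python
def align_up(nbytes, alignment):
    """Round nbytes up to the next multiple of alignment (a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0, "alignment must be a power of two"
    return (nbytes + alignment - 1) & ~(alignment - 1)

def padding_bytes(nbytes, alignment):
    """Extra bytes that must be read or written to keep a block aligned."""
    return align_up(nbytes, alignment) - nbytes
```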

For 1/2-pixel and 1/4-pixel precision motion prediction, the present invention provides three splicing methods that satisfy boundary alignment. Figs. 5a, 5b and 5c show the three memory splicing methods for 1/4-pixel precision motion prediction. Each block in these figures represents one sub-image. The methods are:

A. 16×1 splicing: the 16 sub-images are stitched into a vertical strip;

B. 4×4 splicing: the 16 sub-images are stitched into a square;

C. 1×16 splicing: the 16 sub-images are stitched into a horizontal strip.

For 1/2-pixel precision motion prediction, there are likewise three splicing strategies:

A. 4×1 splicing (vertical strip);

B. 2×2 splicing (square);

C. 1×4 splicing (horizontal strip).

For 1/8-pixel precision, and for larger n in general, these three splicing strategies still apply, although further strategies may exist. For general n, the three strategies are:

A. 2²ⁿ×1 splicing (vertical strip);

B. 2ⁿ×2ⁿ splicing (square);

C. 1×2²ⁿ splicing (horizontal strip).

Finally, all stored sub-images except the integer-pixel sub-image are zero-initialized.
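Below is a sketch of splicing and zero initialization in one flat buffer. It assumes 8-bit samples and one byte per pixel. Sub-image k is assumed to occupy the k-th tile, in row-major order, of a grid of sub-images shaped 2²ⁿ×1 (vertical strip), 2ⁿ×2ⁿ (square) or 1×2²ⁿ (horizontal strip). The tile order and all function names are illustrative assumptions; the text fixes only the three grid shapes. A fresh `bytearray` is already zero-filled, so copying the original image into tile 0 leaves every fractional sub-image zero-initialized.

```python
def splice_layout(n, strategy):
    """Grid of sub-images (rows, cols) for 1/2**n-pixel prediction."""
    count = 4 ** n  # 2**(2n) sub-images in total
    side = 2 ** n
    return {"vertical": (count, 1), "square": (side, side), "horizontal": (1, count)}[strategy]

def pixel_offset(k, y, x, width, height, grid):
    """Byte offset of pixel (y, x) of sub-image k in the spliced image."""
    rows_t, cols_t = grid
    tile_row, tile_col = divmod(k, cols_t)  # assumed row-major tile order
    stride = cols_t * width  # bytes in one row of the whole spliced image
    return (tile_row * height + y) * stride + tile_col * width + x

def build_spliced(original, n, strategy):
    """Allocate the spliced image, copy the original into sub-image 0,
    and leave every fractional sub-image zero-initialized."""
    height, width = len(original), len(original[0])
    grid = splice_layout(n, strategy)
    buf = bytearray(grid[0] * height * grid[1] * width)  # starts all zero
    for y in range(height):
        for x in range(width):
            buf[pixel_offset(0, y, x, width, height, grid)] = original[y][x]
    return buf, grid
```

In the vertical strip, consecutive sub-images are exactly `width * height` bytes apart. In the square and horizontal layouts, a row of one sub-image sits next to the same row of its neighbours.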

The regular structure of P₄ₓ₄ lets SIMD-type DSP acceleration instructions generate the 4-times interpolated image from the original image. The convolutions and shifts of the interpolation filters can be applied block-wise, one whole sub-image at a time. The numbering in Fig. 3 was chosen for consistency with H.264 and other existing standards.
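The sketch below illustrates whole-sub-image filtering by deriving SP₄, SP₈ and SP₁₂ from SP₀. It assumes the H.264 luma half-sample filter (1, −5, 20, 20, −5, 1), rounding `(v + 16) >> 5` and clipping to 0..255. Border handling is simplified by clamping coordinates. Following the text, SP₁₂ is filtered vertically from the rounded SP₄ values. H.264 itself filters unrounded intermediate values, so this sketch is not bit-exact with the H.264 reference. Each output row depends only on whole input rows or columns, which is what a SIMD implementation would vectorize; plain Python lists stand in for vector registers here.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # assumed: H.264 luma half-sample filter

def clip255(v):
    return 0 if v < 0 else 255 if v > 255 else v

def half_pel(samples, i):
    """Half-pixel value between samples[i] and samples[i + 1] (border clamped)."""
    last = len(samples) - 1
    acc = 0
    for t, c in enumerate(TAPS):
        j = min(max(i + t - 2, 0), last)  # taps cover i-2 .. i+3
        acc += c * samples[j]
    return clip255((acc + 16) >> 5)

def filter_rows(img):
    """Horizontal half-pixel sub-image: each output row uses one input row."""
    return [[half_pel(row, x) for x in range(len(row))] for row in img]

def filter_cols(img):
    """Vertical half-pixel sub-image, computed column by column."""
    h, w = len(img), len(img[0])
    cols = [[img[y][x] for y in range(h)] for x in range(w)]
    out_cols = [[half_pel(col, y) for y in range(h)] for col in cols]
    return [[out_cols[x][y] for x in range(w)] for y in range(h)]

def make_half_pel_subimages(sp0):
    sp4 = filter_rows(sp0)   # SP4: offset (0, 2) from SP0
    sp8 = filter_cols(sp0)   # SP8: offset (2, 0) from SP0
    sp12 = filter_cols(sp4)  # SP12: offset (2, 2), filtered from SP4
    return sp4, sp8, sp12
```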

In motion estimation, the SAD computation can be accelerated with SIMD instructions. A sub-matrix the same size as the macroblock being predicted is taken from a single sub-image.

Based on the above interpolation image memory organization method, and using the SIMD-type acceleration provided by the DSP, the fractional pixels of the 4-times interpolated image are generated in the following steps:

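(1) Using the integer-pixel sub-image SP₀, generate the 1/2-pixel sub-images SP₄ and SP₈ by filtering and interpolation.

(2) Using the 1/2-pixel sub-image SP₄, generate the 1/2-pixel sub-image SP₁₂ by filtering and interpolation.

(3) Using the integer-pixel sub-image SP₀ and the 1/2-pixel sub-images SP₄, SP₈ and SP₁₂, generate the 1/4-pixel sub-images SP₁, SP₅, SP₉ and SP₁₃ by horizontal filtering and interpolation.

(4) Using the integer-pixel sub-image SP₀ and the 1/2-pixel sub-images SP₄, SP₈ and SP₁₂, generate the 1/4-pixel sub-images SP₂, SP₆, SP₁₀ and SP₁₄ by vertical filtering and interpolation.

(5) Using the integer-pixel sub-image SP₀ and the 1/2-pixel sub-images SP₄, SP₈ and SP₁₂, generate the 1/4-pixel sub-images SP₃, SP₇, SP₁₁ and SP₁₅ by filtering and interpolation along the +45° and −45° diagonals.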
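Below is a sketch of the quarter-pixel steps (claim 7, steps (3) to (5)). It assumes H.264-style rounded averages `(a + b + 1) >> 1` of the two nearest samples. The diagonal positions average two half-pixel samples along the ±45° directions, as H.264 does, so SP₀ enters only the horizontal and vertical steps. Neighbours one column to the right or one row down are clamped at the border. The averaging rule, the border rule and the function names are assumptions rather than the patent's stated filters.

```python
def avg(a, b):
    return (a + b + 1) >> 1

def make_quarter_pel_subimages(sp):
    """sp maps 0, 4, 8, 12 to H x W sub-images; returns the 12 quarter ones."""
    s0, s4, s8, s12 = sp[0], sp[4], sp[8], sp[12]
    h, w = len(s0), len(s0[0])
    out = {k: [[0] * w for _ in range(h)] for k in (1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15)}
    for y in range(h):
        yd = min(y + 1, h - 1)  # next integer row (clamped)
        for x in range(w):
            xr = min(x + 1, w - 1)  # next integer column (clamped)
            # Step (3), horizontal: offsets (0,1), (0,3), (2,1), (2,3)
            out[1][y][x] = avg(s0[y][x], s4[y][x])
            out[5][y][x] = avg(s4[y][x], s0[y][xr])
            out[9][y][x] = avg(s8[y][x], s12[y][x])
            out[13][y][x] = avg(s12[y][x], s8[y][xr])
            # Step (4), vertical: offsets (1,0), (3,0), (1,2), (3,2)
            out[2][y][x] = avg(s0[y][x], s8[y][x])
            out[10][y][x] = avg(s8[y][x], s0[yd][x])
            out[6][y][x] = avg(s4[y][x], s12[y][x])
            out[14][y][x] = avg(s12[y][x], s4[yd][x])
            # Step (5), diagonal: offsets (1,1), (1,3), (3,1), (3,3)
            out[3][y][x] = avg(s4[y][x], s8[y][x])
            out[7][y][x] = avg(s4[y][x], s8[y][xr])
            out[11][y][x] = avg(s8[y][x], s4[yd][x])
            out[15][y][x] = avg(s8[y][xr], s4[yd][x])
    return out
```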

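This step builds on the memory organization method and the fractional-pixel generation method above, using the SIMD-type acceleration provided by the DSP. It computes the prediction error index SAD between the current macroblock MB₀ and a reference macroblock MB_r at some position in the reference frame during motion estimation, as follows:

(1) For integer-pixel precision motion prediction, read the data of MB_r as a whole from sub-image SP₀ according to the position of MB_r, then compute the SAD.

(2) For 1/2-pixel precision motion prediction, read the data of MB_r as a whole from one of the sub-images SP₄, SP₈ and SP₁₂ according to the position of MB_r, then compute the SAD.

(3) For 1/4-pixel precision motion prediction, read the data of MB_r as a whole from one of the sub-images SP₁, SP₂, SP₃, SP₅, SP₆, SP₇, SP₉, SP₁₀, SP₁₁, SP₁₃, SP₁₄ and SP₁₅ according to the position of MB_r, then compute the SAD.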
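Below is a sketch of the SAD step. It assumes motion vectors in quarter-pixel units and a macroblock at integer position (bx, by) of the current frame. It also assumes the reference block stays inside the sub-images, since no border padding is modelled. The fractional part `(mvy & 3, mvx & 3)` selects the sub-image through the numbering of the matrix P₄ₓ₄, read with the row as vertical offset and the column as horizontal offset. The integer part gives the block's top-left corner, so the whole block comes from one sub-image. `reference_block` and `sad_block` are hypothetical names.

```python
def subimage_index(dy, dx):
    # Same numbering as the matrix P_{4x4}: row = dy, column = dx (quarter pixels).
    return 4 * (2 * (dy >> 1) + (dx >> 1)) + 2 * (dy & 1) + (dx & 1)

def reference_block(subimages, bx, by, mvx, mvy, size=16):
    """Read the whole reference block MB_r from a single sub-image."""
    k = subimage_index(mvy & 3, mvx & 3)       # fractional part picks the sub-image
    x0, y0 = bx + (mvx >> 2), by + (mvy >> 2)  # integer part (floor) picks the corner
    sp = subimages[k]
    return [row[x0:x0 + size] for row in sp[y0:y0 + size]]

def sad_block(cur, ref):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for rc, rr in zip(cur, ref) for a, b in zip(rc, rr))
```

Python's `>>` and `&` on negative integers floor and wrap, respectively. A vector of −1 therefore maps to integer offset −1 plus fractional offset 3, the correct quarter-pixel position.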

The present invention effectively accelerates the interpolation that produces the 4-times interpolated image and the motion estimation computations in 1/2-pixel and 1/4-pixel precision motion prediction. This holds for the international standards H.263/H.263+, H.264 and MPEG-4 and for the Chinese national standard AVS 1.0. The gain is largest when SIMD-type DSP acceleration is available.

With all other conditions unchanged, this raises the frame rate of video encoding and decoding. It can improve the performance of video communication equipment such as videoconferencing systems and videophones. Alternatively, the same performance can be reached on a DSP with less processing power, reducing product cost. Both options make the product more competitive. The following experimental data illustrate the effect of the invention:

Experiment 1: the method of the present invention, combined with MMX and SSE2, was used to optimize and accelerate 1/4-pixel interpolation on the standard test sequences Claire, News and Foreman. The results are shown in Table I:

Table I

| Image sequence (6000 frames) | No optimization | MMX optimization | SSE2 optimization |
| --- | --- | --- | --- |
| Claire | 33.88 s | 7.81 s | 7.55 s |
| News | 33.51 s | 7.64 s | 7.38 s |
| Foreman | 33.42 s | 7.43 s | 7.27 s |
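A quick check of the speedups implied by Table I, computed as unoptimized time divided by optimized time. The snippet uses only the published figures.

```python
# Times in seconds for 6000 frames, copied from Table I.
TABLE_I = {
    "Claire": (33.88, 7.81, 7.55),
    "News": (33.51, 7.64, 7.38),
    "Foreman": (33.42, 7.43, 7.27),
}

def speedups(times):
    """Return (MMX speedup, SSE2 speedup) rounded to two decimals."""
    base, mmx, sse2 = times
    return round(base / mmx, 2), round(base / sse2, 2)

for name, times in TABLE_I.items():
    mmx_x, sse2_x = speedups(times)
    print(f"{name}: MMX {mmx_x}x, SSE2 {sse2_x}x")
```

By this arithmetic, MMX gives roughly a 4.3× to 4.5× speedup on these sequences and SSE2 roughly 4.5× to 4.6×.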

Experiment 2: the method of the present invention was used to optimize the complete 1/4-pixel precision motion prediction process for the standard test sequences Claire, News and Foreman. The results are shown in Table II, where the values are the number of frames encoded per second:

Table II

The method of the present invention applies directly to video encoders and decoders conforming to the international standards H.263/H.263+, H.264 and MPEG-4 and the Chinese national standard AVS. In these codecs it implements 1/2-pixel and 1/4-pixel precision motion prediction. It is also suitable for other video encoders and decoders that use 1/2-pixel and 1/4-pixel precision motion prediction.

The method also applies to 1/8-pixel precision motion prediction. Any 1/8-pixel precision motion prediction must first perform 1/2-pixel and 1/4-pixel precision motion prediction, which are therefore a necessary component and precondition of it. This covers 1/8-pixel precision motion prediction in any other video encoder or decoder, whether or not it follows a particular standard.

Appendix: abbreviations and key terms used in this patent

| Abbreviation | Full term |
| --- | --- |
| AVC | Advanced Video Coding |
| AVS | Audio Video coding Standard (Chinese national standard) |
| dB | decibel |
| DSP | Digital Signal Processor |
| DV | Displacement Vector |
| MB | Macroblock |
| MV | Motion Vector |
| MPEG | Moving Picture Experts Group (ISO/IEC standards working group) |
| PSNR | Peak Signal-to-Noise Ratio |
| SIMD | Single Instruction, Multiple Data |
| MMX | MultiMedia Extension (instruction set) |
| SAD | Sum of Absolute Differences |
| SSE | Streaming SIMD Extensions (instruction set) |

Claims

1. An interpolation image memory organization method for fractional-pixel precision motion prediction, characterized in that, when 1/2ⁿ-pixel precision motion prediction is performed, where n is a natural number, the memory of the interpolated image is organized by the following steps:

(1) According to the 2ⁿ-times interpolated image to be generated, divide its pixels into an integer-position subset, a 1/2¹-position subset, a 1/2²-position subset, ..., and a 1/2ⁿ-position subset. These contain, respectively, all integer pixels, all 1/2 pixels, all 1/4 pixels, ..., and all 1/2ⁿ pixels;

(2) form an integer-pixel sub-image of the same size as the original image from all integer pixels in the integer-position subset;

according to the vertical, horizontal and diagonal positional relationship between each 1/2 pixel and the adjacent integer pixels, further divide all 1/2 pixels in the 1/2-position subset into 3 smaller subsets. These form, one to one, three 1/2-pixel sub-images of the same size as the original image;

according to the vertical, horizontal and diagonal positional relationship and the distance relationship between each 1/4 pixel and the adjacent integer pixels and 1/2 pixels, further divide all 1/4 pixels in the 1/4-position subset into 12 smaller subsets. These form, one to one, twelve 1/4-pixel sub-images of the same size as the original image;

and so on: according to the vertical, horizontal and diagonal positional relationship and the distance relationship between each 1/2ⁿ pixel and the adjacent integer pixels, 1/2 pixels, 1/4 pixels, ..., 1/2ⁿ⁻¹ pixels, further divide all 1/2ⁿ pixels in the 1/2ⁿ-position subset into $2^{2n}-2^{2(n-1)}$ smaller subsets. These form, one to one, $2^{2n}-2^{2(n-1)}$ 1/2ⁿ-pixel sub-images of the same size as the original image;

(3) arrange the sub-images into a contiguous memory region and store it in memory;

(4) zero-initialize all stored sub-images except the integer-pixel sub-image.

2. The interpolation image memory organization method according to claim 1, wherein n equals 1, and the interpolated-image memory is organized by the following steps when 1/2-pixel precision motion prediction is performed:

(1) According to the 2-times interpolated image to be generated, divide its pixels into an integer-position subset containing all integer pixels and a 1/2-position subset containing all 1/2 pixels;

(4) zero-initialize the three stored 1/2-pixel sub-images, leaving the integer-pixel sub-image unchanged.

3. The interpolation image memory organization method according to claim 1, wherein n equals 2, and the interpolated-image memory is organized by the following steps when 1/4-pixel precision motion prediction is performed:

(1) According to the 4-times interpolated image to be generated, divide its pixels into an integer-position subset (S_IP), a 1/2-position subset (S_HP) and a 1/4-position subset (S_QP). The integer-position subset (S_IP) contains all integer pixels, the 1/2-position subset (S_HP) contains all 1/2 pixels, and the 1/4-position subset (S_QP) contains all 1/4 pixels;

(2) form an integer-pixel sub-image (SP₀) of the same size as the original image from all integer pixels in the integer-position subset (S_IP);

according to the vertical, horizontal and diagonal positional relationship between each 1/2 pixel and the adjacent integer pixels, further divide all 1/2 pixels in the 1/2-position subset (S_HP) into 3 smaller subsets. These form, one to one, three sub-images (SP₄, SP₈, SP₁₂) of the same size as the original image;

according to the vertical, horizontal and diagonal positional relationship and the distance relationship between each 1/4 pixel and the adjacent integer pixels and 1/2 pixels, further divide all 1/4 pixels in the 1/4-position subset (S_QP) into 12 smaller subsets. These form, one to one, 12 sub-images (SP₁, SP₂, SP₃, SP₅, SP₆, SP₇, SP₉, SP₁₀, SP₁₁, SP₁₃, SP₁₄, SP₁₅) of the same size as the original image;

(4) zero-initialize the three stored 1/2-pixel sub-images and the twelve stored 1/4-pixel sub-images, leaving the integer-pixel sub-image unchanged.

4. The interpolation image memory organization method according to claim 1, wherein n equals 3, and the interpolated-image memory is organized by the following steps when 1/8-pixel precision motion prediction is performed:

(1) According to the 8-times interpolated image to be generated, divide its pixels into an integer-position subset, a 1/2-position subset, a 1/4-position subset and a 1/8-position subset. These contain, respectively, all integer pixels, all 1/2 pixels, all 1/4 pixels and all 1/8 pixels;

(2) form a sub-image of the same size as the original image from all integer pixels in the integer-position subset;

according to the vertical, horizontal and diagonal positional relationship and the distance relationship between each 1/8 pixel and the adjacent integer pixels, 1/2 pixels and 1/4 pixels, further divide all 1/8 pixels in the 1/8-position subset into 48 subsets. These form, one to one, 48 1/8-pixel sub-images of the same size as the original image;

(4) zero-initialize the three stored 1/2-pixel sub-images, the twelve 1/4-pixel sub-images and the 48 1/8-pixel sub-images, leaving the integer-pixel sub-image unchanged.

5. The interpolation image memory organization method according to any one of claims 1 to 4, characterized in that, in step (3), the sub-images are arranged into a contiguous memory region and stored in memory by any one of the following three splicing methods:

A. 2²ⁿ×1 splicing, that is, vertical-strip splicing;

B. 2ⁿ×2ⁿ splicing, that is, square splicing;

C. 1×2²ⁿ splicing, that is, horizontal-strip splicing.

6. The interpolation image memory organization method according to claim 5, characterized in that n = 2 and, in step (3), the sub-images are arranged into a contiguous memory region and stored in memory by any one of the following splicing methods:

A. 16×1 splicing, that is, vertical-strip splicing;

B. 4×4 splicing, that is, square splicing;

C. 1×16 splicing, that is, horizontal strip splicing.

7. A method for generating the fractional pixels of a 4-times interpolated image with the SIMD-type acceleration provided by a DSP, based on the interpolation image memory organization method according to claim 6, characterized by comprising the following steps:

(1) Using the integer-pixel sub-image SP₀, generate the 1/2-pixel sub-images SP₄ and SP₈ by filtering and interpolation.

(2) Using the 1/2-pixel sub-image SP₄, generate the 1/2-pixel sub-image SP₁₂ by filtering and interpolation.

(3) Using the integer-pixel sub-image SP₀ and the 1/2-pixel sub-images SP₄, SP₈ and SP₁₂, generate the 1/4-pixel sub-images SP₁, SP₅, SP₉ and SP₁₃ by horizontal filtering and interpolation.

(4) Using the integer-pixel sub-image SP₀ and the 1/2-pixel sub-images SP₄, SP₈ and SP₁₂, generate the 1/4-pixel sub-images SP₂, SP₆, SP₁₀ and SP₁₄ by vertical filtering and interpolation.

(5) Using the integer-pixel sub-image SP₀ and the 1/2-pixel sub-images SP₄, SP₈ and SP₁₂, generate the 1/4-pixel sub-images SP₃, SP₇, SP₁₁ and SP₁₅ by filtering and interpolation along the +45° and −45° diagonals.

8. A method for quickly calculating the prediction error index SAD with the SIMD-type acceleration provided by a DSP, based on the method for generating the fractional pixels of a 4-times interpolated image according to claim 7. The method is characterized in that the prediction error index SAD between the current macroblock MB₀ and a reference macroblock MB_r at some position in the reference frame during motion estimation is calculated by the following steps:

(1) For integer-pixel precision motion prediction, read the data of MB_r as a whole from sub-image SP₀ according to the position of MB_r, then calculate the SAD.

(2) For 1/2-pixel precision motion prediction, read the data of MB_r as a whole from one of the sub-images SP₄, SP₈ and SP₁₂ according to the position of MB_r, then calculate the SAD.

(3) For 1/4-pixel precision motion prediction, read the data of MB_r as a whole from one of the sub-images SP₁, SP₂, SP₃, SP₅, SP₆, SP₇, SP₉, SP₁₀, SP₁₁, SP₁₃, SP₁₄ and SP₁₅ according to the position of MB_r, then calculate the SAD.