CN101815219A - Mobile evaluation method for real-time embedded multimedia design - Google Patents
- Publication number
- CN101815219A (application CN 200910004964; granted as CN101815219B)
- Authority
- CN
- China
- Prior art keywords
- block
- present
- motion
- sad
- search window
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
Technical Field
The present invention relates to a method for performing motion estimation, and in particular to a motion estimation method that reduces memory capacity and bandwidth requirements.
Background Art
As multimedia applications grow in popularity, video compression becomes increasingly important. Many video compression standards have been proposed; the current mainstream specifications are MPEG-4 and H.264/AVC. The basic principle of these standards is to remove redundancy from image data in order to reduce storage space or transmission volume. Motion estimation is a key part of video coding: it exploits the similarity between consecutive frames to remove temporal redundancy and thereby compress the data.
Figure 1 is a schematic diagram of the block-matching algorithm commonly used in motion estimation. First, the current frame 100 of size W×H is divided into blocks of size N×N. Next, a search window 112 of size (N+SR_H-1)×(N+SR_V-1) is set in a reference frame 110 (for example, the previous or the next frame), and the block 114 most similar to a current block 104 of current frame 100 is found within search window 112. The difference between blocks 104 and 114 and the motion vector 120 are then computed, and redundant data is removed by transmitting only the difference and motion vector 120. This step is motion estimation. In other words, the goal of motion estimation is to find, for every block of the current frame, a motion vector and an error that together represent the current frame. However, because motion estimation must compare many candidate blocks, this heavy computation greatly increases memory bandwidth.
Figure 2 shows the hardware architecture of a video encoding system 200, in which the reference frame and the current frame are stored in external memory 220, and the data required for motion estimation is loaded over external bus 230 into internal memory 212 for use by a computing engine 214 (such as an embedded processor). When motion estimation is performed, the candidate-block data required from the search window of the reference frame must therefore be transferred over external bus 230 between external memory 220 and internal memory 212 for the comparison operations, which greatly increases memory bandwidth. In general, the size of search window 112 is determined by criteria such as frame resolution and/or the compression specification. The larger search window 112 is, the more data must be loaded into internal memory and the greater the required memory bandwidth. A motion estimation method that resolves this excessive memory bandwidth demand is therefore needed.
Summary of the Invention
In view of the problems of the prior art, the present invention provides, for the square search algorithm, a low-power, high-performance video coding method applicable to MPEG-4 and H.264/AVC. It substantially reduces memory capacity and bandwidth, thereby lowering hardware cost, reducing power consumption, and increasing execution speed.
According to one aspect of the invention, a method for performing motion estimation is provided, comprising the following steps: selecting a current block in a current frame; obtaining the motion vectors and residual data of a plurality of neighboring blocks located around the current block; setting a predetermined threshold according to the residual data of the neighboring blocks; comparing the current block with an initial reference block in a reference frame to obtain an initial comparison result, and comparing the predetermined threshold with the initial comparison result; if the initial comparison result is greater than the predetermined threshold, determining a predicted motion vector for the current block according to the motion vectors of the neighboring blocks; and performing block matching within the search window range corresponding to the predicted motion vector to find the reference block that matches the current block.
According to another aspect of the invention, a computer-readable medium storing program instructions is provided; when the program instructions are executed on a computing device, they cause the computing device to perform the above method.
Other aspects of the invention are set forth in part in the description that follows, and in part are readily apparent from the description or may be learned from embodiments of the invention. The aspects of the invention can be understood and achieved by means of the elements and combinations particularly pointed out in the appended claims. It should be understood that both the foregoing summary and the following detailed description are exemplary only and do not limit the invention.
Brief Description of the Drawings
The drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain its principles. The embodiments described here are preferred embodiments; it must be understood, however, that the invention is not limited to the arrangements and elements shown, in which:
Figure 1 is a schematic diagram of motion estimation using the block-matching algorithm;
Figure 2 shows the hardware architecture of a video encoding system;
Figure 3 is a schematic diagram of performing motion estimation with the square search algorithm according to an embodiment of the invention;
Figure 4 is a schematic diagram describing motion estimation using square search;
Figure 5 shows an example of raster scanning;
Figure 6 shows the four data reuse schemes, Level A to Level D, for the reference frame; and
Figure 7 is a flowchart of a method for performing motion estimation according to an embodiment of the invention.
[Description of main reference numerals]
100 Current frame
104 Current block
110 Reference frame
112 Search window
114 Block
212 Internal memory
214 Computing engine
220 External memory
230 Bus
300 Current frame
302 Current block
310 Reference frame
312, 314 Blocks
320 Search window
402, 404, 412, 414, 416 Blocks
422, 424, 426, 428, 430 Blocks
405 Square pattern
500 Frame
510, 511, 512, 513, 514 Blocks
610, 620 Search windows
612, 614 Blocks
622, 624 Block rows
630, 640 Reference frames
632, 634 Search windows
642, 644 Search window rows
Detailed Description
Targeting the square search algorithm and combined with a data reuse scheme, the present invention proposes a motion estimation method that effectively lowers memory bandwidth and reduces the internal (on-chip) memory requirement. Based on the spatial dependency between a block and its neighboring blocks, it dynamically adjusts the size of the search window, replacing known motion estimation methods that must load the entire search window. For a more detailed and complete account of the invention, refer to the following description together with Figures 3 to 7. The devices, elements, and method steps described in the following embodiments serve only to illustrate the invention and do not limit its scope.
Figure 3 is a schematic diagram of performing motion estimation with the square search algorithm according to an embodiment of the invention. For a current block 302 of size N×N in current frame 300, a search window 320 is framed around the position in reference frame 310 that corresponds to current block 302, and the block most similar to current block 302 is sought within search window 320. The matching method used in this embodiment computes the SAD (sum of absolute differences) value between current block 302 and each candidate block in search window 320, as follows:

$$\mathrm{SAD} = \sum_{i=1}^{N} \sum_{j=1}^{N} \left| C_{ij} - R_{ij} \right|$$

where $C_{ij}$ denotes the current block and $R_{ij}$ a candidate block. In other words, the intensity of each pixel of the candidate block is subtracted from the intensity of the corresponding pixel of the current block, and the absolute values of the resulting N×N differences are summed to give the SAD value. The smaller the SAD value, the more similar the two blocks. Although this embodiment uses the SAD value to judge similarity to current block 302, the invention is not limited to it; other matching criteria, such as mean square error or mean absolute error, are equally applicable.
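The SAD criterion described above can be sketched in a few lines. This is a minimal illustration, assuming blocks are given as equally sized lists of rows of integer pixel intensities; the function name `sad` and the sample values are hypothetical and not taken from the patent.

```python
def sad(current, candidate):
    """Sum of absolute differences between two equally sized pixel blocks."""
    if len(current) != len(candidate):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_c, row_r in zip(current, candidate):
        if len(row_c) != len(row_r):
            raise ValueError("blocks must have the same number of columns")
        # Accumulate |C_ij - R_ij| over one row.
        total += sum(abs(c - r) for c, r in zip(row_c, row_r))
    return total


# Hypothetical 2x2 blocks: |10-11| + |12-12| + |14-10| + |16-20| = 9
cur = [[10, 12], [14, 16]]
ref = [[11, 12], [10, 20]]
print(sad(cur, ref))
```

A SAD of 0 means the two blocks are identical; larger values mean a poorer match.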
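The embodiment of Figure 3 uses square search to find the block in reference frame 310 most similar to current block 302. During motion estimation, the square search algorithm compares candidates using a square pattern within the search window range. Figure 4 illustrates motion estimation using square search. In step 1, starting from block 402, which corresponds to the position of the current block in the reference frame, the best-matching block (the one with the smallest SAD value) is found within the square pattern 405 formed by nine blocks centered on block 402; in this example the best match is assumed to be block 404. In step 2, the square pattern is re-centered on block 404 and the search continues; the data of the newly added blocks 412, 414, and 416 must be loaded at this point. Assuming the best match in step 2 is block 416, step 3 centers the pattern on block 416, loads the newly added blocks 422, 424, 426, 428, and 430, and repeats the comparison. Assuming the best match in step 3 is block 428, step 4 repeats the operation centered on block 428. If the best match in step 4 lies at the center of the square pattern, that is, if it is block 428, the square search for this current block stops.
Returning to Figure 3, in this embodiment the comparison starts from block 312, which corresponds to the position of block 302, and the best-matching block (the one with the smallest SAD) is sought in the region of nine blocks centered on block 312. For example, the nine blocks can be compared in sequence along the path shown by the arrows in Figure 3. If block 314 has the smallest SAD value among the nine, the above steps are repeated with block 314 as the center, until the best match is the center of the nine compared blocks.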
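The iteration over a nine-candidate square pattern can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: candidates are displacements one unit apart, `cost(dx, dy)` is a caller-supplied function returning the matching cost (for example a SAD), ties are resolved in favor of the center so the loop terminates, and the `max_steps` bound is an added safeguard. The cache stands in for the idea that only newly added candidates need to be fetched at each step.

```python
def square_search(cost, start=(0, 0), max_steps=64):
    """Re-centre a 3x3 pattern on the best candidate until the centre wins."""
    cache = {}

    def cached_cost(p):
        # Each candidate is evaluated (and, in hardware, loaded) only once.
        if p not in cache:
            cache[p] = cost(*p)
        return cache[p]

    centre = start
    for _ in range(max_steps):
        cx, cy = centre
        pattern = [(cx + dx, cy + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
        # Lowest cost wins; on a tie the centre is preferred (False sorts first).
        best = min(pattern, key=lambda p: (cached_cost(p), p != centre))
        if best == centre:
            break
        centre = best
    return centre, cached_cost(centre)


# Hypothetical cost surface with its minimum at displacement (3, -2).
toy_cost = lambda dx, dy: abs(dx - 3) + abs(dy + 2)
print(square_search(toy_cost))
```

On a cost surface with several local minima the search can stop at a local minimum, which is one reason a good starting point matters.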
Every block in the current frame undergoes the above motion estimation procedure to find its closest match in the reference frame, and the order in which motion estimation is performed determines which of a block's neighbors have already been processed when that block is evaluated. For example, Figure 5 shows raster scanning, in which all blocks of frame 500 are scanned from left to right and from top to bottom. When motion estimation is performed on a given block (such as block 510), the blocks to its left (511), upper left (512), above (513), and upper right (514) have therefore already been processed; that is, the motion vectors of these neighboring blocks and their corresponding SAD values are known. Using this neighboring-block data together with the spatial correlation between neighboring blocks, the motion vector of the current block can be predicted in order to adjust the search window range dynamically, and the SAD threshold of the current block can also be set.
Taking the embodiment of Figure 3 as an example, both the predicted search window range and the predetermined threshold of current block 302 can be obtained from the matching results of its neighboring blocks, as follows. First, the motion vectors and matching data obtained for the neighboring blocks are retrieved, where the matching data is the SAD value of the matching block found for each neighbor (hereinafter, residual data). The predetermined threshold of the current block is then set as:
$$\alpha_n = \frac{2 \times \mathrm{LEFT}_{SAD} + 2 \times \mathrm{TOP}_{SAD} + \mathrm{TOP\text{-}RIGHT}_{SAD} + \mathrm{TOP\text{-}LEFT}_{SAD} + \varepsilon}{6}$$
where LEFT_SAD is the residual data of the block to the left of the current block, TOP_SAD that of the block above it, TOP-RIGHT_SAD that of the block to its upper right, TOP-LEFT_SAD that of the block to its upper left, and ε is a constant factor for the current block. ε is a tunable compensation coefficient chosen empirically; the designer can adjust it for the actual application. In this embodiment, the left and top blocks are given twice the weight because they are closer to the current block; in other embodiments, the weights of the neighboring blocks can be adjusted to suit the application. Note that the invention is not limited to the raster scan order of Figure 5; other orders, such as zigzag scanning, are also applicable. A different scan order, however, changes which neighboring blocks' motion estimation results are available to a given block for predicting the search window range.
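As a worked example of the threshold formula, the following sketch computes α_n from four neighboring SAD values. The function name, argument names, and the sample numbers are illustrative assumptions; only the weighting (2, 2, 1, 1, plus ε, divided by 6) comes from the text.

```python
def sad_threshold(left, top, top_right, top_left, epsilon=0):
    """Weighted average of neighbouring SAD values, as in the alpha_n formula."""
    return (2 * left + 2 * top + top_right + top_left + epsilon) / 6


# Hypothetical neighbour SADs; the left and top blocks count twice.
alpha = sad_threshold(left=300, top=360, top_right=240, top_left=240, epsilon=0)
print(alpha)  # (600 + 720 + 240 + 240) / 6 = 300.0
```

With ε = 0 and four equal neighbor SADs, the threshold equals that common value, which makes ε the only term that biases the early-termination test.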
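In the embodiment of Figure 3, when motion estimation is performed on block 302, block 312 of reference frame 310, which corresponds to the position of block 302, is first loaded into internal memory; blocks 302 and 312 are then compared and their SAD value is computed. If the SAD value is smaller than the predetermined threshold α_n, motion estimation for block 302 ends (with motion vector (0,0)) and the next block is processed. For static video, memory bandwidth can in this way be reduced to only about 11%.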
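If the SAD value between block 302 and block 312 is greater than the predetermined threshold α_n, the predicted search window range is loaded next for the subsequent comparisons. The search window range is predicted from the neighboring-block data together with the spatial correlation between neighboring blocks. In this embodiment, the motion vector of the current block is predicted from the motion vectors of its upper-left block 1, top block 2, upper-right block 3, and left block 4. First, the center of the current block is assigned the coordinates (0,0), so that the centers of upper-left block 1, top block 2, upper-right block 3, and left block 4 lie at (-16,16), (0,16), (16,16), and (-16,0), respectively. Then, by the method of least squares, the best-fitting regression plane z = c - ax - by is computed from the motion vectors and coordinates of blocks 1 to 4 in order to predict the motion vector of the current block. Substituting the coordinates and motion vectors of the four neighboring blocks gives the squared error:

$$E = \left[MV_1 - (c + 16a - 16b)\right]^2 + \left[MV_2 - (c - 16b)\right]^2 + \left[MV_3 - (c - 16a - 16b)\right]^2 + \left[MV_4 - (c + 16a)\right]^2$$

Taking the partial derivatives with respect to a, b, and c and setting them to zero:

$$\frac{\partial E}{\partial a} = 0, \qquad \frac{\partial E}{\partial b} = 0, \qquad \frac{\partial E}{\partial c} = 0$$

Solving these three equations gives the values of a, b, and c:

$$a = \frac{1}{32}\left(MV_1 - MV_3\right)$$

$$b = \frac{1}{96}\left(-5MV_1 - 2MV_2 + MV_3 + 6MV_4\right)$$

$$c = \frac{1}{2}\left(-MV_1 + MV_3 + 2MV_4\right)$$

Substituting a, b, c, and the current block coordinates (0,0) into the plane equation gives the predicted motion vector of the current block:

$$MV = \frac{1}{2}\left(-MV_1 + MV_3 + 2MV_4\right)$$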
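The closed-form coefficients can be checked numerically. The sketch below is an illustration under assumptions: it treats one motion vector component at a time as an integer (applying the formula separately to the horizontal and vertical components is an assumption made here), uses exact fractions, and the function name `plane_coefficients` is hypothetical. The sample values are generated from a known plane so the recovered coefficients can be compared with it.

```python
from fractions import Fraction


def plane_coefficients(mv1, mv2, mv3, mv4):
    """Closed-form least-squares fit of z = c - a*x - b*y to the neighbours
    at (-16,16), (0,16), (16,16) and (-16,0); inputs are integer components."""
    a = Fraction(mv1 - mv3, 32)
    b = Fraction(-5 * mv1 - 2 * mv2 + mv3 + 6 * mv4, 96)
    c = Fraction(-mv1 + mv3 + 2 * mv4, 2)
    return a, b, c


# Values sampled from the plane z = 4 - x/8 - y/4 at the four neighbour centres:
# (-16,16) -> 2, (0,16) -> 0, (16,16) -> -2, (-16,0) -> 6.
a, b, c = plane_coefficients(2, 0, -2, 6)
print(a, b, c)  # 1/8 1/4 4

# The prediction at the current block centre (0, 0) is simply c.
predicted_component = c
```

Because the four sample points lie exactly on a plane, the fit reproduces that plane; for real motion vectors the fit is only a least-squares approximation.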
The search window range can be predicted from the predicted motion vector of the current block and its corresponding square pattern region, so that only the portion of data actually needed is loaded rather than the data of the entire conventional search window. After the dynamically adjusted search window has been loaded, matching proceeds within it using the square search described above. In summary, the invention initially loads only the reference frame data corresponding to the original position of the block being processed, and only after that comparison does it decide dynamically how much data must be loaded into internal memory; that is, the size of the search window is determined dynamically. The invention thus reduces the amount of data that must be loaded into internal memory, which lowers data transfer time and power consumption and also reduces the required internal memory size and hence hardware cost.
In addition to data prediction, the invention also applies a data reuse scheme to memory management: data that will be reused is buffered in internal memory, reducing the number of memory accesses and data transfers. In other words, after the reusability of the data has been analyzed, internal memory is added to avoid repeatedly accessing certain data, which lowers the memory bandwidth requirement. For descriptions of data reuse schemes, see D. X. Li et al., "Architecture Design for H.264/AVC Integer Motion Estimation with Minimum Memory Bandwidth," IEEE Trans. Consumer Electron., vol. 53, no. 3, pp. 1053-1060, Aug. 2007; J. C. Tuan et al., "On the data reuse and memory bandwidth analysis for full-search block-matching VLSI architecture," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 1, pp. 61-72, Jan. 2002; C. Y. Chen et al., "Level C+ data reuse scheme for motion estimation with corresponding coding orders," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 4, pp. 553-558, Apr. 2006; and T. C. Chen et al., "Single Reference Frame Multiple Current Macroblocks Scheme for Multiple Reference Frame Motion Estimation in H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 2, pp. 242-247, Feb. 2007, the contents of which are incorporated herein by reference.
The performance of a data reuse scheme can be evaluated by two factors: the internal memory size and the redundancy access factor Ra. The internal memory size represents the memory needed to buffer reference data for reuse, while Ra can be used to evaluate the external memory bandwidth and is defined as:

$$Ra = \frac{\text{total number of memory accesses in the task}}{\text{pixel count in the task}}$$

The lower the degree of data reuse, the larger Ra and the more memory bandwidth is required; conversely, the higher the degree of data reuse, the smaller Ra and the less memory bandwidth is required. The total memory bandwidth BW can be expressed as:
$$BW = f \times W \times H \times Ra_{\text{current frame}} + f \times W \times H \times Ra_{\text{reference frame}}$$

where f is the frame rate, W is the frame width, and H is the frame height.
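To make the bandwidth expression concrete, here is a small sketch that evaluates it. The function name and the sample figures (a 352×288 frame at 30 frames per second with assumed Ra values of 1 and 3) are illustrative assumptions, not values from the patent; the result is in pixel accesses per second.

```python
def total_bandwidth(frame_rate, width, height, ra_current, ra_reference):
    """BW = f*W*H*Ra(current frame) + f*W*H*Ra(reference frame)."""
    pixels_per_second = frame_rate * width * height
    return pixels_per_second * ra_current + pixels_per_second * ra_reference


# Hypothetical case: 352x288 at 30 frames/s, Ra = 1 (current) and 3 (reference).
bw = total_bandwidth(30, 352, 288, ra_current=1, ra_reference=3)
print(bw)  # 30 * 352 * 288 * (1 + 3) = 12165120
```

Since f, W, and H are fixed for a given application, the two Ra terms are the only levers for reducing the total.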
In general, memory bandwidth depends on the frame rate, the frame size, the search window size, and the Ra value. For a specific video compression application, the frame rate and frame size are usually fixed, so the invention selects a data reuse scheme with a small Ra value and uses data prediction to shrink the search window, thereby effectively reducing memory bandwidth.
For the current frame, each block is accessed SR_H × SR_V times on average, that is,

$$Ra_{\text{current frame}} = SR_H \times SR_V$$

However, simply adding an internal memory of size N×N reduces Ra for the current frame to 1:

$$Ra_{\text{current frame}} = 1$$
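For the reference frame, Figure 6 shows the four data reuse schemes, Level A to Level D, in which the hatched areas are the reusable data. Levels A and B reuse data within a single search window (610 and 620, respectively), whereas Levels C and D reuse data across different search windows. In detail, for a block of N×N pixels in the current frame, Level A reuses the pixels that overlap between two horizontally consecutive candidate blocks 612 and 614 within a single search window 610 of size (N+SR_H-1)×(N+SR_V-1) in the reference frame, while Level B reuses the pixels that overlap between two vertically consecutive rows of candidate blocks 622 and 624 within search window 620. Level C reuses the pixels that overlap between the search windows 632 and 634 corresponding to two horizontally consecutive blocks in reference frame 630, and Level D reuses the pixels that overlap between the search windows 642 and 644 corresponding to two vertically consecutive rows of blocks in reference frame 640. As noted above, the total memory bandwidth depends on Ra; the Ra values of the Level A to D schemes can be calculated as follows: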
Level A:
Level B:
Level C:
Level D:
On the other hand, as Figure 6 shows, the internal memory sizes required by the Level A to D schemes are as follows:
It follows that the smaller the internal memory, the greater the memory bandwidth requirement (as in Level A); conversely, the Level D scheme greatly reduces the memory bandwidth requirement but needs a correspondingly larger internal memory.
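The effect of Level C reuse on Ra can be illustrated by simply counting loaded pixels for one row of blocks. This is a sketch under simplifying assumptions that are not taken from the patent: frame borders are ignored, the search window is placed `sr_h // 2` columns to the left of each block, and the function names are hypothetical. It compares loading every full search window against loading only the columns not already held from the previous block's window.

```python
def level_c_ra(width, n, sr_h, sr_v):
    """Reference pixels loaded per current-frame pixel for one block row when
    horizontally adjacent search windows share their overlapping columns."""
    win_w = n + sr_h - 1  # search window width
    win_h = n + sr_v - 1  # search window height
    loaded = 0
    prev_right = None     # exclusive right edge of the previous window
    for block_x in range(0, width, n):
        left = block_x - sr_h // 2
        right = left + win_w
        first_new = left if prev_right is None else max(left, prev_right)
        loaded += (right - first_new) * win_h  # only the new columns are loaded
        prev_right = right
    return loaded / (width * n)


def no_reuse_ra(width, n, sr_h, sr_v):
    """Same ratio when every block reloads its whole search window."""
    blocks = width // n
    return blocks * (n + sr_h - 1) * (n + sr_v - 1) / (width * n)


# Hypothetical parameters: 352-pixel-wide frame, 16x16 blocks, SR_H = SR_V = 32.
print(level_c_ra(352, 16, 32, 32), no_reuse_ra(352, 16, 32, 32))
```

Under these assumptions the Level C count works out to (N+SR_V-1)(W+SR_H-1)/(W·N), which approaches (N+SR_V-1)/N for wide frames; the price is an internal buffer large enough to hold the overlapping part of the search window.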
Level C strikes a better balance between internal memory size and external memory bandwidth, but as video resolutions increase, the Level C data reuse scheme no longer meets the needs of practical applications. The invention therefore combines data reuse with data prediction: within the Level C scheme, the search window size of the current block is predicted and dynamically adjusted according to the search window sizes of neighboring blocks. Instead of loading the overlapping region of the full search windows of two horizontally consecutive blocks, as the original Level C scheme requires, only the corresponding predicted search window range is loaded. This effectively lowers the internal memory size requirement and further reduces the memory bandwidth required by the Level C scheme.
Figure 7 is a flowchart of a method for performing motion estimation according to an embodiment of the invention. In general, a block-matching motion estimation algorithm first divides the current frame into a plurality of blocks and decides the scan order in which motion estimation is performed on them. In this embodiment, the blocks are processed in raster scan order, and the motion estimation algorithm for each block is the square search algorithm. First, in step S700, one block is selected for motion estimation, and the dynamic search window ranges of its neighboring blocks, their predicted motion vectors, and their matching data are obtained. Next, in step S710, the predetermined threshold α_n of the current block is defined from the residual data of the neighboring blocks; for example, α_n can be the sum of the neighbors' residual data after applying different weights. Note that the calculation of α_n can be adjusted as the actual application requires. Then, in step S720, the reference frame data corresponding to the position of the current block is loaded and one actual comparison with the current block is performed to compute the SAD value.
Next, in step S730, the predetermined threshold α_n set in step S710 is compared with the SAD value obtained in step S720. If the SAD value is smaller than α_n, the procedure proceeds to step S740, where motion estimation for this block ends, and then to step S750, which checks whether all blocks have completed motion estimation. If step S750 determines that all blocks are done, the procedure proceeds to step S760 and motion estimation for the current frame ends; otherwise, the procedure returns to step S700 and selects the next block.
If, in step S730, the SAD value is greater than the predetermined threshold α_n, the procedure proceeds to step S770, where the motion vector of the current block is predicted from the known motion vectors of the neighboring blocks. In this embodiment, the prediction formula is obtained by fitting a regression plane by least squares to the motion vectors and coordinates of the upper-left, top, upper-right, and left blocks. Note that the prediction formula may be adjusted for a different scan order. Next, in step S780, the square search region corresponding to the predicted motion vector of the current block is loaded from external memory into internal memory. Then, in step S790, motion estimation is performed on the selected block with the square search algorithm inside the loaded search window to find the best-matching block. Once the best match has been found in step S790, the procedure returns to step S750 and the above steps are repeated until motion estimation for the current frame is complete. Note that if step S790 cannot find a matching block by square search, the block with the smallest SAD value in the predicted search window can be selected as the match for the current block, or another or larger search window range can be loaded and the comparison repeated; the invention does not restrict how this case is handled.
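The per-block flow of Figure 7 can be sketched end to end as below. This is a hedged illustration, not the patented implementation: `cost(dx, dy)` is a caller-supplied function returning the SAD for a displacement, the neighbor data are passed as dictionaries with the hypothetical keys `left`, `top`, `top_right`, and `top_left`, the predicted vector is rounded to integers, a SAD equal to the threshold is treated as "not smaller", and the `max_steps` bound is an added safeguard.

```python
def estimate_block(cost, neighbour_sads, neighbour_mvs, epsilon=0, max_steps=64):
    """Motion estimation for one block following steps S710 to S790."""
    s, m = neighbour_sads, neighbour_mvs
    # S710: threshold from the neighbours' residual (SAD) data.
    alpha = (2 * s["left"] + 2 * s["top"] + s["top_right"] + s["top_left"] + epsilon) / 6
    # S720: one real comparison at the co-located position.
    initial = cost(0, 0)
    # S730/S740: early termination with motion vector (0, 0).
    if initial < alpha:
        return (0, 0), initial
    # S770: predicted vector MV = (-MV1 + MV3 + 2*MV4) / 2, per component.
    pmv = tuple(
        round((-m["top_left"][i] + m["top_right"][i] + 2 * m["left"][i]) / 2)
        for i in (0, 1)
    )
    # S780/S790: square search starting from the predicted vector.
    centre, best = pmv, cost(*pmv)
    for _ in range(max_steps):
        cx, cy = centre
        ring = [(cx + dx, cy + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if (dx, dy) != (0, 0)]
        candidate = min(ring, key=lambda p: cost(*p))
        candidate_cost = cost(*candidate)
        if candidate_cost >= best:
            break  # the centre is the best match in the pattern
        centre, best = candidate, candidate_cost
    return centre, best


# Hypothetical data: true displacement (4, 1), neighbours that all moved by (3, 1).
toy_cost = lambda dx, dy: 10 * (abs(dx - 4) + abs(dy - 1))
mvs = {"left": (3, 1), "top": (3, 1), "top_right": (3, 1), "top_left": (3, 1)}
low = {"left": 5, "top": 5, "top_right": 5, "top_left": 5}
print(estimate_block(toy_cost, low, mvs))
```

In this toy case the co-located SAD (50) exceeds the threshold (5), so the search starts at the predicted vector (3, 1) and needs a single re-centering step to reach (4, 1).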
The invention uses the residual data of neighboring blocks and the result of the first actual comparison to reduce the required memory bandwidth and internal memory size first, and then uses the motion vectors of the surrounding blocks to predict the search window range of the current block. Instead of loading the search ranges of two horizontally adjacent blocks, as the Level C scheme requires, only the predicted search window range needs to be loaded. The invention therefore needs only 30% to 60% of the original internal memory and only about 40% to 80% of the memory bandwidth, substantially reducing both. The proposed motion estimation method, which combines data prediction with a data reuse scheme, can be applied to hardware and software memory management in a variety of real-time embedded multimedia systems.
The foregoing describes only preferred embodiments of the invention and does not limit the scope of its claims; all equivalent changes or modifications made without departing from the spirit disclosed by the invention shall fall within the scope of the claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN 200910004964 CN101815219B (en) | 2009-02-20 | 2009-02-20 | Mobile evaluation method for real-time embedded multimedia design |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN 200910004964 CN101815219B (en) | 2009-02-20 | 2009-02-20 | Mobile evaluation method for real-time embedded multimedia design |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN101815219A true CN101815219A (en) | 2010-08-25 |
| CN101815219B CN101815219B (en) | 2013-03-20 |
Family
ID=42622319
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN 200910004964 Active CN101815219B (en) | 2009-02-20 | 2009-02-20 | Mobile evaluation method for real-time embedded multimedia design |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN101815219B (en) |
-
2009
- 2009-02-20 CN CN 200910004964 patent/CN101815219B/en active Active
Also Published As
| Publication number | Publication date |
|---|---|
| CN101815219B (en) | 2013-03-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| TWI389575B (en) | Motion estimation approach for real-time embedded multimedia design | |
| US9357228B2 (en) | Motion estimation of images | |
| JP2920208B2 (en) | Coding method of motion vector of moving image | |
| US9319708B2 (en) | Systems and methods of improved motion estimation using a graphics processing unit | |
| US8155213B2 (en) | Seamless wireless video transmission for multimedia applications | |
| KR100994773B1 (en) | Method and apparatus for generating motion vector in hierarchical motion estimation | |
| EP1668912B1 (en) | Rectangular-shape motion search | |
| WO2008003220A1 (en) | Motion vector estimation method | |
| TWI442775B (en) | Low-power and high-performance video coding method for performing motion estimation | |
| US20060120455A1 (en) | Apparatus for motion estimation of video data | |
| CN101800893B (en) | Low-power high-performance video coding method performing motion estimation | |
| TWI401970B (en) | Low-power and high-throughput design of fast motion estimation vlsi architecture for multimedia system-on-chip design | |
| KR20080048384A (en) | Fast full-area motion prediction method based on segmented search area and device | |
| Liu et al. | Complexity comparison of fast block-matching motion estimation algorithms | |
| JP2012129791A (en) | Image encoder | |
| CN101815219A (en) | Mobile evaluation method for real-time embedded multimedia design | |
| US10616588B2 (en) | Non-transitory computer-readable storage medium, encoding processing method, and encoding processing apparatus | |
| JP4887009B2 (en) | Method and apparatus for selecting motion vectors for blockset coding | |
| KR100986719B1 (en) | How to reduce the amount of computation for H.264 / ACC motion estimation | |
| US8213513B2 (en) | Efficient data prediction and data reuse motion estimation engine for system-on-chip design | |
| Tian et al. | PA-Search: predicting units adaptive motion search for surveillance video coding | |
| KR100699835B1 (en) | Hierarchical Motion Predictor and Motion Vector Prediction Method | |
| Duvar et al. | SIMD-based low bit-depth motion estimation with application to HEVC | |
| CN115529459A (en) | Central point searching method and device, computer equipment and storage medium | |
| Saha et al. | Approximate sad computation for real-time low power video encoders |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant |