JP2004056305A

JP2004056305A - Motion vector detection method, motion vector detection device, and program

Info

Publication number: JP2004056305A
Application number: JP2002208710A
Authority: JP
Inventors: Shinichiro Nishioka; 伸一郎西岡; Masayuki Toyama; 昌之外山; Tsutomu Sekibe; 勉関部; Takao Matsumoto; 孝夫松本
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2002-07-17
Filing date: 2002-07-17
Publication date: 2004-02-19

Abstract

【課題】本発明は、画像階層化及び動きベクトル検出における演算量を削減すると共に、高い動きベクトル検出精度を得ることができる動きベクトル検出方法を提供する。
【解決手段】本発明の動きベクトル検出方法は、第１画像及び第２画像から２組の近似画像群及びアクティビティデータ群を算出する階層化画像の算出において、動きベクトル検出精度の向上に寄与しない高周波成分を算出するための計算を省略して演算量を削減し、ブロックマッチングにおいては動きベクトル検出精度向上に寄与しないアクティビティデータに対するブロックマッチングを省略することにより演算量を削減することにより、全体的な計算量を削減する。
【選択図】　なし
[Problem] To provide a motion vector detection method that reduces the amount of computation in image layering and motion vector detection while achieving high motion vector detection accuracy.
[Solution] In computing the hierarchical images, in which two sets of approximate image groups and activity data groups are calculated from a first image and a second image, the method omits the calculations for high-frequency components that do not contribute to motion vector detection accuracy. In block matching, it omits matching on activity data that does not contribute to detection accuracy. Together, these omissions reduce the overall amount of computation.
[Selected drawing] None

Description

【０００１】
【発明の属する技術分野】本発明は、動きベクトル検出方法及び動きベクトル検出装置であって、特に時間的に異なる２つの画像データをそれぞれ階層化してからブロックマッチングを行い、動きベクトルを検出する技術に関する。
【０００２】
【従来の技術】動画像信号の高能率符号化方式の国際規格の１つであるＭＰＥＧ２（Ｍｏｖｉｎｇ　Ｐｉｃｔｕｒｅ　Ｅｘｐｅｒｔｓ　Ｇｒｏｕｐ　２）は、空間方向の冗長度削減に直交変換である離散コサイン変換を用い、時間方向の冗長度削減にフレーム間予測・動き補償を用いるハイブリッド符号化に分類される方法である。
【０００３】
フレーム間予測には、ブロックマッチング法を用いる。
ブロックマッチング法では、第１画像及び第１画像と時間的に異なる画像を表す第２画像について、それぞれの画像を小さな矩形領域であるブロックに分割する。第１画像上で注目する注目ブロックと、第２画像上で当該注目ブロックと同位置のブロックを±ｍ×±ｎ画素の探索範囲内で水平方向にｘ画素，垂直方向にｙ画素だけずらしたしたシフトブロックとの類似度を、所定の評価関数にて評価し、評価値が最小となるシフト量ｖ＝（ｘ，ｙ）を動きベクトルとするものである。
【０００４】
評価関数としては、絶対値差分がよく用いられる。
上記のブロックマッチング法では、探索範囲内をくまなく探索し絶対値差分を求める必要があるため、演算量が大きくなり、装置自体が大型化したり演算時間が長くなる問題がある。
上記問題を解決する手法の一つに階層型動きベクトル検出方法がある。
【０００５】
これは第１画像及び第２画像の各々について低域通過処理を施して低解像度の近似画像（以下、階層１と呼ぶ）を作成し、順次同様の低域通過処理を適用してより高い階層の近似画像（階層２、階層３、・・・）を作成する画像階層化ステップと、下位階層において、上位階層で検出した動きベクトル候補の周囲の小探索領域で動きベクトル候補を検出し、順次下位階層の動きベクトル候補を求めていき、最終的に原画像の動きベクトルを検出する動きベクトル検出ステップからなる。
【０００６】
この方法によると大幅な演算量削減が望める一方、低域通過処理により上位階層になる程画像の特徴量である高周波成分が除去されるため、動きベクトル検出の精度が大きく低下する問題があった。
この問題に対し、特開平７−２２２１５７では低域通過処理によって選別される画像の高周波成分をアクティビティ画像（アクティビティデータとも言う）としてブロックマッチングに用いて検出精度を向上させる階層型動きベクトル検出方法が提案されており、上記低域通過処理については平均値化処理を用いる方法が提案されている。
【０００７】
【発明が解決しようとする課題】しかしながら従来の方法では、精度の高い動きベクトル検出を行うためには、アクティビティデータの算出に大きな演算量が必要である。さらに、画像の高解像度化、高画質化により単位時間あたりに実行すべき演算量は増加する一方であり、ＬＳＩで演算を実現させる場合に消費電力、発熱量の増加につながっている。
【０００８】
上記の問題に鑑み、本発明は、画像階層化ステップ及び動きベクトル検出ステップにおける演算量を削減すると共に、動きベクトル検出精度を従来程度の水準に維持できる動きベクトル検出方法を提供することを目的とする。
【０００９】
【課題を解決するための手段】
上記課題を解決するため、本発明の動きベクトル検出方法は、原解像度でそれぞれ表された第１画像と第２画像との間の動きベクトルを検出する動きベクトル検出方法であって、当該第１画像及び当該第２画像に含まれる低周波成分を当該原解像度よりも低い縮小解像度でそれぞれ表す第１近似画像及び第２近似画像、及び当該第１画像及び第２画像に含まれる高周波成分を当該縮小解像度でそれぞれ表す第１アクティビティ画像及び第２アクティビティ画像を算出する算出ステップと、当該第１アクティビティ画像により表された高周波成分の総量が所定閾値以上であるか否かを判断する判断ステップと、当該閾値以上であると判断された場合、当該第１近似画像と当該第２近似画像との比較結果、及び当該第１アクティビティ画像と当該第２アクティビティ画像との比較結果の双方を用いて動きベクトル候補を選出し、その他の場合、当該第１近似画像と当該第２近似画像との比較結果のみを用いて動きベクトル候補を選出する選出ステップと、当該動きベクトル候補を用いて動きベクトルを検出する検出ステップとを含む。
【００１０】
【発明の実施の形態】
発明の実施の形態における動きベクトル検出装置について図面を参照しながら説明する。
１．　全体構成
図１は、本実施の形態における動きベクトル検出装置１０の全体構成を示すブロック図である。動きベクトル検出装置１０は、第１画像メモリ１０１、第２画像メモリ１０２、算出手段１０３、階層化第１画像メモリ１０４、階層化第２画像メモリ１０５、判断手段１０６及び動きベクトル検出手段１０７から構成される。
【００１１】
動きベクトル検出装置１０は、具体的にはプロセッサ、プログラムを記憶しているＲＯＭ（Ｒｅａｄ　Ｏｎｌｙ　Ｍｅｍｏｒｙ）、作業用のＲＡＭ（Ｒａｎｄｏｍ　Ａｃｃｅｓｓ　Ｍｅｍｏｒｙ）等のソフトウェア及びハードウェアにより実現される。各構成要素の機能は、プロセッサがＲＯＭに記憶されているプログラムを実行することにより実現される。構成要素間におけるデータの受け渡しは、ＲＡＭ等のハードウェアを介して行われる。
【００１２】
第１画像メモリ１０１は、入力された第１画像を保持する。
第２画像メモリ１０２は、入力された第２画像を保持する。
算出手段１０３は、第１画像メモリ１０１に保持している第１画像及び第２画像メモリ１０２に保持している第２画像を、それぞれ階層化第１画像メモリ１０４及び階層化第２画像メモリ１０５に読み出し、当該読み出した第１画像及び第２画像のそれぞれに含まれる低周波成分を当該各画像よりも低くかつ段階的に低下する複数の解像度をもって示す２組の近似画像群、及び当該第１画像及び第２画像のそれぞれに含まれる高周波成分を当該複数の解像度をもって示す２組のアクティビティデータ群を算出する。
【００１３】
階層化第１画像メモリ１０４は、算出手段１０３が第１画像から算出した前記近似画像群及びアクティビティデータ群を保持する。
階層化第２画像メモリ１０５は、算出手段１０３が第２画像から算出した前記近似画像群及びアクティビティデータ群を保持する。
判断手段１０６は、算出手段１０３が第１画像から算出した各階層のアクティビティデータの重要度を評価し、第１画像及び第２画像の各階層のアクティビティデータを動きベクトル検出に用いるか否かを判断する。
【００１４】
動きベクトル検出手段１０７は、低解像度の階層で選出した動きベクトル候補を順次高解像度の階層の動きベクトル候補の選出に利用し、動きベクトルを検出する。
２．データ構造
図２は、算出手段１０３が算出する各データの関係を模式的に示した図である。
【００１５】
２００及び２０１は、入力された原画像である第１画像と第２画像を示す。
２０２及び２０３は、第１画像２００から算出した低域通過成分である近似画像及び高域通過成分を用いて生成するアクティビティデータを示し、２０４及び２０５は、第２画像２０１から算出した近似画像及びアクティビティデータを示す。
【００１６】
前記２０２、２０３、２０４及び２０５は、階層１に属する。
同様に２０６及び２０７は、前記近似画像２０２から算出した近似画像及びアクティビティデータを示し、２０８及び２０９は、前記近似画像２０４から算出した近似画像及びアクティビティデータを示す。
前記２０６、２０７、２０８及び２０９は、階層２に属する。
【００１７】
２１０は、２０２及び２０６からなる近似画像群を示し、２１１は２０３及び２０７からなるアクティビティデータ群を示す。
同様に２１２は２０４及び２０８からなる近似画像群を示し、２１３は２０５及び２０９からなるアクティビティデータ群を示す。
ここで表記について説明する。
【００１８】
第１画像及び第２画像を、原画像においてブロックマッチングを行う単位であるブロック（１６画素×１６画素）に区分し、水平方向にｉ番目、垂直方向にｊ番目の位置の第１画像のブロックをＢＬＫ（ｉ，ｊ）、第２画像のブロックをｒＢＬＫ（ｉ，ｊ）と表記する。
同様に、第１画像から算出した階層ｎの近似画像におけるブロックである近似画像ブロックをａｐｘＢＬＫｎ（ｉ，ｊ）と表記し、階層ｎのアクティビティデータにおけるブロックであるアクティビティブロックをａｃｔＢＬＫｎ（ｉ，ｊ）と表記する。
また第２画像から算出した階層ｎの近似画像ブロックをｒａｐｘＢＬＫｎ（ｉ，ｊ）と表記し、第２画像から算出した階層ｎでのアクティビティブロックをｒａｃｔＢＬＫｎ（ｉ，ｊ）と表記する。
【００１９】
ａｐｘＢＬＫｎ（ｉ，ｊ）、ｒａｐｘＢＬＫｎ（ｉ，ｊ）、ａｃｔＢＬＫｎ（ｉ，ｊ）、ｒａｃｔＢＬＫｎ（ｉ，ｊ）の各ブロックの画素数は、ＢＬＫ（ｉ，ｊ）及びｒＢＬＫ（ｉ，ｊ）の画素数に比べ、水平方向、垂直方向にそれぞれ１／（２のｎ乗）となる。
次に、図３を用い、原画像から近似画像群及びアクティビティデータ群を算出する画像階層化処理について説明する。
【００２０】
図３は、画像階層化処理時の階層化第１画像メモリ１０４及び階層化第２画像１０５の内容を示している。
ここでは、第１画像について階層化する例を説明し、第２画像については説明が重複するので、説明を割愛する。
階層化第１画像メモリ１０４の内容は、画像階層化処理が進むに従い、図３の３００から３０１、３０２へと変化する。
【００２１】
３００は、算出手段１０３が、第１画像を第１画像メモリ１０２から階層化第１画像メモリ１０４へ読み出した時の階層化第１画像メモリ１０４の内容である。
３０１は、３００から階層１の近似画像及びアクティビティデータを算出した時の階層化第１画像メモリ１０４の内容である。
【００２２】
３０２は、階層１の近似画像から、階層２の近似画像及びアクティビティデータを算出した時の階層化第１画像メモリ１０４の内容である。
図３中のａｍ、ｂｍ、ｃｍ、ｄｍ（ｍは任意の自然数）は、それぞれ原画像を構成する画素データを表す。
図３中のａｐｘｎｍ（ｍ、ｎは共に自然数）は近似画像を構成する画素データを表し、ａｃｔｎｍ（ｍ、ｎは共に自然数）はアクティビティデータを構成するデータを表す。
【００２３】
また、ｍ、ｎは任意の自然数であり、ｎは各データが属する階層を示している。
算出手段１０３は、３００を２画素×２画素の小画素領域に区分し、全ての小画素領域について小画素領域毎に定位置計算のウェーブレット変換を行い、階層１の近似画像及びアクティビティデータを得る。
【００２４】
定位置計算のウェーブレット変換の詳細は後述する。
３０３は、画素データａ１、ｂ１、ｃ１、ｄ１を持つ２画素×２画素の小画素領域である。
算出手段１０３が、小画素領域３０３に対し定位置計算のウェーブレット変換を行った結果、３０３の内容が、近似画像を表す画素データａｐｘ１１とアクティビティデータを表すデータａｃｔ１１からなる３０４に示す内容になる。
【００２５】
ここで、３０４中にａｐｘ１１が２個含まれるが、第１のａｐｘ１１は、動きベクトル検出の際に使用する。
第２のａｐｘ１１は、階層２の近似画像及びアクティビティデータを算出する演算に使用し、当該演算実施中に値が書き換えられる。
３００内の全ての小画素領域に対しウェーブレット変換を行った結果、階層化第１画像メモリ１０４の内容が３０１で表す内容となる。
【００２６】
３０１について、ａｐｘ１ｍで示した画素の集まりが階層１の近似画像を構成し、図２中２０２に対応している。
またａｃｔ１ｍで示したデータの集まりが階層１のアクティビティデータを構成し、図２中２０３に対応している。
以上のように算出した階層１の近似画像及びアクティビティデータの画素数は、原画像の画素数に比べ水平方向、垂直方向にそれぞれ１／２となる。
【００２７】
更に、階層１の近似画像を用い、階層２の近似画像及び階層２のアクティビティデータを算出する。
３０１内の近似画像を表す画素データを２画素×２画素の小画素領域に区分し、全ての小画素領域について小画素領域毎にウェーブレット変換を行い階層２の近似画像及びアクティビティデータを得る。
【００２８】
図３中３０１で斜線を施した、階層１の近似画像を表すａｐｘ１１、ａｐｘ１２、ａｐｘ１３、ａｐｘ１４を２画素×２画素の小画素領域とし、当該小画素領域に対し、定位置計算のウェーブレット変換を行った結果、図３中３０２で斜線を施した階層２の近似画像を表す画素データであるａｐｘ２１及び階層２のアクティビティデータを表すデータであるａｃｔ２１を得る。
【００２９】
３０１内の全ての小画素領域に対しウェーブレット変換を行った結果、階層化第１画像メモリ１０４の内容は、３０２で表す内容となる。
３０２について、ａｐｘ２ｍで示した画素の集まりが階層２の近似画像を構成し、図２中２０６に対応している。
またａｃｔ２ｍで示したデータの集まりが階層２のアクティビティデータを構成し、図２中２０７に対応している。
【００３０】
以上のように算出した階層２の近似画像及びアクティビティデータの画素数は、階層１の近似画像及びアクティビティデータの画素数に比べ水平方向、垂直方向にそれぞれ１／２となる。
階層ｎの近似画像及びアクティビティデータの画素数は、原画像に比べ水平方向、垂直方向にそれぞれ１／（２のｎ乗）となる。
【００３１】
３階層化を行った時の階層化第１画像メモリ１０４の内容である３０２内には、階層１の近似画像及びアクティビティデータを表すデータと、階層２の近似画像及びアクティビティデータを表すデータが混在する。
また第２画像についても、第１画像の場合と同様に階層化処理を行い、算出した近似画像群及びアクティビティデータ群を階層化第２画像メモリ１０５に保持する。
３．　処理
３．１　全体処理
図４は、動きベクトル検出装置１０の全体処理の概略を示すフローチャートである。
【００３２】
算出手段１０３が近似画像群及びアクティビティデータ群の算出処理であるステップＳ４０１を行い、判断手段１０６と動きベクトル検出手段１０７が、動きベクトル検出処理であるステップＳ４０２を行う。
３．２　近似画像群及びアクティビティデータ群の算出処理
ステップＳ４０１では、算出手段１０３が、第１画像メモリ１０１内の第１画像を階層化第１画像メモリ１０４に読み出し、また第２画像メモリ１０２内の第２画像を階層化第２画像メモリ１０５に読み出し、各々の画像に対しリフティング構成のウェーブレット変換を適用する。
【００３３】
リフティング構成とは離散ウェーブレット変換を定位置計算で実行する構成法であり、メモリの使用量が少ない、アドレスのデコード処理が少ない等の利点がある。
Details are given in the published paper W. Sweldens, "The lifting scheme: A custom-design construction of biorthogonal wavelets," J. Appl. Comput. Harmonic Analysis, 3 (1996), so the description is omitted here.
３．２．１　Ｈａａｒウェーブレット変換
ステップＳ４０１では、低域通過成分として平均値を、高域通過成分として偏差を算出するリフティング構成のＨａａｒウェーブレット変換を用いる。
【００３４】
図５は、２画素×２画素の小画素領域に対し、リフティング構成の２次元Ｈａａｒウェーブレット変換を実行する例である。
階層化第１画像メモリ１０４及び階層化第２画像メモリ１０５内に読み出した第１画像及び第２画像のそれぞれを図３中３０３と同様の２画素×２画素からなる小画素領域に区分し、小画素領域毎に前記リフティング構成のウェーブレット変換を適用する。
【００３５】
ここでは、画素データがそれぞれａ，ｂ，ｃ，ｄである小画素領域に対し、リフティング構成のウェーブレット変換を適用する例について説明する。
図５の各行はメモリ領域内のデータに行う演算とその結果であり、上から下へ順次処理が進行していることを示し、各列は各メモリ領域に対応し、最右列には使用する演算の種類を示す。
【００３６】
演算開始時、前記画素データａをメモリ領域Ａに保持し，前記画素データｂ，ｃ，ｄを、それぞれメモリ領域Ｂ、Ｃ、Ｄに保持している。
ここではメモリ領域Ａについてのみ説明を行い、メモリ領域Ｂ、メモリ領域Ｃ，メモリ領域Ｄについては、メモリ領域Ａについての説明と重複するので、説明を割愛する。
【００３７】
図５中のメモリ領域Ａの列５０１の各欄は、上下に区切っており、上半分５０２，５０４，５０６，５０８が実施する演算、下半分５０３、５０５、５０７、５０９が演算後のメモリ領域Ａの内容を示す。
５０１について、Ｓ０からＳ３までの動作を順を追って説明する。
Ｓ０において、５０２では演算を行わず、５０３に示す初期データａをメモリ領域Ａに格納する。
【００３８】
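In moving from S0 to S1, operation 504 adds the content a of memory area A and the content b of memory area B (A+B), so memory area A holds the result a+b shown at 505.
In moving from S1 to S2, operation 506 adds a+b, the content of memory area A at S1, and c+d, the content of memory area C at S1 (A+C), so memory area A holds the result a+b+c+d shown at 507.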
【００３９】
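In moving from S2 to S3, the division (shift) operation 508 is applied to the content of memory area A at S2, so memory area A holds the result (a+b+c+d)/4 shown at 509.
After steps S0 through S3, memory area A holds the LL component (a+b+c+d)/4, which is the low-pass component. Likewise, memory area B holds the HL component (b+d)/2−(a+c)/2, the horizontal edge-enhancement component; memory area C holds the LH component (c+d)/2−(a+b)/2, the vertical edge-enhancement component; and memory area D holds the HH component (d−c)−(b−a), the combined horizontal and vertical edge-enhancement component.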
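The four sub-band values stated above can be checked with a minimal sketch. The function names `haar_2x2` and `trace_area_a` are hypothetical, and the code computes each component directly from its closed-form expression; it does not reproduce the exact step order of FIG. 5 for memory areas B, C and D, which the text does not spell out.

```python
def haar_2x2(a, b, c, d):
    """Return (LL, HL, LH, HH) for one 2x2 region, using the formulas in the text."""
    ll = (a + b + c + d) / 4          # low-pass component (average)
    hl = (b + d) / 2 - (a + c) / 2    # horizontal edge component
    lh = (c + d) / 2 - (a + b) / 2    # vertical edge component
    hh = (d - c) - (b - a)            # combined horizontal/vertical component
    return ll, hl, lh, hh


def trace_area_a(a, b, c, d):
    """Contents of memory area A at S0..S3, following the steps described for column 501."""
    s0 = a                  # S0: initial data
    s1 = s0 + b             # S1: A + B
    s2 = s1 + (c + d)       # S2: A + C, where C already holds c + d
    s3 = s2 / 4             # S3: division (a 2-bit shift in integer hardware)
    return [s0, s1, s2, s3]
```

For pixel values 1, 2, 3, 4 the last entry of the trace equals the LL value returned by `haar_2x2`.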
【００４０】
ここで、ＬＬ成分が低域通過成分、ＬＨ、ＨＬ及びＨＨ成分が高域通過成分である。
算出手段１０３では、さらに、アクティビティデータの算出に使用する高域通過成分のみを算出することにより、演算量を削減できる。
アクティビティデータの算出に使用する高域通過成分としては、アクティビティデータとしてノイズの影響を受けやすいＨＨ成分は用いず、第１画像がフレーム画像の場合、水平方向のエッジ強調成分であるＨＬ成分と垂直方向のエッジ強調成分であるＬＨ成分との絶対値平均（｜ＨＬ｜＋｜ＬＨ｜）／２を用い、フィールド画像である場合、水平方向に比べて垂直方向に帯域制限されているため、ＬＨ成分のみを用いる。
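The selection rule above (frame images use the mean of the absolute HL and LH values, field images use the LH component alone, and HH is never used) can be sketched as follows. The function name `activity_value` and the `is_field` flag are hypothetical; returning the LH value without taking its absolute value in the field case is a literal reading of the text, not something the source confirms.

```python
def activity_value(hl, lh, is_field):
    """Pick the activity value for one 2x2 region.

    hl, lh   : HL and LH sub-band values of the region
    is_field : True for a field (interlaced) image, False for a frame image
    """
    if is_field:
        # Field images are already band-limited vertically, so only LH is used.
        return lh
    # Frame images: mean of the absolute horizontal and vertical edge components.
    return (abs(hl) + abs(lh)) / 2
```

Because the HH component never appears in either branch, an implementation along these lines has no need to compute it at all.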
３．２．２　簡略化リフティングスキーム
アクティビティデータの算出に使用しない高域通過成分は算出不要であり、算出手段１０３は前記リフティング構成のウェーブレット変換における高域通過成分を算出する過程で、不要な演算を省略することによりリフティング構成を簡略化する（以下、簡略化リフティングスキームと呼ぶ）。
【００４１】
図６は、アクティビティデータとして（｜ＨＬ｜＋｜ＬＨ｜）／２を採用し、２次元Ｈａａｒウェーブレット変換を行う簡略化リフティングスキームの例である。
アクティビティデータを（｜ＨＬ｜＋｜ＬＨ｜）／２とする場合、ＨＨ成分を算出する必要がないため、最終的にＨＨ成分が算出されるメモリ領域Ｄへの演算は省略し、全体の処理を簡略化する。
【００４２】
図７は、アクティビティデータとしてＬＨ成分を採用し、２次元Ｈａａｒウェーブレット変換を行う簡略化リフティングスキームの例である。
この場合、ＨＬ成分、ＨＨ成分を算出する必要が無いため、省略可能な演算がさらに増え演算量の削減効果が大きくなる。
ここで、図６及び図７のどちらの場合にも近似値であるＬＬ成分は、次の階層での定位置計算で書き換えられるが、この近似値は後段の動きベクトル検出手段１０７でのブロックマッチングにも用いるため、採用しない高域通過成分のメモリ領域Ｄにコピーしておく。
３．２．３　従来例との演算量比較
図１２に、従来方式の画像階層化ステップを示す。
【００４３】
従来方式の画像階層化ステップでは、小画素領域内のデータａ，ｂ，ｃ，ｄにおいて、近似値として平均値Ａｖｇ＝（ａ＋ｂ＋ｃ＋ｄ）／４を使用し、アクティビティとしてＡｃｔ＝（｜ａ−Ａｖｇ｜＋｜ｂ−Ａｖｇ｜＋｜ｃ−Ａｖｇ｜＋｜ｄ−Ａｖｇ｜）／４を使用する。
従来方式を用いて小画素領域について近似画像及びアクティビティを算出するのに要する演算は、図１２に示すように、加算３回、減算４回、絶対値加算３回、シフト演算（除算）２回となる。
【００４４】
一方本発明の前記簡略化リフティングスキームを用いると、上記演算は、図６の場合、加算４回、減算３回、絶対値加算１回、シフト演算２回であり、従来方式に比べ、減算が加算に置き換わり、絶対値加算２回を削減できる。
また、図７の場合、加算３回、減算１回、絶対値加算０回、シフト演算２回であり、従来方式に比べ、減算３回、絶対値加算３回を削減できる。
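The comparison above can be tabulated in a short sketch. The per-region operation counts are copied from the text; the names `conventional`, `OP_COUNTS` and `savings` are hypothetical and exist only for this illustration.

```python
def conventional(a, b, c, d):
    """Conventional layering: average as the approximation, mean absolute deviation as activity."""
    avg = (a + b + c + d) / 4
    act = (abs(a - avg) + abs(b - avg) + abs(c - avg) + abs(d - avg)) / 4
    return avg, act


# Operation counts per 2x2 region as stated in the text.
OP_COUNTS = {
    "conventional": {"add": 3, "sub": 4, "abs_add": 3, "shift": 2},
    "fig6":         {"add": 4, "sub": 3, "abs_add": 1, "shift": 2},  # (|HL|+|LH|)/2
    "fig7":         {"add": 3, "sub": 1, "abs_add": 0, "shift": 2},  # LH only
}


def savings(scheme):
    """Operations saved relative to the conventional method (negative means more)."""
    base = OP_COUNTS["conventional"]
    return {op: base[op] - OP_COUNTS[scheme][op] for op in base}
```

Summing each row gives 12 operations for the conventional method, 10 for the FIG. 6 scheme and 6 for the FIG. 7 scheme.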
３．２．４　第１及び第２階層化画像メモリにおける適用
図８は、ステップＳ４０１を詳細に説明するフローである。
【００４５】
本実施例では、原画像から階層１の近似画像及びアクティビティデータを算出し、当該階層１の近似画像から階層２の近似画像及びアクティビティデータを算出する３階層化について説明する。
ステップＳ８０１では演算対象の階層ｎを０とし、ステップＳ８０２では階層数Ｎを３とする。
【００４６】
ステップＳ８０３では、ｎ＞（Ｎ−１）を判定し、ＹＥＳの場合は処理を終了し、ＮＯの場合ステップＳ８０４に進む。
ステップＳ８０４では、階層ｎの原画像（ｎ＝０の場合）或いは近似画像（ｎ≠０の場合）内の全ての小画素領域についてウェーブレット変換を行ったかどうか判定し、ＹＥＳの場合ステップＳ８０６でｎ＝ｎ＋１とし、ステップＳ８０３に進み、ＮＯの場合、ステップＳ８０５に進み、演算を行っていない小画素領域についてウェーブレット変換を行う。
【００４７】
ステップＳ８０５で行う小画素領域に対するウェーブレット変換については、前述した通りである。
３．３　動きベクトル検出処理
図４中ステップＳ４０２の動きベクトル検出処理の詳細を図９、１０、１１を用いて説明する。
【００４８】
図９は、ステップＳ４０２の処理を詳細に説明したものである。
ここでは、図９において、原画像における水平方向１番目、垂直方向１番目の位置のブロックであるＢＬＫ（１，１）の動きベクトル検出の例で説明する。
ステップＳ９０１では、原画像において、全てのブロックの動きベクトルを検出したかどうか判断する。
【００４９】
検出した場合、処理を終了する。
検出していない場合、ステップＳ９０２に進む。
ステップＳ９０２では、原画像において、動きベクトルを検出していないブロックを選出する。
本実施例では、ＢＬＫ（１，１）の動きベクトルを検出するものとする。
【００５０】
ステップＳ９０３では、演算対象となる階層ｎをＮ−１とする。
ステップＳ４０１において画像を３階層化しているため、Ｎ＝３とする。
ステップＳ９０４では、最上位階層のアクティビティブロックの重要度を評価する。
ここでは、ＢＬＫ（１，１）から算出した最上位階層のアクティビティブロックであるａｃｔＢＬＫ２（１，１）の重要度の評価を行う。
【００５１】
アクティビティブロックａｃｔＢＬＫｎ（１，１）の重要度の評価は、判断手段１０６がａｃｔＢＬＫｎ（１，１）内の各画素データ値のデータ総和を算出し、当該データ総和が予め設定したしきい値α以上であれば、当該アクティビティブロックの重要度は高いと判定し、しきい値α未満であれば当該アクティビティブロックの重要度は低いと判定する。
【００５２】
前記しきい値αを高く設定すると、重要度が低いと判断されるアクティビティブロックが増え、逆に前記しきい値αを低く設定すると重要度が低いと判断されるアクティビティブロックは減る。
例えば、ブロックサイズが２画素×２画素である場合、画素値を２５６階調とするとアクティビティブロック内データの総和は０〜２５５×４となり、当該ブロック内総和最大値の２５％を前記しきい値に設定するとしきい値は２５５となる。
【００５３】
アクティビティデータとしてウェーブレット変換後の画像の高域通過成分を用いる場合、変動が小さく一様な画像領域でのアクティビティデータは無視できる値となり、後述する動きベクトル検出ステップにおける検出精度向上に貢献しない。
よって、判断手段１０６は、アクティビティブロックの重要度が高い場合ブロックマッチングにアクティビティデータを用い、重要度が低い場合には、ブロックマッチングにアクティビティデータを用いないと判断する。
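The importance test described above (sum the data in the activity block and compare it with a threshold α) can be sketched as below. The names `threshold_for` and `is_important` are hypothetical, and activity values are assumed to be non-negative, as in the worked example of a 2 × 2 block with 256 gray levels.

```python
def threshold_for(block_pixels, ratio, max_value=255):
    """Threshold alpha expressed as a fraction of the largest possible block sum."""
    return block_pixels * max_value * ratio


def is_important(act_block, alpha):
    """True when the activity block should take part in block matching."""
    total = sum(sum(row) for row in act_block)
    return total >= alpha
```

With `threshold_for(4, 0.25)` the threshold is 255, matching the example in the text: a block whose values sum to 255 is kept, and one that sums to 254 is skipped.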
【００５４】
判断手段１０６が、アクティビティブロックを用いると判断した場合ステップＳ９０５に進み、用いないと判断した場合ステップＳ９０６に進む。
ステップＳ９０５では、動きベクトル検出手段１０７が、近似画像及びアクティビティデータを用いて、ブロックマッチングを行い動きベクトル候補ＭＶＣ（Ｎ−１）を検出する。
【００５５】
図１０は、ａｐｘＢＬＫｎ（１，１）についてのブロックマッチングを説明する図である。
図１１は、動きベクトル検出について説明する図である。
ａｐｘＢＬＫｎ（１，１）についてブロックマッチングを行う場合、ａｐｘＢＬＫｎ（１，１）とシフトブロックとの類似度を評価関数により評価する。
【００５６】
シフトブロックとは図１０に示すように、ａｐｘＢＬＫｎ（ｉ，ｊ）に対し同じｉ，ｊであるｒａｐｘＢＬＫｎ（ｉ，ｊ）を、探索領域の範囲内で、水平方向、及び垂直方向に画素単位でずらしたものである。
また前記評価関数としては、偏差絶対総和や偏差２乗総和などをよく用いる。図１０は、図中ａｐｘＢＬＫｎ（１，１）をａｃｔＢＬＫｎ（１，１）と読み替え、ｒａｐｘＢＬＫｎ（１，１）をｒａｃｔＢＬＫｎ（１，１）と読み替えることにより、アクティビティブロックについてのブロックマッチングについても説明する図となる。
【００５７】
ａｃｔＢＬＫｎ（１，１）についてのブロックマッチングは、ａｐｘＢＬＫｎ（１，１）の場合と同様に、ｒａｃｔＢＬＫｎ（１，１）を探索領域の範囲内で画素単位にシフトしたシフトブロックとの類似度を評価関数により評価する。
ａｐｘＢＬＫｎ（１，１）についての評価関数による評価値とａｃｔＢＬＫｎ（１，１）についての評価関数による評価値に重みｗ（０＜ｗ≦１）を付けた値とを加えた結果をブロックマッチングの最終的な評価値とする。
【００５８】
前記ブロックマッチングの最終的な評価値が最小となるシフトブロックのシフト量が動きベクトルとなる。
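As a rough illustration of the final evaluation value described above, the sketch below adds the activity cost, scaled by a weight w with 0 < w ≤ 1, to the approximate-image cost and picks the shift with the smallest total. The function names, the dictionary layout (shift → cost) and the default w = 0.5 are assumptions made for this example only.

```python
def final_cost(apx_cost, act_cost, w=0.5, use_activity=True):
    """Final block-matching cost for one candidate shift."""
    if not 0 < w <= 1:
        raise ValueError("w must satisfy 0 < w <= 1")
    if use_activity:
        return apx_cost + w * act_cost
    return apx_cost  # activity judged unimportant: approximate image only


def best_shift(apx_costs, act_costs, w=0.5, use_activity=True):
    """Return the shift (x, y) whose final cost is smallest."""
    return min(
        apx_costs,
        key=lambda v: final_cost(apx_costs[v], act_costs[v], w, use_activity),
    )
```

When two shifts tie on the approximate-image cost, the activity term can separate them, which is the accuracy benefit the text attributes to activity data.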
最上位階層ｎ＝２の場合、動きベクトル検出手段１０７が、ａｐｘＢＬＫ２（１，１）及びａｃｔＢＬＫ２（１，１）について、探索領域ＳＲ２（１，１）（水平方向±１２画素、垂直方向±８画素）内で前記ブロックマッチングを行い、図１１に示す動きベクトル候補ＭＶＣ（２）（１，１）を算出する。
【００５９】
ステップＳ９０６では、ａｃｔＢＬＫ２（１，１）を用いず、ａｐｘＢＬＫ２（１，１）について、探索領域ＳＲ２（１，１）（水平方向±１２画素、垂直方向±８画素）内で前記ブロックマッチングを行い、図１１に示す動きベクトル候補ＭＶＣ（２）（１，１）を算出する。
ステップＳ９０７では、演算対象となる階層ｎについて、ｎ＝ｎ−１とする。
【００６０】
ステップＳ９０８では、ｎが原画像を表す０であるかどうかを判定する。
ｎ＝０である場合、ステップＳ９１２へ進む。
ｎ≠０の場合、ステップＳ９０９へ進む。
ステップＳ９０９では、判断手段１０６が、階層ｎのアクティビティブロックの重要度を評価する。
【００６１】
本実施例では、ＢＬＫ（１，１）から算出した階層１のアクティビティブロックであるａｃｔＢＬＫ１（１，１）の重要度の評価を行う。
重要度の評価は、ステップＳ９０４で行った方法と同じ方法を用いる。
判断手段１０６が、アクティビティブロックを用いると判断した場合ステップＳ９１０に進み、用いないと判断した場合ステップＳ９１１に進む。
【００６２】
ステップＳ９１０及びステップＳ９１１では、動きベクトル検出手段１０７が、最上位階層及び原画像を除く階層ｎにおいて、近似画像及びアクティビティデータを用いて動きベクトル候補ＭＶＣ（ｎ）（ｉ，ｊ）を検出する。
ブロックマッチングにおける探索領域は、ＭＶＣ（ｎ＋１）（ｉ，ｊ）を用い、小探索領域を設定する。
【００６３】
ｎ＝１の場合である階層１においては、図１１に示すように探索領域ＳＲ１（１，１）（水平方向±２４画素、垂直方向±１６画素）内にＭＶＣ（２）（１，１）を２倍した位置から±２画素の小探索領域ＳＳＲ１（１，１）を設定する。ここでステップＳ９１０では、小探索領域ＳＳＲ１（１，１）内で、ａｐｘＢＬＫ１（１，１）及びａｃｔＢＬＫ１（１，１）についてブロックマッチングを行い、動きベクトル候補ＭＶＣ（１）（１，１）を算出し、ステップＳ９１１では、小探索領域ＳＳＲ１（１，１）内で、ａｐｘＢＬＫ１（１，１）のみについてブロックマッチングを行い、動きベクトル候補ＭＶＣ（１）（１，１）を算出する。
【００６４】
ステップＳ９１２では、原画像において動きベクトルを検出する。
ブロックマッチングには、原画像同士である第１画像と第２画像を用いる。
図１１に示すように原画像中の探索領域ＳＲ（１，１）（水平方向±４８画素、垂直方向±３２画素）内にＭＶＣ（１）（１，１）を２倍した位置およびその周囲水平・垂直±２画素の小探索領域ＳＳＲ（１，１）を設定し、ＳＳＲ（１，１）内でブロックマッチングを行い、整数画素精度の動きベクトルＭＶ（１，１）を検出する。
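The coarse-to-fine flow of steps S903 to S912 can be sketched as follows, under several simplifying assumptions: only approximate images are matched (the activity term and the importance test are left out), the layers are built by plain 2 × 2 averaging, the evaluation function is the sum of absolute differences, and the top-layer search range is a small parameter rather than the ±12 × ±8 pixels of the embodiment. All function and parameter names are hypothetical.

```python
def halve(img):
    """Build the next layer by averaging each 2x2 region."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
             for c in range(0, len(img[0]) - 1, 2)]
            for r in range(0, len(img) - 1, 2)]


def sad_at(first, second, top, left, size, x, y):
    """Sum of absolute differences for shift (x, y); None if the shifted block leaves the image."""
    t, l = top + y, left + x
    if t < 0 or l < 0 or t + size > len(second) or l + size > len(second[0]):
        return None
    return sum(abs(first[top + r][left + c] - second[t + r][l + c])
               for r in range(size) for c in range(size))


def search(first, second, top, left, size, center, rx, ry):
    """Best shift within +-rx, +-ry pixels of `center`."""
    best_cost, best_v = None, center
    cx, cy = center
    for y in range(cy - ry, cy + ry + 1):
        for x in range(cx - rx, cx + rx + 1):
            cost = sad_at(first, second, top, left, size, x, y)
            if cost is not None and (best_cost is None or cost < best_cost):
                best_cost, best_v = cost, (x, y)
    return best_v


def hierarchical_mv(first, second, top, left, size=16, layers=3,
                    top_range=(3, 2), radius=2):
    """Full search at the top layer, then +-radius refinement around the doubled candidate."""
    pyr1, pyr2 = [first], [second]
    for _ in range(layers - 1):
        pyr1.append(halve(pyr1[-1]))
        pyr2.append(halve(pyr2[-1]))
    n = layers - 1
    scale = 2 ** n
    v = search(pyr1[n], pyr2[n], top // scale, left // scale, size // scale,
               (0, 0), top_range[0], top_range[1])
    while n > 0:
        n -= 1
        scale = 2 ** n
        center = (2 * v[0], 2 * v[1])  # candidate from the layer above, doubled
        v = search(pyr1[n], pyr2[n], top // scale, left // scale, size // scale,
                   center, radius, radius)
    return v
```

Each layer below the top evaluates only (2·radius + 1)² shifts, which is where the hierarchical method saves computation compared with a full search at the original resolution.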
４．まとめ
以上説明したように、動きベクトル検出装置１０は、第１画像及び第２画像から２組の近似画像群及びアクティビティデータ群を算出する階層化画像の算出において、動きベクトル検出精度の向上に寄与しない高周波成分を算出するための計算を省略して演算量を削減し、ブロックマッチングにおいては動きベクトル検出精度向上に寄与しないアクティビティデータに対するブロックマッチングを省略することにより演算量を削減することにより、全体的な計算量を削減する。またノイズの影響を受けやすい高域通過成分をアクティビティデータの算出に用いないことにより、ノイズに対し頑健なブロックマッチングを行う。
（その他の変形例）
なお、本発明を上記の実施の形態に基づいて説明してきたが、本発明は、上記の実施の形態に限定されないのはもちろんである。以下のような場合も本発明に含まれる。
（１）本発明は、実施の形態で説明したステップを含む方法であるとしてもよい。また、これらの方法を、コンピュータシステムを用いて実現するためのコンピュータプログラムであるとしてもよいし、前記プログラムを表すデジタル信号であるとしてもよい。
【００６５】
また、本発明は、前記プログラム又は前記デジタル信号を記録したコンピュータ読取り可能な記録媒体、例えば、フレキシブルディスク、ハードディスク、ＣＤ−ＲＯＭ、ＭＯ、ＤＶＤ、ＤＶＤ−ＲＯＭ、ＤＶＤ−ＲＡＭ、半導体メモリ等であるとしてもよい。
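(2) In the layers between the top layer and the original image, a small search area SSRn is set within the search area SRn. In the example described in the embodiment, the size of SSRn is ±2 pixels horizontally and vertically, but it may be ±1 pixel, or it may be made wider to suppress false detection.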
（３）最上位階層において判断手段が行った、ブロックマッチングにアクティビティブロックを用いるか否かの判断結果を、下位階層において使用してもよい。
（４）最上位階層での全探索の結果において、ブロックマッチングの評価値が小さい順に複数の動きベクトル候補ＭＶＣを採用し、当該複数の動きベクトル候補を下位階層で用いてもよい。
（５）階層化した近似画像及びアクティビティデータを用いて動きベクトル検出を行う動きベクトル検出装置において、アクティビティデータの重要度を評価し、重要でないデータのブロックマッチングを省略する方法を、実施の形態で示した実施例に示したウェーブレット変換以外の方法で画像を階層化する場合に用いてもよい。
【００６６】
【発明の効果】
（１）本発明の動きベクトル検出方法は、原解像度でそれぞれ表された第１画像と第２画像との間の動きベクトルを検出する動きベクトル検出方法であって、当該第１画像及び当該第２画像に含まれる低周波成分を当該原解像度よりも低い縮小解像度でそれぞれ表す第１近似画像及び第２近似画像、及び当該第１画像及び第２画像に含まれる高周波成分を当該縮小解像度でそれぞれ表す第１アクティビティ画像及び第２アクティビティ画像を算出する算出ステップと、当該第１アクティビティ画像により表された高周波成分の総量が所定閾値以上であるか否かを判断する判断ステップと、当該閾値以上であると判断された場合、当該第１近似画像と当該第２近似画像との比較結果、及び当該第１アクティビティ画像と当該第２アクティビティ画像との比較結果の双方を用いて動きベクトル候補を選出し、その他の場合、当該第１近似画像と当該第２近似画像との比較結果のみを用いて動きベクトル候補を選出する選出ステップと、当該動きベクトル候補を用いて動きベクトルを検出する検出ステップとを含む。
【００６７】
この構成によれば、動きベクトル検出方法は、動きベクトルの検出精度向上に寄与しないアクティビティ画像についてのブロックマッチング演算を省略することにより、動きベクトル検出についての演算量を削減すると共に、動きベクトル検出精度を従来程度の水準で維持できる。
（２）また、本発明の動きベクトル検出方法は、原解像度でそれぞれ表された第１画像と第２画像との間の動きベクトルを検出する動きベクトル検出方法であって、当該原解像度よりも低くかつ段階的に低下する複数の縮小解像度の各々について、当該第１画像及び当該第２画像に含まれる低周波成分を当該縮小解像度でそれぞれ表す第１近似画像及び第２近似画像、及び当該第１画像及び第２画像に含まれる高周波成分を当該縮小解像度でそれぞれ表す第１アクティビティ画像及び第２アクティビティ画像を算出する算出ステップと、各縮小解像度について、当該縮小解像度の第１アクティビティ画像により表された高周波成分の総量が所定閾値以上であるか否かを判断する判断ステップと各縮小解像度について、当該閾値以上であると判断された場合、当該縮小解像度の第１近似画像と当該縮小解像度の第２近似画像との比較結果、及び当該縮小解像度の第１アクティビティ画像と当該縮小解像度の第２アクティビティ画像との比較結果の双方を用いて動きベクトル候補を選出し、その他の場合、当該縮小解像度の第１近似画像と当該縮小解像度の第２近似画像との比較結果のみを用いて動きベクトル候補を選出する選出ステップと、当該動きベクトル候補を当該縮小解像度の低い順に段階的に用いて動きベクトルを検出する検出ステップとを含む。
【００６８】
この構成によれば、（１）と同様の効果が得られる。
（３）また、前記（２）の動きベクトル検出方法において、前記判断ステップは、所定の縮小解像度よりも高い縮小解像度について、当該所定の縮小解像度における判断結果と同一の判断結果が得られたものとして前記判断を省略してもよい。
【００６９】
この構成によれば、当該所定の縮小解像度よりも高い縮小解像度のアクティビティ画像について、当該判断を行うための演算が不要となり、動きベクトル検出についての演算量を削減すると共に、動きベクトル検出精度を従来程度の水準で維持できる。
（４）また、前記（１）乃至（３）のいずれかの動きベクトル検出方法において、前記算出ステップは、リフティング構成のウェーブレット変換を用いて、前記各近似画像及び各アクティビティ画像を算出してもよい。
【００７０】
この構成によれば、（１）と同様の効果が得られる。
（５）また、前記（４）の動きベクトル検出方法において、前記算出ステップは、リフティング構成のＨａａｒウェーブレット変換を用いて、前記各近似画像及び各アクティビティ画像を算出してもよい。
この構成によれば、（１）と同様の効果が得られる。
（６）また、前記（４）又は（５）の動きベクトル検出方法において、画像を複数の周波数帯域に分割する演算を行う前記リフティング構成のウェーブレット変換及び前記リフティング構成のＨａａｒウェーブレット変換について、前記算出ステップは、前記各近似画像及び前記各アクティビティ画像の算出に用いる周波数帯域のみを抜き出す演算を行ってもよい。
【００７１】
この構成によれば、アクティビティ画像の算出に使用しない周波数帯域を抜き出すための演算を省略出来るため、演算量を削減することができる。
（７）また、前記（６）の動きベクトル検出方法において、前記算出ステップは、前記第１画像が、飛越走査画像であるか順次走査画像であるかによって、アクティビティ画像の算出に用いる周波数帯域を選択してもよい。
【００７２】
この構成によれば、アクティビティ画像の算出に使用しない周波数帯域を抜き出すための演算を省略出来るため、演算量を削減すると共に、動きベクトル検出精度を従来程度の水準で維持できる。
（８）本発明の動きベクトル検出装置は、原解像度でそれぞれ表された第１画像と第２画像との間の動きベクトルを検出する動きベクトル検出装置であって、当該第１画像及び当該第２画像に含まれる低周波成分を当該原解像度よりも低い縮小解像度でそれぞれ表す第１近似画像及び第２近似画像、及び当該第１画像及び第２画像に含まれる高周波成分を当該縮小解像度でそれぞれ表す第１アクティビティ画像及び第２アクティビティ画像を算出する算出手段と、当該第１アクティビティ画像により表された高周波成分の総量が所定閾値以上であるか否かを判断する判断手段と、当該閾値以上であると判断された場合、当該第１近似画像と当該第２近似画像との比較結果、及び当該第１アクティビティ画像と当該第２アクティビティ画像との比較結果の双方を用いて動きベクトル候補を選出し、その他の場合、当該第１近似画像と当該第２近似画像との比較結果のみを用いて動きベクトル候補を選出する選出手段と、当該動きベクトル候補を用いて動きベクトルを検出する検出手段とを含む。
【００７３】
この構成によれば、動きベクトル検出装置は、動きベクトルの検出精度向上に寄与しないアクティビティ画像についてのブロックマッチング演算を省略することにより、動きベクトル検出についての演算量を削減すると共に、動きベクトル検出精度を従来程度の水準で維持できる。
（９）また本発明の動きベクトル検出装置は、原解像度でそれぞれ表された第１画像と第２画像との間の動きベクトルを検出する動きベクトル検出装置であって、当該原解像度よりも低くかつ段階的に低下する複数の縮小解像度の各々について、当該第１画像及び当該第２画像に含まれる低周波成分を当該縮小解像度でそれぞれ表す第１近似画像及び第２近似画像、及び当該第１画像及び第２画像に含まれる高周波成分を当該縮小解像度でそれぞれ表す第１アクティビティ画像及び第２アクティビティ画像を算出する算出手段と、各縮小解像度について、当該縮小解像度の第１アクティビティ画像により表された高周波成分の総量が所定閾値以上であるか否かを判断する判断手段と各縮小解像度について、当該閾値以上であると判断された場合、当該縮小解像度の第１近似画像と当該縮小解像度の第２近似画像との比較結果、及び当該縮小解像度の第１アクティビティ画像と当該縮小解像度の第２アクティビティ画像との比較結果の双方を用いて動きベクトル候補を選出し、その他の場合、当該縮小解像度の第１近似画像と当該縮小解像度の第２近似画像との比較結果のみを用いて動きベクトル候補を選出する選出手段と、当該動きベクトル候補を当該縮小解像度の低い順に段階的に用いて動きベクトルを検出する検出手段とを含む。
【００７４】
この構成によれば、（１）と同様の効果が得られる。
（１０）また、前記（９）の動きベクトル検出装置において、前記判断手段は、所定の縮小解像度よりも高い縮小解像度について、当該所定の縮小解像度における判断結果と同一の判断結果が得られたものとして前記判断を省略してもよい。
この構成によれば、当該所定の縮小解像度よりも高い縮小解像度のアクティビティ画像について、当該判断を行うための演算が不要となり、動きベクトル検出についての演算量を削減すると共に、動きベクトル検出精度を従来程度の水準で維持できる。
（１１）また、前記（８）乃至（１０）のいずれかの動きベクトル検出装置において、前記算出手段は、リフティング構成のウェーブレット変換を用いて、前記各近似画像及び各アクティビティ画像を算出してもよい。
【００７５】
この構成によれば、（１）と同様の効果が得られる。
（１２）また、前記（１１）の動きベクトル検出装置において、前記算出手段は、リフティング構成のＨａａｒウェーブレット変換を用いて、前記各近似画像及び各アクティビティ画像を算出してもよい。
この構成によれば、（１）と同様の効果が得られる。
（１３）また、前記（１１）又は（１２）の動きベクトル検出装置において、画像を複数の周波数帯域に分割する演算を行う前記リフティング構成のウェーブレット変換及び前記リフティング構成のＨａａｒウェーブレット変換について、前記算出手段は、前記各近似画像及び前記各アクティビティ画像の算出に用いる周波数帯域のみを抜き出す演算を行ってもよい。
【００７６】
この構成によれば、アクティビティ画像の算出に使用しない周波数帯域を抜き出すための演算を省略出来るため、演算量を削減すると共に、動きベクトル検出精度を従来程度の水準で維持できる。
（１４）また、前記（１３）の動きベクトル検出装置において、前記算出手段は、前記第１画像が、飛越走査画像であるか順次走査画像であるかによって、アクティビティ画像の算出に用いる周波数帯域を選択してもよい。
【００７７】
この構成によれば、アクティビティ画像の算出に使用しない周波数帯域を抜き出すための演算を省略出来るため、演算量を削減すると共に、動きベクトル検出精度を従来程度の水準で維持できる。
（１５）本発明のプログラムは、原解像度でそれぞれ表された第１画像と第２画像との間の動きベクトルを検出する動きベクトル検出装置をコンピュータを用いて実現するためのコンピュータ実行可能なプログラムであって、
当該第１画像及び当該第２画像に含まれる低周波成分を当該原解像度よりも低い縮小解像度でそれぞれ表す第１近似画像及び第２近似画像、及び当該第１画像及び第２画像に含まれる高周波成分を当該縮小解像度でそれぞれ表す第１アクティビティ画像及び第２アクティビティ画像を算出する算出ステップと、
当該第１アクティビティ画像により表された高周波成分の総量が所定閾値以上であるか否かを判断する判断ステップと、
当該閾値以上であると判断された場合、当該第１近似画像と当該第２近似画像との比較結果、及び当該第１アクティビティ画像と当該第２アクティビティ画像との比較結果の双方を用いて動きベクトル候補を選出し、その他の場合、当該第１近似画像と当該第２近似画像との比較結果のみを用いて動きベクトル候補を選出する選出ステップと、
当該動きベクトル候補を用いて動きベクトルを検出する検出ステップとを含む。
【００７８】
この構成によれば、当該プログラムを用いる動きベクトル検出装置は、（１）と同様の効果を有する。
【図面の簡単な説明】
【図１】動きベクトル検出装置１０の全体構成の一例を示している。
【図２】算出手段が算出する各データの関係を模式的に示した図である。
【図３】原画像、近似画像群及びアクティビティデータ群のデータ保持形式を示す。
【図４】動きベクトル検出装置の全体処理の概略を示すフローチャートである。
【図５】リフティング構成の２次元Ｈａａｒウェーブレット変換を説明する図である。
【図６】アクティビティデータとして（｜ＨＬ｜＋｜ＬＨ｜）／２を採用し、２次元Ｈａａｒウェーブレット変換を行う簡略化リフティングスキームの例である。
【図７】アクティビティデータとしてＬＨ成分を採用し、２次元Ｈａａｒウェーブレット変換を行う簡略化リフティングスキームの例である。
【図８】画像階層化を詳細に説明するフローである。
【図９】動きベクトル検出処理を詳細に説明するフローである。
【図１０】ａｐｘＢＬＫｎ（１，１）についてのブロックマッチングを説明する図である。
【図１１】動きベクトル検出について説明する図である。
【図１２】従来方式の画像階層化ステップを示す。
【符号の説明】
１０　動きベクトル検出装置
１０１　第１画像メモリ
１０２　第２画像メモリ
１０３　算出手段
１０４　階層化第１画像メモリ
１０５　階層化第２画像メモリ
１０６　判断手段
１０７　動きベクトル検出手段
２００　第１画像
２０１　第２画像
２０２　第１画像についての階層１の近似画像
２０３　第１画像についての階層１のアクティビティデータ
２０４　第２画像についての階層１の近似画像
２０５　第２画像についての階層１のアクティビティデータ
２０６　第１画像についての階層２の近似画像
２０７　第１画像についての階層２のアクティビティデータ
２０８　第２画像についての階層２の近似画像
２０９　第２画像についての階層２のアクティビティデータ
２１０　第１画像についての近似画像群
２１１　第１画像についてのアクティビティデータ群
２１２　第２画像についての近似画像群
２１３　第２画像についてのアクティビティデータ群
３００　原画像を表す階層化画像メモリ内容
３０１　２階層化後の階層化画像メモリ内容
３０２　３階層化後の階層化画像メモリ内容
３０３　小画素領域
３０４　小画素領域
５０１　メモリ領域Ａ列
５０２　Ｓ０での演算
５０３　Ｓ０での演算結果
５０４　Ｓ１での演算
５０５　Ｓ１での演算結果
５０６　Ｓ２での演算
５０７　Ｓ２での演算結果
５０８　Ｓ３での演算
５０９　Ｓ３での演算結果
[0001]
[Technical Field of the Invention] The present invention relates to a motion vector detection method and a motion vector detection device, and in particular to a technique for detecting a motion vector by layering each of two temporally different images and then performing block matching.
[0002]
[Prior Art] MPEG2 (Moving Picture Experts Group 2), one of the international standards for high-efficiency coding of moving picture signals, is classified as a hybrid coding method: it uses the discrete cosine transform, an orthogonal transform, to reduce redundancy in the spatial direction, and inter-frame prediction with motion compensation to reduce redundancy in the temporal direction.
[0003]
A block matching method is used for inter-frame prediction.
In the block matching method, the first image and the second image, which represents an image at a different time from the first image, are each divided into blocks, which are small rectangular regions. The similarity between a block of interest in the first image and a shifted block is evaluated with a predetermined evaluation function. The shifted block is obtained by shifting the block at the same position in the second image by x pixels horizontally and y pixels vertically within a search range of ±m × ±n pixels. The shift v = (x, y) that minimizes the evaluation value is taken as the motion vector.
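A minimal sketch of this full search follows, assuming images stored as lists of rows, a square block, and the sum of absolute differences as the evaluation function. The function names and argument order are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q)
               for row_a, row_b in zip(block_a, block_b)
               for p, q in zip(row_a, row_b))


def crop(img, top, left, height, width):
    """Cut a height x width block out of an image."""
    return [row[left:left + width] for row in img[top:top + height]]


def full_search(first, second, top, left, size, m, n):
    """Try every shift within +-m (horizontal) and +-n (vertical) pixels.

    Returns ((x, y), cost) for the shift with the smallest evaluation value.
    Shifts that would move the block outside the second image are skipped.
    """
    target = crop(first, top, left, size, size)
    height, width = len(second), len(second[0])
    best = None
    for y in range(-n, n + 1):
        for x in range(-m, m + 1):
            t, l = top + y, left + x
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue
            cost = sad(target, crop(second, t, l, size, size))
            if best is None or cost < best[0]:
                best = (cost, (x, y))
    return best[1], best[0]
```

The loop visits up to (2m + 1)(2n + 1) shifts per block, which is the cost the hierarchical method is designed to avoid.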
[0004]
The absolute difference is often used as the evaluation function.
Because this block matching method must search the entire search range exhaustively and compute the absolute difference at every position, the amount of computation becomes large, which makes the device itself larger and the computation time longer.
One of the techniques for solving the above problem is a hierarchical motion vector detection method.
[0005]
This method consists of two steps. The image layering step applies low-pass processing to each of the first image and the second image to create a low-resolution approximate image (hereinafter referred to as layer 1), and applies the same low-pass processing repeatedly to create approximate images of higher layers (layer 2, layer 3, ...). The motion vector detection step detects, in each lower layer, a motion vector candidate within a small search area around the candidate detected in the layer above, working down through the layers until the motion vector of the original image is finally detected.
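The image layering step can be sketched as below. The text only says "low-pass processing"; this example assumes 2 × 2 averaging as the low-pass filter, and the names `low_pass_halve` and `build_layers` are hypothetical.

```python
def low_pass_halve(image):
    """Average each 2x2 region, halving the width and the height."""
    return [[(image[r][c] + image[r][c + 1]
              + image[r + 1][c] + image[r + 1][c + 1]) / 4
             for c in range(0, len(image[0]) - 1, 2)]
            for r in range(0, len(image) - 1, 2)]


def build_layers(image, num_layers):
    """Return [original, layer 1, layer 2, ...] with num_layers entries in total."""
    layers = [image]
    for _ in range(num_layers - 1):
        layers.append(low_pass_halve(layers[-1]))
    return layers
```

Index 0 of the returned list is the original image, so index n corresponds to layer n in the notation used here.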
[0006]
This method promises a large reduction in the amount of computation. However, because the low-pass processing removes more of the high-frequency components, which are the feature content of the image, at each higher layer, the accuracy of motion vector detection drops considerably.
To address this problem, Japanese Patent Application Laid-Open No. 7-222157 proposes a hierarchical motion vector detection method that improves detection accuracy by using the high-frequency components separated out by the low-pass processing as an activity image (also called activity data) in block matching, and proposes averaging as the low-pass processing.
[0007]
[Problems to Be Solved by the Invention] However, with the conventional method, calculating the activity data requires a large amount of computation if motion vectors are to be detected accurately. Moreover, as image resolution and image quality rise, the amount of computation to be performed per unit time keeps increasing, which leads to higher power consumption and heat generation when the computation is implemented in an LSI.
[0008]
In view of the above problems, an object of the present invention is to provide a motion vector detection method that reduces the amount of computation in the image layering step and the motion vector detection step while maintaining motion vector detection accuracy at about the conventional level.
[0009]
[Means for Solving the Problems]
To solve the above problem, the motion vector detection method of the present invention detects a motion vector between a first image and a second image, each represented at an original resolution, and includes the following steps. A calculating step calculates a first approximate image and a second approximate image, which represent the low-frequency components of the first image and the second image, respectively, at a reduced resolution lower than the original resolution, and a first activity image and a second activity image, which represent the high-frequency components of the first image and the second image, respectively, at the reduced resolution. A determining step determines whether the total amount of high-frequency components represented by the first activity image is equal to or greater than a predetermined threshold. A selecting step selects a motion vector candidate using both the result of comparing the first approximate image with the second approximate image and the result of comparing the first activity image with the second activity image when the total is determined to be equal to or greater than the threshold, and using only the result of comparing the first approximate image with the second approximate image otherwise. A detecting step detects a motion vector using the motion vector candidate.
[0010]
BEST MODE FOR CARRYING OUT THE INVENTION
A motion vector detection device according to an embodiment of the present invention will be described with reference to the drawings.
1. Overall configuration
FIG. 1 is a block diagram showing the overall configuration of the motion vector detection device 10 according to the present embodiment. The motion vector detection device 10 comprises a first image memory 101, a second image memory 102, calculating means 103, a hierarchical first image memory 104, a hierarchical second image memory 105, determining means 106, and motion vector detecting means 107.
[0011]
The motion vector detection device 10 is specifically realized by software and hardware such as a processor, a ROM (Read Only Memory) storing a program, and a work RAM (Random Access Memory). The function of each component is realized by the processor executing a program stored in the ROM. The transfer of data between the components is performed via hardware such as a RAM.
[0012]
The first image memory 101 holds the input first image.
The second image memory 102 holds the input second image.
The calculating means 103 reads the first image held in the first image memory 101 and the second image held in the second image memory 102 into the hierarchical first image memory 104 and the hierarchical second image memory 105, respectively. It then calculates two sets of approximate image groups, which represent the low-frequency components of the first image and the second image at a plurality of resolutions that are lower than the image and decrease stepwise, and two sets of activity data groups, which represent the high-frequency components of the first image and the second image at the same plurality of resolutions.
[0013]
The hierarchical first image memory 104 holds the approximate image group and the activity data group that the calculating means 103 calculated from the first image.
The hierarchical second image memory 105 holds the approximate image group and the activity data group that the calculating means 103 calculated from the second image.
The determining means 106 evaluates the importance of the activity data of each layer that the calculating means 103 calculated from the first image, and determines whether to use the activity data of each layer of the first image and the second image for motion vector detection.
[0014]
The motion vector detecting means 107 detects a motion vector by using the motion vector candidate selected in a low-resolution layer to select the candidate in the next higher-resolution layer, layer by layer.
2. Data structure
FIG. 2 is a diagram schematically showing the relationship between the respective data calculated by the calculating means 103.
[0015]
Reference numerals 200 and 201 denote a first image and a second image which are input original images.
Reference numerals 202 and 203 denote, respectively, the approximate image, which is the low-pass component calculated from the first image 200, and the activity data generated from the high-pass components. Reference numerals 204 and 205 denote the approximate image and the activity data calculated from the second image 201.
[0016]
Items 202, 203, 204 and 205 belong to layer 1.
Similarly, 206 and 207 denote the approximate image and the activity data calculated from the approximate image 202, and 208 and 209 denote the approximate image and the activity data calculated from the approximate image 204.
Items 206, 207, 208 and 209 belong to layer 2.
[0017]
Reference numeral 210 denotes an approximate image group including 202 and 206, and 211 denotes an activity data group including 203 and 207.
Similarly, reference numeral 212 denotes an approximate image group including 204 and 208, and reference numeral 213 denotes an activity data group including 205 and 209.
Here, the notation will be described.
[0018]
The first image and the second image are divided into blocks (16 pixels × 16 pixels), which are the units of block matching in the original image. The block of the first image at the i-th position horizontally and the j-th position vertically is denoted BLK(i, j), and the corresponding block of the second image is denoted rBLK(i, j).
Similarly, an approximate image block, which is a block in the layer-n approximate image calculated from the first image, is denoted apxBLKn(i, j), and an activity block, which is a block in the layer-n activity data, is denoted actBLKn(i, j).
The layer-n approximate image block calculated from the second image is denoted rapxBLKn(i, j), and the layer-n activity block calculated from the second image is denoted ractBLKn(i, j).
[0019]
The number of pixels in each of the blocks apxBLKn(i, j), rapxBLKn(i, j), actBLKn(i, j) and ractBLKn(i, j) is 1/2^n of the number of pixels in BLK(i, j) and rBLK(i, j) in each of the horizontal and vertical directions.
Next, an image layering process for calculating an approximate image group and an activity data group from an original image will be described with reference to FIG.
[0020]
FIG. 3 shows the contents of the hierarchical first image memory 104 and the hierarchical second image memory 105 during the image layering process.
Here, an example in which the first image is layered will be described; the description for the second image is omitted because it would be redundant.
The contents of the hierarchical first image memory 104 change from 300 in FIG. 3 to 301 and 302 as the image hierarchical processing proceeds.
[0021]
Reference numeral 300 denotes the content of the hierarchical first image memory 104 when the calculating unit 103 reads out the first image from the first image memory 102 to the hierarchical first image memory 104.
Reference numeral 301 denotes the content of the hierarchical first image memory 104 when the approximate image and the activity data of the layer 1 are calculated from 300.
[0022]
Reference numeral 302 denotes the content of the hierarchical first image memory 104 when the approximate image of the layer 2 and the activity data are calculated from the approximate image of the layer 1.
In FIG. 3, am, bm, cm, and dm (m is an arbitrary natural number) respectively represent pixel data constituting the original image.
In FIG. 3, apxnm (m and n are both natural numbers) represents pixel data forming an approximate image, and actnm (m and n are both natural numbers) represents data forming activity data.
[0023]
Further, m and n are arbitrary natural numbers, and n indicates a layer to which each data belongs.
The calculation means 103 divides 300 into small pixel areas of 2 pixels × 2 pixels and performs a wavelet transform by fixed-position (in-place) calculation on each small pixel area, obtaining the approximate image and activity data of layer 1.
[0024]
Details of the wavelet transform for the fixed position calculation will be described later.
Reference numeral 303 denotes a small pixel area of 2 pixels × 2 pixels having pixel data a1, b1, c1, and d1.
As a result of the calculation means 103 performing the wavelet transform for the fixed position calculation on the small pixel area 303, the contents of the 303 become the contents indicated by 304 consisting of pixel data apx11 representing an approximate image and data act11 representing activity data.
[0025]
Here, two apx11 are included in 304, and the first apx11 is used when detecting a motion vector.
The second apx11 is used for an operation for calculating an approximate image and activity data of the layer 2, and the value is rewritten during the operation.
As a result of performing the wavelet transform on all the small pixel areas in 300, the content of the hierarchical first image memory 104 becomes the content represented by 301.
[0026]
Regarding 301, a group of pixels indicated by apx1m constitutes an approximate image of the first layer, and corresponds to 202 in FIG.
A collection of data indicated by act1m forms the activity data of the first tier, and corresponds to 203 in FIG.
The number of pixels of the approximate image and the activity data of layer 1 calculated as described above is 1/2 of the number of pixels of the original image in each of the horizontal and vertical directions.
[0027]
Further, the approximate image of the layer 2 and the activity data of the layer 2 are calculated using the approximate image of the layer 1.
The pixel data representing the approximate image in 301 is divided into small pixel areas of 2 pixels × 2 pixels, and the wavelet transform is performed on each of these small pixel areas to obtain the approximate image and activity data of layer 2.
[0028]
In FIG. 3, apx11, apx12, apx13, and apx14, which represent the approximate image of layer 1 and are hatched in 301, are treated as one small pixel area of 2 pixels × 2 pixels. Applying the wavelet transform to this area yields apx21, the pixel data of the layer 2 approximate image hatched in 302 of FIG. 3, and act21, the data of the layer 2 activity data.
[0029]
As a result of performing the wavelet transform on all the small pixel areas in 301, the content of the hierarchical first image memory 104 becomes the content represented by 302.
Regarding 302, a group of pixels indicated by apx2m forms an approximate image of the hierarchy 2 and corresponds to 206 in FIG.
A collection of data indicated by act2m constitutes the activity data of the hierarchy 2 and corresponds to 207 in FIG.
[0030]
The number of pixels of the approximate image and the activity data of layer 2 calculated as described above is 1/2 of the number of pixels of the approximate image and the activity data of layer 1 in each of the horizontal and vertical directions.
The number of pixels of the approximate image and the activity data of layer n is 1/2^n of the number of pixels of the original image in each of the horizontal and vertical directions.
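The size relationship above can be checked with a short calculation. The following is a minimal sketch in Python; the function names `layer_size` and `block_size` are hypothetical and chosen only for illustration, and the image dimensions are assumed to be divisible by 2^n.

```python
def layer_size(width, height, n):
    """Size of the layer-n approximate image (and activity data).

    Each layering pass halves both dimensions, so layer n has
    1/2**n of the original pixel count per axis.
    """
    return width >> n, height >> n


def block_size(n, original_block=16):
    """Side length of a layer-n block for a 16 x 16 original block."""
    return original_block >> n


# Example: a 16 x 16 block of the original image becomes
# an 8 x 8 block in layer 1 and a 4 x 4 block in layer 2.
sizes = [block_size(n) for n in range(3)]
```

With a 16 × 16 original block, the list `sizes` holds the block side lengths for the original image, layer 1, and layer 2.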
[0031]
In the content 302 of the hierarchical first image memory 104 after the three layers have been formed, the data representing the approximate image and activity data of layer 1 and the data representing the approximate image and activity data of layer 2 are mixed.
The second image is layered in the same way as the first image, and the calculated approximate image group and activity data group are stored in the hierarchical second image memory 105.
3. Processing
3.1 Overall processing
FIG. 4 is a flowchart showing an outline of the overall processing of the motion vector detection device 10.
[0032]
The calculation unit 103 performs step S401, which is a process of calculating an approximate image group and an activity data group, and the determination unit 106 and the motion vector detection unit 107 perform step S402, which is a motion vector detection process.
3.2 Calculation of approximate image group and activity data group
In step S401, the calculating unit 103 reads the first image in the first image memory 101 into the hierarchical first image memory 104, reads the second image in the second image memory 102 into the hierarchical second image memory 105, and applies a wavelet transform having a lifting configuration to each image.
[0033]
The lifting configuration is a method of executing the discrete wavelet transform by fixed-position (in-place) calculation, and has advantages such as low memory usage and little address decoding.
For details, see W. Sweldens, "The lifting scheme: A custom-design construction of biorthogonal wavelets," Applied and Computational Harmonic Analysis, 3 (1996); the description is omitted here.
3.2.1 Haar wavelet transform
In step S401, a Haar wavelet transform having a lifting configuration is used, which calculates an average value as the low-pass component and a difference as the high-pass component.
[0034]
FIG. 5 shows an example in which a two-dimensional Haar wavelet transform having a lifting configuration is performed on a small pixel area of 2 pixels × 2 pixels.
Each of the first image and the second image read into the hierarchical first image memory 104 and the hierarchical second image memory 105 is divided into a small pixel area including 2 pixels × 2 pixels as in 303 in FIG. The lifting wavelet transform is applied to each small pixel area.
[0035]
Here, an example will be described in which a lifting configuration wavelet transform is applied to a small pixel region in which pixel data is a, b, c, and d, respectively.
Each row in FIG. 5 shows an operation performed on the data in the memory areas and its result, with processing progressing sequentially from top to bottom. Each column corresponds to one memory area, and the rightmost column indicates the type of operation performed.
[0036]
At the start of the calculation, the pixel data a is stored in the memory area A, and the pixel data b, c, and d are stored in the memory areas B, C, and D, respectively.
Here, only the memory area A will be described, and the description of the memory area B, the memory area C, and the memory area D will be omitted because it is the same as that of the memory area A.
[0037]
Each row of column 501, which corresponds to memory area A in FIG. 5, shows the operation performed on memory area A and the resulting content of memory area A.
Regarding 501, the operation from S0 to S3 will be described in order.
In S0, the calculation is not performed in 502, and the initial data a shown in 503 is stored in the memory area A.
[0038]
When shifting from S0 to S1, an operation 504 for adding (A + B) the content a of the memory area A and the content b of the memory area B is performed, and the content of the memory area A becomes an operation result a + b indicated by 505.
When shifting from S1 to S2, an operation 506 (A + C) is performed that adds a + b, the content of memory area A in S1, and c + d, the content of memory area C in S1, and the content of memory area A becomes the operation result a + b + c + d indicated by 507.
[0039]
When shifting from S2 to S3, a division (shift) operation 508 is performed on the content of the memory area A in S2, and the content of the memory area A is an operation result (a + b + c + d) / 4 indicated by 509.
As a result of performing S0 to S3, the LL component (a + b + c + d) / 4, which is the low-pass component, is stored in memory area A. Similarly, the HL component (b + d) / 2 - (a + c) / 2, which is the edge enhancement component in the horizontal direction, is stored in memory area B; the LH component (c + d) / 2 - (a + b) / 2, which is the edge enhancement component in the vertical direction, is stored in memory area C; and the HH component (d - c) - (b - a), which is the edge enhancement component in the horizontal and vertical directions, is stored in memory area D.
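The sequence S0 to S3 can be sketched as follows. This is a minimal illustration in Python, not the circuit of FIG. 5: the text specifies only the intermediate contents of memory area A (a + b, then a + b + c + d, then the division) and of memory area C in S1 (c + d), so the intermediate steps used here for memory areas B and D are assumptions chosen to reach the final HL, LH, and HH values, namely (b + d) / 2 - (a + c) / 2, (c + d) / 2 - (a + b) / 2, and (d - c) - (b - a). The function name `haar_lifting_2x2` is hypothetical, and true division is used in place of a shift.

```python
def haar_lifting_2x2(a, b, c, d):
    """Two-dimensional Haar transform of one 2 x 2 small pixel area.

    The pixel layout is assumed to be
        a b
        c d
    Returns (LL, HL, LH, HH) as left in memory areas A, B, C, D.
    """
    # S0: initial contents of the four memory areas.
    A, B, C, D = a, b, c, d
    # S1: horizontal sums and differences within each row.
    # (The updates of B and D are assumptions; A and C follow the text.)
    A, B, C, D = A + B, B - A, C + D, D - C
    # S2: vertical sums and differences across the two rows.
    A, B, C, D = A + C, B + D, C - A, D - B
    # S3: normalisation (a shift in hardware; true division here).
    A, B, C = A / 4, B / 2, C / 2
    return A, B, C, D


ll, hl, lh, hh = haar_lifting_2x2(10, 20, 30, 60)
```

Each tuple assignment evaluates its right-hand side from the previous step's contents, which mirrors one row of a step table in which all memory areas are updated together.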
[0040]
Here, the LL component is a low-pass component, and the LH, HL, and HH components are high-pass components.
The calculating means 103 can further reduce the amount of calculation by calculating only the high-pass components used for calculating the activity data.
Among the high-pass components, the HH component, which is easily affected by noise, is not used as activity data. When the first image is a frame image, the average of the absolute values of the HL component, which is the edge enhancement component in the horizontal direction, and the LH component, which is the edge enhancement component in the vertical direction, (| HL | + | LH |) / 2, is used. When the first image is a field image, the band is limited in the vertical direction as compared with the horizontal direction, so only one of these components is used.
3.2.2 Simplified lifting scheme
High-pass components that are not used for calculating the activity data need not be calculated. The calculating unit 103 therefore simplifies the lifting configuration by omitting the operations that become unnecessary in the process of calculating the high-pass components in the lifting wavelet transform (hereinafter referred to as a simplified lifting scheme).
[0041]
FIG. 6 is an example of a simplified lifting scheme that employs (| HL | + | LH |) / 2 as activity data and performs two-dimensional Haar wavelet transform.
When the activity data is (| HL | + | LH |) / 2, the HH component need not be calculated, so the operations for memory area D, where the HH component would finally be calculated, are omitted, simplifying the overall processing.
[0042]
FIG. 7 is an example of a simplified lifting scheme that employs LH components as activity data and performs two-dimensional Haar wavelet transform.
In this case, since it is not necessary to calculate the HL component and the HH component, the number of operations that can be omitted further increases, and the effect of reducing the amount of operation increases.
In both FIG. 6 and FIG. 7, the LL component, which is the approximate value, is rewritten by the fixed-position calculation for the next layer. It is therefore copied to memory area D, which would otherwise hold a high-pass component that is not adopted.
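The two simplified variants can be sketched as follows. This is an illustrative arrangement in Python under stated assumptions, not a transcription of FIG. 6 or FIG. 7: the function names are hypothetical, the order of operations is one possible choice, true division stands in for the shift operations, and the LH-only variant returns the signed LH value because the text does not say whether an absolute value is taken. The copy of LL into memory area D is not modelled; the functions simply return the approximate value and the activity value.

```python
def simplified_lifting_hl_lh(a, b, c, d):
    """Variant using (|HL| + |LH|) / 2 as activity; HH is never computed."""
    top, bottom = a + b, c + d        # 2 additions
    total = top + bottom              # 1 addition
    hl2 = (b - a) + (d - c)           # 2 subtractions, 1 addition (= 2 * HL)
    lh2 = bottom - top                # 1 subtraction (= 2 * LH)
    ll = total / 4                    # 1 shift
    act = (abs(hl2) + abs(lh2)) / 4   # 1 absolute-value addition, 1 shift
    return ll, act


def simplified_lifting_lh(a, b, c, d):
    """Variant using only LH as activity; HL and HH are never computed."""
    top, bottom = a + b, c + d        # 2 additions
    ll = (top + bottom) / 4           # 1 addition, 1 shift
    lh = (bottom - top) / 2           # 1 subtraction, 1 shift
    return ll, lh


frame_result = simplified_lifting_hl_lh(10, 20, 30, 60)
lh_only_result = simplified_lifting_lh(10, 20, 30, 60)
```

Dividing the un-normalised sum |2·HL| + |2·LH| by 4 gives (|HL| + |LH|) / 2 with a single division, which keeps this arrangement at two division (shift) operations per small pixel area.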
3.2.3 Computational amount comparison with conventional example
FIG. 12 shows a conventional image hierarchization step.
[0043]
In the image layering step of the conventional method, for data a, b, c, and d in a small pixel area, the average value Avg = (a + b + c + d) / 4 is used as the approximate value, and Act = (| a - Avg | + | b - Avg | + | c - Avg | + | d - Avg |) / 4 is used as the activity.
As shown in FIG. 12, calculating the approximate image and the activity for one small pixel area by the conventional method requires three additions, four subtractions, three absolute-value additions, and two shift operations (divisions).
[0044]
In contrast, with the simplified lifting scheme of the present invention, the case of FIG. 6 requires four additions, three subtractions, one absolute-value addition, and two shift operations. Compared with the conventional method, one subtraction is replaced with an addition and two absolute-value additions are eliminated.
The case of FIG. 7 requires three additions, one subtraction, no absolute-value additions, and two shift operations, eliminating three subtractions and three absolute-value additions compared with the conventional method.
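The comparison can be tabulated as follows. This is a minimal sketch in Python: the function `conventional_2x2` implements the conventional Avg and Act formulas given in the text, and the dictionary `OPS` simply records the operation counts stated for FIG. 12, FIG. 6, and FIG. 7. The names are hypothetical, and summing the four operation types with equal weight is a simplification made here for illustration.

```python
def conventional_2x2(a, b, c, d):
    """Conventional approximate value and activity for one 2 x 2 area."""
    avg = (a + b + c + d) / 4
    act = (abs(a - avg) + abs(b - avg) + abs(c - avg) + abs(d - avg)) / 4
    return avg, act


# Operation counts per small pixel area, as stated in the text.
OPS = {
    "conventional (FIG. 12)": {"add": 3, "sub": 4, "abs_add": 3, "shift": 2},
    "simplified (FIG. 6)":    {"add": 4, "sub": 3, "abs_add": 1, "shift": 2},
    "simplified (FIG. 7)":    {"add": 3, "sub": 1, "abs_add": 0, "shift": 2},
}

# Unweighted totals, treating every operation type as one unit of work.
totals = {name: sum(counts.values()) for name, counts in OPS.items()}
```

Under this unweighted count the totals are 12, 10, and 6 operations per small pixel area, respectively.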
3.2.4 Application in first and second hierarchical image memories
FIG. 8 is a flowchart illustrating Step S401 in detail.
[0045]
In the present embodiment, a description will be given of a three-layer structure in which an approximate image and activity data of layer 1 are calculated from an original image, and an approximate image and activity data of layer 2 are calculated from the approximate image of layer 1.
In step S801, the layer n to be operated on is set to 0, and in step S802, the number N of layers is set to 3.
[0046]
In step S803, it is determined whether n > (N-1). If YES, the process ends; if NO, the process proceeds to step S804.
In step S804, it is determined whether the wavelet transform has been performed on all the small pixel areas of the original image (when n = 0) or of the approximate image of layer n (when n ≠ 0). If YES, n is set to n + 1 in step S806 and the process returns to step S803. If NO, the process proceeds to step S805, where the wavelet transform is performed on a small pixel area that has not yet been processed.
[0047]
The wavelet transform for the small pixel area performed in step S805 is as described above.
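The layering of one image into an approximate image group and an activity data group can be sketched as follows. This is a simplified illustration in Python rather than a transcription of the flowchart of FIG. 8: it assumes that the three-layer structure means the original image plus layers 1 and 2, so that two transform passes are made; it uses (| HL | + | LH |) / 2 as the activity; it stores each layer in a separate list instead of rewriting one memory in place; and it assumes image dimensions divisible by 2 at every pass. The names `haar_layer` and `build_layers` are hypothetical.

```python
def haar_layer(img):
    """One layering pass over every 2 x 2 small pixel area of img.

    img is a list of rows. Returns (approximate image, activity data),
    each half the size of img in both directions.
    """
    approx, activity = [], []
    for y in range(0, len(img), 2):
        approx_row, act_row = [], []
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            ll = (a + b + c + d) / 4
            hl = ((b + d) - (a + c)) / 2
            lh = ((c + d) - (a + b)) / 2
            approx_row.append(ll)
            act_row.append((abs(hl) + abs(lh)) / 2)
        approx.append(approx_row)
        activity.append(act_row)
    return approx, activity


def build_layers(original, num_layers=3):
    """Return (approximate image group, activity data group).

    Assumption: num_layers counts the original image as one layer,
    so num_layers - 1 transform passes are made.
    """
    approx_group, activity_group = [], []
    current = original
    for _ in range(num_layers - 1):
        current, act = haar_layer(current)   # next layer from the previous approximation
        approx_group.append(current)
        activity_group.append(act)
    return approx_group, activity_group


flat = [[8] * 4 for _ in range(4)]
approx_group, activity_group = build_layers(flat)
```

For a uniform 4 × 4 image, both layers of the approximate image group keep the uniform value and every activity value is zero, which is the situation in which the activity data carries no useful information for matching.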
3.3 Motion vector detection processing
Details of the motion vector detection processing in step S402 in FIG. 4 will be described with reference to FIGS.
[0048]
FIG. 9 illustrates the process of step S402 in detail.
Here, in FIG. 9, an example of detection of a motion vector of BLK (1, 1) which is a block at the first position in the horizontal direction and the first position in the vertical direction in the original image will be described.
In step S901, it is determined whether or not motion vectors of all blocks have been detected in the original image.
[0049]
If detected, the process ends.
If not, the process proceeds to step S902.
In step S902, a block in which a motion vector has not been detected is selected from the original image.
In this embodiment, it is assumed that a motion vector of BLK (1, 1) is detected.
[0050]
In step S903, the hierarchy n to be calculated is set to N-1.
Since the image is divided into three layers in step S401, N = 3.
In step S904, the importance of the activity block in the highest hierarchy is evaluated.
Here, the importance of actBLK2 (1, 1), which is the highest-level activity block calculated from BLK (1, 1), is evaluated.
[0051]
To evaluate the importance of the activity block actBLKn (1, 1), the judging means 106 calculates the sum of the data values in actBLKn (1, 1). If the sum is equal to or larger than a preset threshold α, the activity block is determined to be of high importance; if it is less than the threshold α, the activity block is determined to be of low importance.
[0052]
When the threshold α is set high, the number of activity blocks determined to be low in importance increases, and when the threshold α is set low, the activity blocks determined to be low in importance decrease.
For example, when the block size is 2 pixels × 2 pixels and the pixel values have 256 gradations, the sum of the data in an activity block ranges from 0 to 255 × 4. If the threshold is set to 25% of the maximum value of the sum in the block, the threshold is 255.
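The importance test and the threshold example can be sketched as follows. This is a minimal illustration in Python; the function names `threshold_from_ratio` and `is_important` are hypothetical, and the activity block is represented as a list of rows of non-negative values.

```python
def threshold_from_ratio(block_width, block_height, ratio=0.25, max_value=255):
    """Threshold alpha as a fraction of the largest possible block sum."""
    return max_value * block_width * block_height * ratio


def is_important(activity_block, alpha):
    """High importance when the sum of the block's data is at least alpha."""
    total = sum(sum(row) for row in activity_block)
    return total >= alpha


alpha = threshold_from_ratio(2, 2)            # 25% of 255 * 4
textured = is_important([[100, 100], [50, 5]], alpha)
uniform = is_important([[0, 1], [2, 3]], alpha)
```

Raising `ratio` raises α, so more activity blocks fall below the threshold and are excluded from block matching.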
[0053]
When the high-pass component of the image after the wavelet transform is used as the activity data, the activity data in a uniform image region with small fluctuations has a negligible value, and does not contribute to the improvement of the detection accuracy in the motion vector detection step described later.
Therefore, the determination unit 106 determines that the activity data is used for the block matching when the importance of the activity block is high, and that the activity data is not used for the block matching when the importance of the activity block is low.
[0054]
If the determination unit 106 determines that the activity block is to be used, the process proceeds to step S905; otherwise, the process proceeds to step S906.
In step S905, the motion vector detection unit 107 performs block matching using the approximate image and the activity data to detect a motion vector candidate MVC (N-1).
[0055]
FIG. 10 is a diagram illustrating block matching for apxBLKn (1, 1).
FIG. 11 is a diagram illustrating motion vector detection.
When performing block matching for apxBLKn (1,1), the similarity between apxBLKn (1,1) and the shift block is evaluated using an evaluation function.
[0056]
As shown in FIG. 10, a shift block is the block rapxBLKn (i, j), which has the same i and j as apxBLKn (i, j), shifted in pixel units in the horizontal and vertical directions within the search area.
As the evaluation function, the sum of absolute differences, the sum of squared differences, or the like is often used. FIG. 10 also illustrates block matching for an activity block if apxBLKn (1,1) is replaced with actBLKn (1,1) and rapxBLKn (1,1) is replaced with ractBLKn (1,1).
[0057]
In the block matching for actBLKn (1,1), as in the case of apxBLKn (1,1), the similarity to a shift block obtained by shifting ractBLKn (1,1) in pixel units within the search area is evaluated by the evaluation function.
The final evaluation value of the block matching is the sum of the evaluation value for apxBLKn (1,1) and the evaluation value for actBLKn (1,1) multiplied by a weight w (0 < w ≦ 1).
[0058]
The shift amount of the shift block that minimizes the final evaluation value of the block matching is the motion vector.
In the case of the highest layer n = 2, the motion vector detecting means 107 performs block matching for apxBLK2 (1,1) and actBLK2 (1,1) within the search region SR2 (1,1) (± 12 pixels in the horizontal direction and ± 8 pixels in the vertical direction) to calculate the motion vector candidate MVC (2) (1,1) shown in the figure.
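Block matching with the weighted evaluation value can be sketched as follows. This is a minimal illustration in Python under several assumptions: the evaluation function is taken to be the sum of absolute differences, shift blocks that would leave the image are skipped, the default weight w = 0.5 is arbitrary, and the names `sad` and `match_block` are hypothetical. Images are lists of rows, and (bx, by) is the top-left pixel of the block of interest.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block and its shift block."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def match_block(apx, rapx, act, ract, bx, by, size,
                range_x, range_y, w=0.5, use_activity=True):
    """Return the shift (dx, dy) with the smallest final evaluation value."""
    height, width = len(rapx), len(rapx[0])
    best_cost, best_shift = None, None
    for dy in range(-range_y, range_y + 1):
        for dx in range(-range_x, range_x + 1):
            # Assumption: shift blocks that leave the image are skipped.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + size > width or by + dy + size > height:
                continue
            cost = sad(apx, rapx, bx, by, dx, dy, size)
            if use_activity:
                # Final value = approximate-image value + w * activity value.
                cost += w * sad(act, ract, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_shift = cost, (dx, dy)
    return best_shift
```

Calling `match_block` with `range_x=12` and `range_y=8` corresponds to a full search of the ± 12 by ± 8 pixel region, and `use_activity=False` corresponds to matching on the approximate image alone.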
[0059]
In step S906, block matching is performed for apxBLK2 (1,1) within the search region SR2 (1,1) (± 12 pixels in the horizontal direction and ± 8 pixels in the vertical direction) without using actBLK2 (1,1), and the motion vector candidate MVC (2) (1, 1) shown in the figure is calculated.
In step S907, n = n-1 is set for the hierarchy n to be operated.
[0060]
In step S908, it is determined whether or not n is 0 representing the original image.
If n = 0, the process proceeds to step S912.
If n ≠ 0, the process proceeds to step S909.
In step S909, the determination unit 106 evaluates the importance of the activity block in the hierarchy n.
[0061]
In the present embodiment, the importance of actBLK1 (1,1), which is the activity block of hierarchy 1 calculated from BLK (1,1), is evaluated.
The importance is evaluated using the same method as the method performed in step S904.
If the determining unit 106 determines that the activity block is to be used, the process proceeds to step S910; otherwise, the process proceeds to step S911.
[0062]
In steps S910 and S911, the motion vector detection unit 107 detects a motion vector candidate MVC (n) (i, j) in a layer n other than the highest layer and the original image, using the approximate image and, in step S910, the activity data.
As a search area in block matching, a small search area is set using MVC (n + 1) (i, j).
[0063]
In layer 1, where n = 1, as shown in FIG. 11, a small search area SSR1 (1,1) of ± 2 pixels is set within the search area SR1 (1,1) (± 24 pixels in the horizontal direction and ± 16 pixels in the vertical direction) around the position obtained by doubling MVC (2) (1,1). In step S910, block matching is performed for apxBLK1 (1,1) and actBLK1 (1,1) within the small search area SSR1 (1,1), and a motion vector candidate MVC (1) (1,1) is calculated. In step S911, block matching is performed only for apxBLK1 (1, 1) within the small search area SSR1 (1, 1), and a motion vector candidate MVC (1) (1, 1) is calculated.
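The construction of the small search area can be sketched as follows. This is a minimal illustration in Python; the function name `small_search_area` and its parameters are hypothetical, and clipping the candidate shifts to the enclosing search area SRn is an assumption about how shifts near its border are handled.

```python
def small_search_area(mvc_upper, radius=2, sr_x=None, sr_y=None):
    """Candidate shifts for layer n, given the candidate from layer n + 1.

    The centre is the upper-layer candidate doubled, because layer n has
    twice the resolution of layer n + 1 in each direction.
    """
    cx, cy = 2 * mvc_upper[0], 2 * mvc_upper[1]
    shifts = []
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            # Assumption: shifts outside the search area SRn are dropped.
            if sr_x is not None and abs(dx) > sr_x:
                continue
            if sr_y is not None and abs(dy) > sr_y:
                continue
            shifts.append((dx, dy))
    return shifts


# Layer 1: candidate (3, -2) from layer 2, SR1 of +/-24 by +/-16 pixels.
candidates = small_search_area((3, -2), radius=2, sr_x=24, sr_y=16)
```

With a radius of ± 2 pixels, at most 25 shifts are evaluated, whereas a full search of a ± 24 by ± 16 pixel area would evaluate 49 × 33 = 1617 shifts.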
[0064]
In step S912, a motion vector is detected in the original image.
For block matching, a first image and a second image, which are original images, are used.
As shown in FIG. 11, within the search area SR (1, 1) in the original image (± 48 pixels in the horizontal direction and ± 32 pixels in the vertical direction), a small search area SSR (1, 1) of ± 2 pixels in the horizontal and vertical directions is set around the position obtained by doubling MVC (1) (1, 1). Block matching is performed within SSR (1, 1), and a motion vector MV (1, 1) with integer-pixel precision is detected.
4. Conclusion
As described above, in calculating the hierarchical images, in which two sets of approximate image groups and activity data groups are calculated from the first image and the second image, the motion vector detection device 10 reduces the amount of calculation by omitting the calculation of high-frequency components that do not contribute to improving motion vector detection accuracy. In block matching, it reduces the amount of calculation by omitting block matching for activity data that does not contribute to improving motion vector detection accuracy. Together, these reduce the overall amount of calculation. In addition, since the high-pass component that is easily affected by noise is not used for calculating the activity data, block matching that is robust to noise is performed.
(Other modifications)
Although the present invention has been described based on the above embodiment, it is needless to say that the present invention is not limited to the above embodiment. The following cases are also included in the present invention.
(1) The present invention may be a method including the steps described in the embodiment. Further, the method may be a computer program for realizing the method using a computer system, or may be a digital signal representing the program.
[0065]
Further, the present invention may be a computer-readable recording medium on which the program or the digital signal is recorded, for example, a flexible disk, hard disk, CD-ROM, MO, DVD, DVD-ROM, DVD-RAM, or semiconductor memory.
(2) In the layers between the highest layer and the original image, a small search area SSRn is set within the search area SRn. In the embodiment, the size of SSRn is ± 2 pixels in the horizontal and vertical directions, but it may be ± 1 pixel, or may be made larger to suppress erroneous detection.
(3) The determination result of whether or not to use the activity block for block matching performed by the determination means in the highest hierarchy may be used in the lower hierarchy.
(4) A plurality of motion vector candidates MVC may be adopted in ascending order of block matching evaluation value from the result of the full search in the highest layer, and these candidates may be used in the lower layers.
(5) The method of evaluating the importance of activity data and omitting block matching for unimportant data, in a motion vector detection device that detects motion vectors using hierarchical approximate images and activity data, may also be used when images are layered by a method other than the wavelet transform described in the embodiment.
[0066]
[Effects of the Invention]
(1) A motion vector detection method according to the present invention detects a motion vector between a first image and a second image, each represented at an original resolution, and includes: a calculating step of calculating a first approximate image and a second approximate image that represent, at a reduced resolution lower than the original resolution, the low-frequency components included in the first image and the second image, respectively, and a first activity image and a second activity image that represent, at the reduced resolution, the high-frequency components included in the first image and the second image, respectively; a determining step of determining whether the total amount of the high-frequency components represented by the first activity image is equal to or greater than a predetermined threshold; a selection step of selecting a motion vector candidate using both the result of comparing the first approximate image with the second approximate image and the result of comparing the first activity image with the second activity image when the total amount is determined to be equal to or greater than the threshold, and using only the result of comparing the first approximate image with the second approximate image otherwise; and a detecting step of detecting a motion vector using the motion vector candidate.
[0067]
According to this configuration, the motion vector detection method reduces the amount of calculation for motion vector detection by omitting the block matching calculation for an activity image that does not contribute to improving motion vector detection accuracy, while maintaining the detection accuracy at a level comparable to the conventional one.
(2) Further, the motion vector detection method of the present invention detects a motion vector between a first image and a second image, each represented at an original resolution, and includes: a calculating step of calculating, for each of a plurality of reduced resolutions that are lower than the original resolution and decrease stepwise, a first approximate image and a second approximate image that represent, at that reduced resolution, the low-frequency components included in the first image and the second image, respectively, and a first activity image and a second activity image that represent, at that reduced resolution, the high-frequency components included in the first image and the second image, respectively; a determining step of determining, for each reduced resolution, whether the total amount of the high-frequency components represented by the first activity image of that reduced resolution is equal to or greater than a predetermined threshold; a selection step of selecting, for each reduced resolution, a motion vector candidate using both the result of comparing the first approximate image with the second approximate image of that reduced resolution and the result of comparing the first activity image with the second activity image of that reduced resolution when the total amount is determined to be equal to or greater than the threshold, and using only the result of comparing the first approximate image with the second approximate image of that reduced resolution otherwise; and a detecting step of detecting a motion vector by using the motion vector candidates stepwise in ascending order of reduced resolution.
[0068]
According to this configuration, the same effect as (1) can be obtained.
(3) In the motion vector detection method according to (2), in the determining step, for a reduced resolution higher than a predetermined reduced resolution, the determination may be omitted by adopting the same determination result as that obtained at the predetermined reduced resolution.
[0069]
According to this configuration, for an activity image of a reduced resolution higher than the predetermined reduced resolution, the calculation for making the determination is unnecessary, so the amount of calculation for motion vector detection is reduced while the motion vector detection accuracy is maintained at a level comparable to the conventional one.
(4) In the motion vector detection method according to any one of (1) to (3), the calculating step may calculate the approximate images and the activity images using a wavelet transform having a lifting configuration.
[0070]
According to this configuration, the same effect as (1) can be obtained.
(5) In the motion vector detection method according to (4), the calculating step may calculate the approximate images and the activity images by using a Haar wavelet transform having a lifting configuration.
According to this configuration, the same effect as (1) can be obtained.
(6) In the motion vector detection method according to (4) or (5), in the lifting wavelet transform or the lifting Haar wavelet transform, which divides an image into a plurality of frequency bands, the calculating step may perform only the operations that extract the frequency bands used for calculating each of the approximate images and each of the activity images.
[0071]
According to this configuration, the calculation for extracting the frequency band not used for calculating the activity image can be omitted, so that the calculation amount can be reduced.
(7) In the motion vector detection method according to (6), the calculating step may select the frequency band used for calculating the activity images depending on whether the first image is an interlaced scanning image or a progressive scanning image.
[0072]
According to this configuration, since the calculation for extracting the frequency band not used for calculating the activity image can be omitted, the calculation amount can be reduced, and the motion vector detection accuracy can be maintained at a level comparable to the conventional level.
(8) The motion vector detection device of the present invention detects a motion vector between a first image and a second image, each represented at an original resolution, and includes: calculating means for calculating a first approximate image and a second approximate image that represent, at a reduced resolution lower than the original resolution, the low-frequency components included in the first image and the second image, respectively, and a first activity image and a second activity image that represent, at the reduced resolution, the high-frequency components included in the first image and the second image, respectively; determining means for determining whether the total amount of the high-frequency components represented by the first activity image is equal to or greater than a predetermined threshold; selecting means for selecting a motion vector candidate using both the result of comparing the first approximate image with the second approximate image and the result of comparing the first activity image with the second activity image when the total amount is determined to be equal to or greater than the threshold, and using only the result of comparing the first approximate image with the second approximate image otherwise; and detecting means for detecting a motion vector using the motion vector candidate.
[0073]
According to this configuration, the motion vector detection device omits the block matching calculation for activity images that do not contribute to improving the detection accuracy of the motion vector, so the amount of calculation for motion vector detection is reduced while the detection accuracy is maintained at a level comparable to the conventional level.
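The gating described in (8) can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names (`haar_level`, `sad`, `select_candidate`), the use of the sum of absolute differences as the comparison, the simple addition of the two costs, and the evaluation of the total amount over the whole first activity image are all assumptions made for the example. The activity value (|HL| + |LH|) / 2 follows the option named for FIG. 6.

```python
def haar_level(img):
    """One 2x2 Haar level on a list-of-lists image with even dimensions.

    Returns (approx, activity), both at half resolution. approx is the
    2x2 mean; activity is (|HL| + |LH|) / 2, where HL is taken here as the
    horizontal difference and LH as the vertical difference (naming assumed).
    """
    h, w = len(img) // 2, len(img[0]) // 2
    approx = [[0.0] * w for _ in range(h)]
    activity = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            a, b = img[2 * y][2 * x], img[2 * y][2 * x + 1]
            c, d = img[2 * y + 1][2 * x], img[2 * y + 1][2 * x + 1]
            approx[y][x] = (a + b + c + d) / 4
            hl = (a - b + c - d) / 4
            lh = (a + b - c - d) / 4
            activity[y][x] = (abs(hl) + abs(lh)) / 2
    return approx, activity


def sad(img_a, img_b, top, left, dy, dx, size):
    """Sum of absolute differences between a block of img_a at (top, left)
    and the block of img_b displaced by (dx, dy)."""
    return sum(abs(img_a[top + y][left + x] - img_b[top + dy + y][left + dx + x])
               for y in range(size) for x in range(size))


def select_candidate(apx1, act1, apx2, act2, top, left, size, search, threshold):
    """Return ((dx, dy), used_activity) for one block.

    The activity images are matched only when the total activity of the
    first activity image reaches the threshold; otherwise that block
    matching is skipped entirely.
    """
    use_activity = sum(map(sum, act1)) >= threshold
    best_cost, best_vec = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > len(apx2) or l + size > len(apx2[0]):
                continue  # displaced block would leave the image
            cost = sad(apx1, apx2, top, left, dy, dx, size)
            if use_activity:
                cost += sad(act1, act2, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dx, dy)
    return best_vec, use_activity
```

With a low threshold both comparisons are evaluated; with a high threshold only the approximate images are compared, which removes one block matching pass per candidate displacement.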
(9) Further, the motion vector detection device of the present invention is a motion vector detection device that detects a motion vector between a first image and a second image each represented at an original resolution, and comprises: calculating means for calculating, for each of a plurality of reduced resolutions that are lower than the original resolution and decrease stepwise, a first approximate image and a second approximate image that represent, at the reduced resolution, the low-frequency components included in the first image and the second image, respectively, and a first activity image and a second activity image that represent, at the reduced resolution, the high-frequency components included in the first image and the second image, respectively; determining means for determining, for each reduced resolution, whether or not the total amount of high-frequency components represented by the first activity image of the reduced resolution is equal to or greater than a predetermined threshold; selecting means for selecting, for each reduced resolution, a motion vector candidate using both the comparison result between the first approximate image and the second approximate image of the reduced resolution and the comparison result between the first activity image and the second activity image of the reduced resolution when the total amount is determined to be equal to or greater than the threshold, and otherwise selecting a motion vector candidate using only the comparison result between the first approximate image and the second approximate image of the reduced resolution; and detecting means for detecting a motion vector by using the motion vector candidates stepwise in ascending order of the reduced resolution.
[0074]
According to this configuration, the same effect as (1) can be obtained.
(10) In the motion vector detecting device according to (9), for any reduced resolution higher than a predetermined reduced resolution, the determining means may omit the determination, assuming that the same determination result as that at the predetermined reduced resolution is obtained.
According to this configuration, the calculation for making the determination is unnecessary for activity images having a reduced resolution higher than the predetermined reduced resolution, so the amount of calculation for motion vector detection is reduced while the motion vector detection accuracy is maintained at a level comparable to the conventional level.
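The omission described in (10) can be illustrated with a short sketch. The function name `decide_activity_use`, the level indexing (level 0 is the highest reduced resolution, larger indices are coarser) and the callable `total_at` are hypothetical and chosen only for this example.

```python
def decide_activity_use(num_levels, total_at, threshold, reuse_from):
    """Decide per level whether the activity images are used for matching.

    total_at(level) returns the total activity of the first activity image
    at that level. Levels finer than reuse_from (level < reuse_from) inherit
    the decision made at reuse_from, so total_at is never called for them.
    reuse_from must be a valid level index.
    """
    decisions = [False] * num_levels
    for level in range(num_levels - 1, -1, -1):  # coarsest level first
        if level < reuse_from:
            decisions[level] = decisions[reuse_from]  # determination omitted
        else:
            decisions[level] = total_at(level) >= threshold
    return decisions
```

Because the finer levels hold the most pixels, skipping the summation there removes the largest part of the determination cost.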
(11) In the motion vector detecting device according to any one of (8) to (10), the calculating means may calculate each of the approximate images and each of the activity images using a wavelet transform having a lifting configuration.
[0075]
According to this configuration, the same effect as (1) can be obtained.
(12) In the motion vector detecting device according to (11), the calculating means may calculate each of the approximate images and each of the activity images using a Haar wavelet transform having a lifting configuration.
According to this configuration, the same effect as (1) can be obtained.
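For reference, a Haar wavelet transform in lifting form can be sketched as a predict step followed by an update step. The sketch below uses the unnormalized variant (pairwise difference and pairwise mean); the function names and the band naming convention (first letter horizontal, second letter vertical) are assumptions for illustration and are not taken from the embodiment.

```python
def haar_lift_1d(signal):
    """Forward Haar lifting on an even-length sequence.

    Returns (s, d): s holds the pairwise means (low band),
    d holds the pairwise differences (high band).
    """
    even, odd = signal[0::2], signal[1::2]
    d = [o - e for e, o in zip(even, odd)]       # predict step
    s = [e + di / 2 for e, di in zip(even, d)]   # update step
    return s, d


def haar_unlift_1d(s, d):
    """Inverse of haar_lift_1d: undo the update, then undo the predict."""
    even = [si - di / 2 for si, di in zip(s, d)]
    odd = [di + e for e, di in zip(even, d)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out


def haar_lift_2d(img):
    """One 2-D level: lift every row, then lift every column of each result.

    Returns a dict with the four bands LL, HL, LH and HH.
    """
    lows, highs = [], []
    for row in img:
        s, d = haar_lift_1d(row)
        lows.append(s)
        highs.append(d)

    def lift_columns(mat):
        s_cols, d_cols = zip(*(haar_lift_1d(list(col)) for col in zip(*mat)))
        return [list(r) for r in zip(*s_cols)], [list(r) for r in zip(*d_cols)]

    ll, lh = lift_columns(lows)
    hl, hh = lift_columns(highs)
    return {"LL": ll, "HL": hl, "LH": lh, "HH": hh}
```

The lifting form needs only one subtraction and one halved addition per sample pair, and it is exactly invertible, as `haar_unlift_1d` shows.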
(13) In the motion vector detecting device according to (11) or (12), in the lifting-configuration wavelet transform or the lifting-configuration Haar wavelet transform, which performs an operation of dividing an image into a plurality of frequency bands, the calculating means may perform an operation of extracting only the frequency bands used for calculating each of the approximate images and each of the activity images.
[0076]
According to this configuration, since the calculation for extracting the frequency band not used for calculating the activity image can be omitted, the calculation amount can be reduced, and the motion vector detection accuracy can be maintained at a level comparable to the conventional level.
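The saving can be illustrated with a sketch that extracts only the LL and LH bands, in the spirit of the simplified scheme named for FIG. 7, where the LH component serves as activity data. The function name `haar_ll_lh` and the convention that LH is the vertical difference of the horizontally averaged rows are assumptions; the figure itself may arrange the steps differently.

```python
def haar_ll_lh(img):
    """Return (LL, LH) of one Haar level without computing HL or HH.

    Horizontal pass: only the pairwise mean of each row is kept.
    Vertical pass: a lifting predict/update on the averaged rows yields
    the vertical difference (LH) and the 2x2 mean (LL).
    """
    low = [[(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]
           for row in img]
    ll, lh = [], []
    for y in range(0, len(low) - 1, 2):
        top, bottom = low[y], low[y + 1]
        d = [b - t for t, b in zip(top, bottom)]          # predict: vertical detail
        ll.append([t + di / 2 for t, di in zip(top, d)])  # update: 2x2 mean
        lh.append(d)
    return ll, lh
```

Compared with a full decomposition, the entire vertical pass over the horizontal high band is skipped, so the unused HL and HH coefficients are never formed.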
(14) In the motion vector detecting device according to (13), the calculating means may select the frequency band used for calculating the activity image depending on whether the first image is an interlaced scanning image or a progressive scanning image.
[0077]
According to this configuration, since the calculation for extracting the frequency band not used for calculating the activity image can be omitted, the calculation amount can be reduced, and the motion vector detection accuracy can be maintained at a level comparable to the conventional level.
(15) The program of the present invention is a computer-executable program for realizing, by using a computer, a motion vector detecting device that detects a motion vector between a first image and a second image each represented at an original resolution, the program causing the computer to execute:
a calculating step of calculating a first approximate image and a second approximate image that represent, at a reduced resolution lower than the original resolution, the low-frequency components included in the first image and the second image, respectively, and a first activity image and a second activity image that represent, at the reduced resolution, the high-frequency components included in the first image and the second image, respectively;
a determining step of determining whether or not the total amount of high-frequency components represented by the first activity image is equal to or greater than a predetermined threshold;
a selecting step of selecting a motion vector candidate using both the comparison result between the first approximate image and the second approximate image and the comparison result between the first activity image and the second activity image when the total amount is determined to be equal to or greater than the threshold, and otherwise selecting a motion vector candidate using only the comparison result between the first approximate image and the second approximate image; and
a detecting step of detecting a motion vector using the motion vector candidate.
[0078]
According to this configuration, the motion vector detecting device using the program has the same effect as (1).
[Brief description of the drawings]
FIG. 1 shows an example of the overall configuration of a motion vector detection device 10.
FIG. 2 is a diagram schematically showing a relationship between respective data calculated by a calculation unit.
FIG. 3 shows a data holding format of an original image, an approximate image group, and an activity data group.
FIG. 4 is a flowchart showing an outline of the overall processing of the motion vector detecting device.
FIG. 5 is a diagram illustrating a two-dimensional Haar wavelet transform having a lifting configuration.
FIG. 6 is an example of a simplified lifting scheme that employs (|HL| + |LH|) / 2 as activity data and performs a two-dimensional Haar wavelet transform.
FIG. 7 is an example of a simplified lifting scheme that employs an LH component as activity data and performs a two-dimensional Haar wavelet transform.
FIG. 8 is a flowchart for explaining image layering in detail.
FIG. 9 is a flowchart illustrating a motion vector detection process in detail.
FIG. 10 is a diagram illustrating block matching for apxBLKn (1, 1).
FIG. 11 is a diagram illustrating motion vector detection.
FIG. 12 shows a conventional image layering step.
[Explanation of symbols]
10 Motion vector detection device
101 First image memory
102 Second image memory
103 Calculation means
104 Hierarchical first image memory
105 Hierarchical second image memory
106 Judgment means
107 Motion vector detecting means
200 First image
201 Second image
202 Approximate image of layer 1 for first image
203 Activity data of layer 1 for first image
204 Approximate image of layer 1 for second image
205 Activity data of layer 1 for second image
206 Approximate image of layer 2 for first image
207 Activity data of layer 2 for first image
208 Approximate image of layer 2 for second image
209 Activity data of layer 2 for second image
210 Approximate image group for first image
211 Activity data group for first image
212 Approximate image group for second image
213 Activity data group for second image
300 Contents of hierarchical image memory representing original image
301 Contents of hierarchical image memory after two layers
302 Contents of hierarchical image memory after three layers
303 Small pixel area
304 Small pixel area
501 Column of memory area A
502 Operation in S0
503 Operation result in S0
504 Operation in S1
505 Operation result in S1
506 Operation in S2
507 Operation result in S2
508 Operation in S3
509 Operation result in S3

Claims

1. A motion vector detection method for detecting a motion vector between a first image and a second image each represented at an original resolution, the method comprising:
a calculating step of calculating a first approximate image and a second approximate image that represent, at a reduced resolution lower than the original resolution, the low-frequency components included in the first image and the second image, respectively, and a first activity image and a second activity image that represent, at the reduced resolution, the high-frequency components included in the first image and the second image, respectively;
a determining step of determining whether or not the total amount of high-frequency components represented by the first activity image is equal to or greater than a predetermined threshold;
a selecting step of selecting a motion vector candidate using both the comparison result between the first approximate image and the second approximate image and the comparison result between the first activity image and the second activity image when the total amount is determined to be equal to or greater than the threshold, and otherwise selecting a motion vector candidate using only the comparison result between the first approximate image and the second approximate image; and
a detecting step of detecting a motion vector using the motion vector candidate.

2. A motion vector detection method for detecting a motion vector between a first image and a second image each represented at an original resolution, the method comprising:
a calculating step of calculating, for each of a plurality of reduced resolutions that are lower than the original resolution and decrease stepwise, a first approximate image and a second approximate image that represent, at the reduced resolution, the low-frequency components included in the first image and the second image, respectively, and a first activity image and a second activity image that represent, at the reduced resolution, the high-frequency components included in the first image and the second image, respectively;
a determining step of determining, for each reduced resolution, whether or not the total amount of high-frequency components represented by the first activity image of the reduced resolution is equal to or greater than a predetermined threshold;
a selecting step of selecting, for each reduced resolution, a motion vector candidate using both the comparison result between the first approximate image and the second approximate image of the reduced resolution and the comparison result between the first activity image and the second activity image of the reduced resolution when the total amount is determined to be equal to or greater than the threshold, and otherwise selecting a motion vector candidate using only the comparison result between the first approximate image and the second approximate image of the reduced resolution; and
a detecting step of detecting a motion vector by using the motion vector candidates stepwise in ascending order of the reduced resolution.

3. The motion vector detecting method according to claim 2, wherein, for any reduced resolution higher than a predetermined reduced resolution, the determining step omits the determination, assuming that the same determination result as that at the predetermined reduced resolution is obtained.

4. The motion vector detecting method according to claim 1, wherein the calculating step calculates the approximate images and the activity images using a wavelet transform having a lifting configuration.

5. The motion vector detecting method according to claim 4, wherein the calculating step calculates the approximate images and the activity images using a Haar wavelet transform having a lifting configuration.

6. The motion vector detecting method according to claim 4, wherein, in the lifting-configuration wavelet transform or the lifting-configuration Haar wavelet transform, which performs an operation of dividing an image into a plurality of frequency bands, the calculating step performs an operation of extracting only the frequency bands used for calculating each of the approximate images and each of the activity images.

7. The motion vector detecting method according to claim 6, wherein the calculating step selects the frequency band used for calculating the activity image depending on whether the first image is an interlaced scanning image or a progressive scanning image.

8. A motion vector detection device that detects a motion vector between a first image and a second image each represented at an original resolution, the device comprising:
calculating means for calculating a first approximate image and a second approximate image that represent, at a reduced resolution lower than the original resolution, the low-frequency components included in the first image and the second image, respectively, and a first activity image and a second activity image that represent, at the reduced resolution, the high-frequency components included in the first image and the second image, respectively;
determining means for determining whether or not the total amount of high-frequency components represented by the first activity image is equal to or greater than a predetermined threshold;
selecting means for selecting a motion vector candidate using both the comparison result between the first approximate image and the second approximate image and the comparison result between the first activity image and the second activity image when the total amount is determined to be equal to or greater than the threshold, and otherwise selecting a motion vector candidate using only the comparison result between the first approximate image and the second approximate image; and
detecting means for detecting a motion vector using the motion vector candidate.

9. A motion vector detection device that detects a motion vector between a first image and a second image each represented at an original resolution, the device comprising:
calculating means for calculating, for each of a plurality of reduced resolutions that are lower than the original resolution and decrease stepwise, a first approximate image and a second approximate image that represent, at the reduced resolution, the low-frequency components included in the first image and the second image, respectively, and a first activity image and a second activity image that represent, at the reduced resolution, the high-frequency components included in the first image and the second image, respectively;
determining means for determining, for each reduced resolution, whether or not the total amount of high-frequency components represented by the first activity image of the reduced resolution is equal to or greater than a predetermined threshold;
selecting means for selecting, for each reduced resolution, a motion vector candidate using both the comparison result between the first approximate image and the second approximate image of the reduced resolution and the comparison result between the first activity image and the second activity image of the reduced resolution when the total amount is determined to be equal to or greater than the threshold, and otherwise selecting a motion vector candidate using only the comparison result between the first approximate image and the second approximate image of the reduced resolution; and
detecting means for detecting a motion vector by using the motion vector candidates stepwise in ascending order of the reduced resolution.

10. The motion vector detecting device according to claim 9, wherein, for any reduced resolution higher than a predetermined reduced resolution, the determining means omits the determination, assuming that the same determination result as that at the predetermined reduced resolution is obtained.

11. The motion vector detecting device according to claim 8, wherein the calculating means calculates the approximate images and the activity images using a wavelet transform having a lifting configuration.

12. The motion vector detecting device according to claim 11, wherein the calculating unit calculates the approximate images and the activity images using a Haar wavelet transform having a lifting configuration.

13. The motion vector detecting device according to claim 11, wherein, in the lifting-configuration wavelet transform or the lifting-configuration Haar wavelet transform, which performs an operation of dividing an image into a plurality of frequency bands, the calculating means performs an operation of extracting only the frequency bands used for calculating each of the approximate images and each of the activity images.

14. The motion vector detecting device according to claim 13, wherein the calculating means selects the frequency band used for calculating the activity image depending on whether the first image is an interlaced scanning image or a progressive scanning image.

15. A computer-executable program for realizing, using a computer, a motion vector detection device that detects a motion vector between a first image and a second image each represented at an original resolution, the program causing the computer to execute:
a calculating step of calculating a first approximate image and a second approximate image that represent, at a reduced resolution lower than the original resolution, the low-frequency components included in the first image and the second image, respectively, and a first activity image and a second activity image that represent, at the reduced resolution, the high-frequency components included in the first image and the second image, respectively;
a determining step of determining whether or not the total amount of high-frequency components represented by the first activity image is equal to or greater than a predetermined threshold;
a selecting step of selecting a motion vector candidate using both the comparison result between the first approximate image and the second approximate image and the comparison result between the first activity image and the second activity image when the total amount is determined to be equal to or greater than the threshold, and otherwise selecting a motion vector candidate using only the comparison result between the first approximate image and the second approximate image; and
a detecting step of detecting a motion vector using the motion vector candidate.