JP2004064438A - Monitoring system and monitoring method

Info

Publication number: JP2004064438A
Application number: JP2002220204A
Authority: JP
Inventors: Koichi Masukura; 増倉　孝一; Osamu Hori; 堀　修; Toshimitsu Kaneko; 金子　敏充; Takeshi Mita; 三田　雄志; Koji Yamamoto; 山本　晃司
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2002-07-29
Filing date: 2002-07-29
Publication date: 2004-02-26

Abstract

PROBLEM TO BE SOLVED: To provide a monitoring system and a monitoring method that do not require constant visual monitoring by monitoring personnel. SOLUTION: The monitoring system includes an extracting unit (102) that extracts object information for specifying an object from a video, a retrieving unit (105) that retrieves an object matching retrieval conditions based on the extracted object information, and a display unit (106) that reproduces and displays video in which the retrieved object appears in at least one frame. COPYRIGHT: (C)2004,JPO

Description

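[Brief description of the drawings]
[FIG. 1] A block diagram showing an example of the configuration of a monitoring system according to an embodiment of the present invention.
[FIG. 2] A flowchart showing an example of the object extraction procedure in the embodiment.
[FIG. 3] A diagram showing an example of the data structure of the object information in the embodiment.
[FIG. 4] A flowchart showing an example of the object search procedure in the embodiment.
[FIG. 5] A diagram showing an example of the data structure of the object reproduction information in the embodiment.
[Explanation of reference signs]
101 ... Video input unit
102 ... Object extraction unit
103 ... Object information storage unit
104 ... Video storage unit
105 ... Object search unit
106 ... Video display unit
[0001]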
[Technical field of the invention]
The present invention relates to a monitoring system and a monitoring method for monitoring an object included in an image captured by a camera or the like.
[0002]
[Prior art]
Conventionally, monitoring systems have been built from dedicated equipment, which requires expensive dedicated lines to connect the system components, and monitoring work has been performed visually by monitoring personnel. Such systems have therefore been used mainly in applications that can bear the cost of dedicated equipment and monitoring personnel, such as infrastructure facilities (roads, power plants, and the like) and large-scale factories.
[0003]
However, demand for low-cost monitoring systems is growing rapidly because of recent social instability and an increase in crimes such as burglary, theft from vehicles, and terrorism. In response, many monitoring systems have been announced that can use consumer equipment or public lines such as the Internet.
[0004]
[Problems to be solved by the invention]
However, even in these newer monitoring systems, the monitoring work for detecting abnormalities is still performed visually, and the burden on monitoring personnel remains as heavy as before. In particular, for a small-scale monitoring system it is difficult in terms of cost to assign dedicated monitoring personnel, and in many cases visual abnormality detection cannot be carried out over the camera's entire operating time.
[0005]
An object of the present invention is to provide a monitoring system and a monitoring method that do not require constant visual monitoring by monitoring personnel.
[0006]
[Means for Solving the Problems]
In order to solve the above problems and achieve the object, the present invention uses the following means.
[0007]
A system for monitoring an object included in a video,
Extracting means for extracting object information for specifying the object from the video,
Search means for searching for an object that matches a search condition based on the object information extracted by the extraction means,
Display means for reproducing and displaying a video in which the object searched for by the search means is shown in at least one frame,
A monitoring system comprising:
[0008]
A method of monitoring an object included in a video,
Extracting object information identifying the object from the video;
Searching for an object that matches the search condition based on the extracted object information;
Reproducing and displaying a video in which the object found by the search appears in at least one frame;
A monitoring method comprising:
[0009]
The present invention extracts object information that specifies an object included in a video, searches for an object to be monitored based on the object information, and reproduces and displays video in which the retrieved object appears in at least one frame, so monitoring personnel no longer need to watch constantly for abnormalities by eye. Consequently, even when many cameras are monitored from a single terminal, only the objects that match the search conditions in a search on the terminal need to be watched, which greatly improves monitoring efficiency. Moreover, even when monitoring cannot be performed in real time, storing the video of the objects together with the object information allows the video of an object to be displayed automatically, with no need to cue the video to where the object appears.
[0010]
[Embodiments of the invention]
Hereinafter, embodiments of a monitoring system and a monitoring method according to the present invention will be described with reference to the drawings.
[0011]
First Embodiment
FIG. 1 is a diagram showing a configuration of a monitoring system according to a first embodiment of the present invention. The system includes a video input unit 101, an object extraction unit 102, an object information storage unit 103, a video storage unit 104, an object search unit 105, and a video display unit 106. These units are connected via general-purpose signal lines and communication lines. Note that the object extraction unit 102 and the object search unit 105 may be realized as software instead of hardware.
[0012]
The video input unit 101 includes, for example, a video camera, a digital still image camera that captures images at regular intervals, and the like. Although only one video input unit 101 is shown for convenience of description, a plurality of video input units may be provided.
[0013]
The object extraction unit 102 receives a video signal from the video input unit 101, extracts the objects appearing in the video, creates object information (see FIG. 3) that can specify each extracted object, and sends it to the object information storage unit 103. The object information consists of at least one of: spatio-temporal region information, which describes motion features such as the object's shape and trajectory; representative image information, which describes image features such as the object's color and texture; and attribute information, which describes attribute features such as the object's name, classification, and camera number. Here, an object is a coherent region in the video, and may be anything that can be regarded as something to be monitored: for example a person, an animal, a car, a road, or a river; a part of one (a person's face, a car's tire); or a collection of them (a line of people, a traffic jam). Objects may be extracted manually by the user on the screen displaying the video, or automatically by image processing such as background subtraction, inter-frame differencing, stereo vision, template matching, or extraction based on color histograms. Extraction may also use other external devices such as an infrared camera or a human presence sensor, and these techniques may be combined.
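As an illustration of one of the automatic techniques named above, the following is a minimal sketch of background subtraction on tiny grayscale frames. The function names, the threshold value, and the list-of-lists frame representation are assumptions made for this example; the patent does not prescribe any particular implementation.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale image: rows of pixel values in 0..255


def foreground_mask(frame: Frame, background: Frame, threshold: int = 30) -> List[List[int]]:
    """Mark pixels that differ from the background by more than `threshold`."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


def bounding_box(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) of the foreground pixels, or None if there are none."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Toy data: a uniform background and a frame in which a bright object has appeared.
background = [[10] * 6 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][2] = frame[1][3] = frame[2][3] = 200

mask = foreground_mask(frame, background)
box = bounding_box(mask)  # region that could be recorded as the object's position in this frame
```

A real extractor would also have to cope with noise, lighting changes, and multiple objects, which this sketch ignores.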
[0014]
The object information storage unit 103 stores the object information created by the object extraction unit 102, and is implemented by, for example, a hard disk, an optical disk, or a semiconductor memory.
[0015]
The video storage unit 104 receives a video signal from the video input unit 101 and stores it, and is implemented by, for example, a hard disk, an optical disk, a semiconductor memory, or a video tape recorder. It is sufficient for the stored video signal to contain at least one frame in which the object appears: all of the video signal received from the video input unit 101 may be stored, or the unit may receive the object's spatio-temporal region information from the object extraction unit 102 and store the video signal only for the time intervals in which an object exists.
[0016]
The video storage unit 104 may perform some thinning or compression to reduce the amount of stored data. For example, by lowering the frame rate or converting to a video compression format such as MPEG-2 or MPEG-4 and storing it, more video signals can be stored with the same capacity.
[0017]
If spatio-temporal region information or attribute information has been created as object information, the video signal may be processed according to that object information before it is stored. For example, the spatio-temporal region information may be used to create and store a video signal cropped to the area around the object, to outline the object contour with a line of a specified color, to draw the object's trajectory in a specified color, or to darken or fill in the parts of the image other than the object. The attribute information may be used to superimpose the object's classification, name, and feature values on the video as text, or to display statistics. The two may also be combined, so that the way the object region is processed is switched according to feature values in the attribute information.
[0018]
The object search unit 105 searches the object information stored in the object information storage unit 103 for object information corresponding to the search condition specified by the user. Search conditions may be registered in advance, or may be input by the user for each search. The above methods may be combined such that the user selects for each search from a plurality of search conditions registered in advance.
[0019]
The user designates one or more objects to be monitored based on the search results (that is, the retrieved object information). When an object is designated, reproduction information describing the designated object ID, reproduction start time, reproduction end time, and so on is sent to the video storage unit 104, which is thereby made to send the video signal of the object specified by the reproduction information to the video display unit 106.
[0020]
The search conditions include, for example, conditions relating to image features such as the color and texture of the object, conditions relating to movement features such as trajectory, existence time, shape and size, conditions relating to attribute features such as classification, name, camera number, and shooting date and time. Anything may be used as long as it relates to object information. Further, a plurality of conditions may be combined and used by a logical operator such as AND, OR, NOT, and the like.
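Combining conditions with logical operators can be sketched as small predicate functions that are composed. The dictionary layout of an object record, the attribute names, and the helper names below are hypothetical choices for this example and are not taken from the patent.

```python
from typing import Any, Callable, Dict

ObjectInfo = Dict[str, Any]
Condition = Callable[[ObjectInfo], bool]


def attr_equals(name: str, value: Any) -> Condition:
    """Condition on an attribute feature (classification, camera number, ...)."""
    return lambda obj: obj["attributes"].get(name) == value


def min_duration(seconds: float) -> Condition:
    """Condition on a motion feature: how long the object was present."""
    return lambda obj: obj["end_time"] - obj["start_time"] >= seconds


def all_of(*conds: Condition) -> Condition:  # AND
    return lambda obj: all(c(obj) for c in conds)


def any_of(*conds: Condition) -> Condition:  # OR
    return lambda obj: any(c(obj) for c in conds)


def negate(cond: Condition) -> Condition:  # NOT
    return lambda obj: not cond(obj)


stored = [
    {"object_id": 1, "start_time": 0, "end_time": 90, "attributes": {"classification": "person", "camera": 1}},
    {"object_id": 2, "start_time": 0, "end_time": 300, "attributes": {"classification": "car", "camera": 1}},
    {"object_id": 3, "start_time": 10, "end_time": 100, "attributes": {"classification": "person", "camera": 2}},
    {"object_id": 4, "start_time": 0, "end_time": 20, "attributes": {"classification": "person", "camera": 1}},
]

# "A person who stayed at least 60 seconds, AND NOT seen by camera 2."
condition = all_of(
    attr_equals("classification", "person"),
    min_duration(60),
    negate(attr_equals("camera", 2)),
)
matching_ids = [obj["object_id"] for obj in stored if condition(obj)]
```

Because each condition is an ordinary function, pre-registered conditions and conditions entered at search time can be mixed freely.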
[0021]
The video display unit 106 reproduces and displays the video signal of the object received from the video storage unit 104, and is, for example, a CRT. During reproduction, the object information for the object may be read from the object information storage unit 103, and the spatio-temporal region information and attribute information in it may be used to process the video of the object. For example, the spatio-temporal region information may be used to cut out and save only the area around the object, to outline the object contour, or to darken or fill in the parts other than the object; the attribute information may be used to superimpose the object's classification or name on the video; and the processing method may be switched according to feature values.
[0022]
Also, the images of the objects may be combined so that the images of a plurality of objects can be viewed simultaneously.
[0023]
When there are multiple video input units 101, as many object extraction units 102 and object information storage units 103 as there are video input units 101 may be provided, a single object extraction unit 102 and object information storage unit 103 may process the input from multiple video input units 101, or these arrangements may be combined.
[0024]
Next, the operation of the present embodiment will be described. FIG. 2 is a flowchart illustrating an example of the flow of the object extraction process. The extraction process is executed at certain time intervals such as every frame or every certain time.
[0025]
In step S21, a video signal is input from the video input unit 101 such as a camera (video input step).
[0026]
In step S22, it is determined whether a video input exists (video input determination step). If there is no video input, the object extraction processing ends. If there is a video input, an object is extracted in step S23.
[0027]
In the object extraction step S23, an object is extracted from the video signal input in the video input step S21. The object may be extracted manually by the user on the video display screen, or automatically by image processing such as background subtraction, inter-frame differencing, stereo vision, template matching, or extraction based on color histograms. Extraction may also use other external devices such as an infrared camera or a human presence sensor, and user input, image processing, and external devices may be combined.
[0028]
In step S24, at least one of spatio-temporal region information, representative image information, and attribute information is created for the object extracted in the object extraction step S23 and is stored as object information (see FIG. 3) (object information storage step).
[0029]
In step S25, the video signal acquired in the video input step S21 is stored (video storage step). The video storage step S25 may be executed immediately after the video input step S21, but executing it after the object information has been created and stored allows the object information generated in the object information storage step S24 to be used to process the video. For example, the spatio-temporal region information can be used to cut out and save only the area around the object, to outline the object contour, or to darken or fill in the parts other than the object; the attribute information can be used to superimpose the object's classification and name; and the processing method can be switched according to feature values.
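The flow of steps S21 to S25 can be sketched as a simple loop. This is only an illustration: the function name, the use of `None` to signal that no video input exists, and the choice to store only frames in which an object appears (one of the options the description allows) are assumptions for this example.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


def run_extraction(
    frames: Iterable[Optional[Any]],
    extract: Callable[[Any], List[Dict[str, Any]]],
) -> Tuple[List[Dict[str, Any]], List[Tuple[int, Any]]]:
    """Run the extraction flow and return (object information store, video store)."""
    object_store: List[Dict[str, Any]] = []
    video_store: List[Tuple[int, Any]] = []
    for t, frame in enumerate(frames):      # S21: input the next piece of video
        if frame is None:                   # S22: no video input, so processing ends
            break
        objects = extract(frame)            # S23: extract objects from the frame
        for info in objects:                # S24: store object information
            object_store.append({"time": t, **info})
        if objects:                         # S25: store video after S24, so the object
            video_store.append((t, frame))  #      information can drive what is kept
    return object_store, video_store


# Toy input: each "frame" is just a label, and the extractor reports what it shows.
toy_frames = ["empty", "car", "car", None, "person"]
toy_extract = lambda f: [] if f == "empty" else [{"classification": f}]

object_store, video_store = run_extraction(toy_frames, toy_extract)
```

In this toy run the frame after the `None` marker is never processed, mirroring the way the flow ends when no video input exists.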
[0030]
FIG. 3 shows an example of the data structure of the object information stored in the object information storage unit 103. As shown in FIG. 3, the object information includes an object ID 401, a start time 402, an end time 403, spatiotemporal area information 404, representative image information 407, and attribute information 408. The spatio-temporal region information 404 includes a shape flag 405 and motion data 406, and the attribute information 408 includes one or more attribute names 409 and attribute values 410.
[0031]
These data may be collected in one file or divided into separate files. The spatio-temporal region information 404, the representative image information 407, and the attribute information 408 may each be omitted, or more than one of each may be held. The spatio-temporal region information 404, the representative image information 407, and the attribute information 408 may also be arranged in any order.
[0032]
The object ID 401 is an identification number uniquely assigned to distinguish the extracted object.
[0033]
The start time 402 and the end time 403 are the time when the object first appears and the time when the object last disappears, respectively.
[0034]
The spatio-temporal region information 404 holds information on the spatio-temporal region in which the object exists, and may take any form as long as the time during which the object exists and its position and shape can be determined from it. For example, an alpha map may be used, that is, a moving image in which pixels where the object exists are "1" and all other pixels are "0", or a data format such as the MPEG-7 SpatioTemporal Locator may be used. The SpatioTemporal Locator takes one of two forms. Figure Trajectory approximates the object shape in each frame of the video by a rectangle, an ellipse, or a polygon with an arbitrary number of vertices; expresses the object's motion and shape change as the trajectories over time of the vertices of the approximating rectangle, of the bounding rectangle of the approximating ellipse, or of the approximating polygon; approximates each vertex trajectory by a spline function; and represents the result as the parameters of those approximating functions. Parameter Trajectory approximates the object shape in one arbitrary frame by a rectangle, an ellipse, or a polygon with an arbitrary number of vertices and takes it as a reference figure; expresses the object's motion and shape change by transformation parameters, such as those of an affine transformation, from the reference figure; approximates the trajectory over time of the transformation parameters by a spline function; and represents the result as the parameters of those approximating functions. A SpatioTemporal Locator can represent an object's shape and motion with far less data than an alpha map.
[0035]
The shape flag 405 indicates the format in which the spatio-temporal region information is stored. It describes the data format (for example, alpha map or SpatioTemporal Locator), the shape used for the object, the function-approximation method used for the feature trajectories, and so on.
[0036]
The motion data 406 is the motion data of the spatio-temporal region: moving-image data in the case of an alpha map, or the parameters of the approximating functions in the case of a SpatioTemporal Locator.
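To make the idea of storing a trajectory instead of per-frame masks concrete, here is a simplified sketch in the spirit of a figure trajectory. It keeps an approximating rectangle at a few key times and interpolates between them. Note the substitution: the description speaks of spline approximation, whereas this sketch uses plain linear interpolation, and the class and method names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # left, top, right, bottom


class RectTrajectory:
    """Approximating rectangle of one object, stored only at key times."""

    def __init__(self, keyframes: List[Tuple[float, Rect]]) -> None:
        self.keyframes = sorted(keyframes)
        self.times = [t for t, _ in self.keyframes]

    def at(self, t: float) -> Rect:
        """Rectangle at time t, linearly interpolated between neighboring key times."""
        if not self.times[0] <= t <= self.times[-1]:
            raise ValueError("object does not exist at this time")
        i = bisect_right(self.times, t)
        if i == len(self.times):          # t is exactly the last key time
            return self.keyframes[-1][1]
        (t0, r0), (t1, r1) = self.keyframes[i - 1], self.keyframes[i]
        w = (t - t0) / (t1 - t0)
        return tuple(a + w * (b - a) for a, b in zip(r0, r1))


# An object that moves 20 pixels to the right over 10 seconds.
trajectory = RectTrajectory([(0.0, (0, 0, 10, 10)), (10.0, (20, 0, 30, 10))])
midpoint = trajectory.at(5.0)
```

Two key rectangles are enough here to describe ten seconds of motion, whereas an alpha map would need one mask per frame, which is the data-size advantage the description points out.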
[0037]
The representative image information 407 is image information of one frame or a plurality of frames in which the object exists. This is information for the user to visually specify the object, such as image information such as BMP or JPEG, a pointer indicating the frame of the video signal, or the like. The image information may be the entire image of the frame or a part thereof. For example, when the position and shape in the frame are known, only the periphery of the object may be cut out from the image and used as representative image information. In addition, some processing may be performed on the image, such as lowering the luminance of the area other than the object area, or bordering the outline of the object.
[0038]
Any frame may be extracted as the representative image information as long as the object can be easily identified. For example, there is a method in which a center time of a time period in which an object exists is set as a representative frame.
[0039]
The attribute information 408 describes what characteristics the object has. For example, it describes a classification such as whether the object is a car, a person, or an animal; feature values such as color, shape, and size; information from external devices acquired together with the video signal, such as audio or sensor data; and links to related information.
[0040]
The attribute information 408 includes one or more attribute names 409 and attribute values 410. The attribute name 409 and the attribute value 410 have a one-to-one correspondence. The attribute name 409 and the attribute value 410 may be described in any format. For example, MPEG-7 format which is an international standard, various database formats, text format, and the like can be used.
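The object information of FIG. 3 could be modeled, for example, with the following record types. This is a sketch under assumptions: the field names and Python types are invented for illustration, and only the reference numerals in the comments come from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SpatioTemporalRegion:      # spatio-temporal region information 404
    shape_flag: str              # 405: storage format, e.g. "alpha_map" (hypothetical value)
    motion_data: bytes           # 406: mask video or function parameters


@dataclass
class ObjectInfo:
    object_id: int               # 401: unique identifier
    start_time: float            # 402: time the object first appears
    end_time: float              # 403: time the object last disappears
    # 404, 407 and 408 may each be omitted or repeated, hence the empty defaults.
    regions: List[SpatioTemporalRegion] = field(default_factory=list)
    representative_images: List[str] = field(default_factory=list)   # 407: image files or frame pointers
    attributes: Dict[str, str] = field(default_factory=dict)         # 408: attribute name 409 -> value 410

    def representative_time(self) -> float:
        """Middle of the object's existence, one way to choose a representative frame."""
        return (self.start_time + self.end_time) / 2


car = ObjectInfo(
    object_id=1,
    start_time=10.0,
    end_time=30.0,
    attributes={"classification": "car", "camera": "3"},
)
```

Using a dictionary for the attributes keeps each attribute name paired with exactly one value, matching the one-to-one correspondence described above.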
[0041]
FIG. 4 is a flowchart illustrating an example of the flow of an object search process. The process is executed for each object search.
[0042]
In step S31, search conditions for an object are input (search condition input step). Search conditions may be registered in advance, or may be manually input by the user for each search. The above methods may be combined such that the user selects for each search from a plurality of search conditions registered in advance.
[0043]
In step S32, each object information stored in the object information storage unit 103 is compared with the input search condition to determine whether the object information matches the search condition (object information matching step). The representative image information and the attribute information of the object determined to be matched are displayed on a CRT or the like or left in a log so that the user can browse the search result.
[0044]
In step S33, it is determined whether or not matching has been performed for all object information stored in the object information storage unit 103 (matching end determination step). Steps S32 and S33 are repeated until the determination of all object information is completed.
[0045]
In step S34, the object or objects whose video signal is to be reproduced are designated from among the objects that match the search condition (object specifying step). Normally the user who has browsed the search results designates the object whose video is to be reproduced, but the system may designate it automatically, for example when there are few search results or when the user always wants to see the video of the most recent object among the search results.
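Steps S32 to S34 can be sketched as a scan over the stored object information followed by an automatic choice of the most recent match. The record layout and the function names are assumptions for this example; in the described system the user would normally make the choice.

```python
from typing import Any, Callable, Dict, List, Optional

ObjectInfo = Dict[str, Any]


def search_objects(store: List[ObjectInfo], condition: Callable[[ObjectInfo], bool]) -> List[ObjectInfo]:
    """S32/S33: compare every stored record with the search condition."""
    matches = []
    for info in store:            # S33: repeat until all object information is checked
        if condition(info):       # S32: does this record match?
            matches.append(info)
    return matches


def pick_latest(matches: List[ObjectInfo]) -> Optional[ObjectInfo]:
    """S34, automatic variant: choose the match that appeared most recently."""
    return max(matches, key=lambda info: info["start_time"], default=None)


store = [
    {"object_id": 1, "classification": "car", "start_time": 100.0},
    {"object_id": 2, "classification": "person", "start_time": 250.0},
    {"object_id": 3, "classification": "car", "start_time": 400.0},
]

cars = search_objects(store, lambda info: info["classification"] == "car")
latest_car = pick_latest(cars)
```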
[0046]
The specification may be a single object or a plurality of objects, and the reproduction start time, the reproduction end time, and the reproduction option can be simultaneously specified for the video signal of each object. These pieces of information are used as object reproduction information (see FIG. 5) when reproducing a video signal of an object. In the reproduction option, it is possible to specify a special effect method such as edging an object outline, and how to reproduce a video signal of an object when reproducing a plurality of video signals.
[0047]
In step S35, the video signal of the object is reproduced and displayed on a CRT or the like using the object reproduction information designated in the object specifying step S34 (video reproduction step). During reproduction, the video signal is processed and displayed based on the reproduction start time, reproduction end time, and reproduction options in the object reproduction information.
FIG. 5 shows an example of the data structure of the object reproduction information designated when a video signal is reproduced. As shown in FIG. 5, the object reproduction information consists of a designated object ID 501, a reproduction start time 502, a reproduction end time 503, and a reproduction option 504. When one piece of object reproduction information designates multiple objects, it contains multiple designated object IDs 501, reproduction start times 502, reproduction end times 503, and reproduction options 504.
[0048]
The designated object ID 501 is an identification number uniquely assigned to identify the object to be reproduced and displayed. It is the same as the object ID 401 in the object information of FIG. 3, so designating an object ID uniquely designates the video of that object.
[0049]
The reproduction start time 502 is the time at which reproduction of the object's video starts. If it is omitted, reproduction starts from the time the object appears (the start time 402 in the object information).
[0050]
The reproduction end time 503 is an end time when reproducing the video of the object. When omitted, the reproduction is performed until the disappearance time of the object (the end time 403 in the object information).
[0051]
The reproduction option 504 specifies how the video of the object is to be reproduced. In the reproduction option 504, for example, a screen size, a reproduction rate, and a special effect such as an object border can be designated. When a plurality of objects are designated, the method of reproducing them can also be designated, such as arranging the plurality of objects on one screen and reproducing them together, or reproducing them in order.
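The handling of omitted reproduction times described above can be illustrated by the following minimal sketch. It is an illustration under stated assumptions, not the implementation of the embodiment: the class names `ObjectInfo` and `ReproductionEntry`, the function `resolve_interval`, the use of seconds as the time unit, and the option key `"outline"` are all hypothetical. Only the fields 401 to 403 and 501 to 504 and the fallback to the head time and end time come from the description.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ObjectInfo:
    """Subset of the object information: object ID 401, head time 402, end time 403."""
    object_id: int
    head_time: float  # appearance time (seconds is an assumed unit)
    end_time: float   # disappearance time


@dataclass
class ReproductionEntry:
    """One designated object: ID 501, start time 502, end time 503, option 504."""
    object_id: int
    start_time: Optional[float] = None  # None stands for "omitted"
    end_time: Optional[float] = None    # None stands for "omitted"
    option: dict = field(default_factory=dict)  # hypothetical keys, e.g. "outline"


def resolve_interval(entry, objects):
    """Return (start, end), falling back to the object's head and end times."""
    info = objects[entry.object_id]  # designated object ID 501 equals object ID 401
    start = info.head_time if entry.start_time is None else entry.start_time
    end = info.end_time if entry.end_time is None else entry.end_time
    return start, end


# Object reproduction information designating a plurality of entries.
objects = {7: ObjectInfo(object_id=7, head_time=12.0, end_time=30.5)}
requests = [
    ReproductionEntry(object_id=7),
    ReproductionEntry(object_id=7, start_time=15.0, option={"outline": True}),
]
intervals = [resolve_interval(r, objects) for r in requests]
print(intervals)
```

In this sketch the first entry omits both times and is therefore reproduced from the appearance to the disappearance of the object, while the second entry overrides only the start time.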
[0052]
As described above, according to the present embodiment, the object extraction unit 102 extracts object information for specifying an object, such as a person or a car, included in the video acquired by the video input unit 101 such as a camera. The object information is composed of at least one of spatiotemporal region information, representative image information, and attribute information. An object to be monitored is searched for using the extracted object information, and the video of the object found by the search is reproduced and displayed.
[0053]
Thus, it is not necessary for the observer to watch visually for abnormalities at all times. Even when a large number of cameras are monitored from one terminal, it is sufficient to check on the terminal only the objects that match the search, so the monitoring efficiency is greatly improved. Even when monitoring cannot be performed in real time, storing the video of the objects together with the object information makes it possible to perform monitoring later without the trouble of searching through the video.
[0054]
The present invention is not limited to the embodiments described above, and can be implemented with various modifications. For example, the present invention can also be implemented as a computer-readable recording medium on which is recorded a program for causing a computer to execute predetermined means, to function as predetermined means, or to realize predetermined functions.
[0055]
[Effect of the Invention]
As described above, according to the present invention, it is possible to search for an object to be monitored using object information of an object included in a video and reproduce and display the video of the object. As a result, the monitoring operation can be performed without viewing all the images, and the efficiency of the monitoring operation can be greatly increased.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an example of a configuration of a monitoring system according to an embodiment of the present invention.
FIG. 2 is an exemplary flowchart illustrating an example of an object extraction processing procedure according to the embodiment.
FIG. 3 is an exemplary view showing an example of a data structure of object information in the embodiment.
FIG. 4 is an exemplary flowchart illustrating an example of an object search processing procedure according to the embodiment.
FIG. 5 is an exemplary view showing an example of a data structure of object reproduction information in the embodiment.
[Explanation of symbols]
101 video input unit
102 object extraction unit
103 object information storage unit
104 video storage unit
105 object search unit
106 video display unit

Claims

1. A monitoring system for monitoring an object included in a video, comprising:
extraction means for extracting, from the video, object information for specifying the object;
search means for searching for an object that matches a search condition based on the object information extracted by the extraction means; and
display means for reproducing and displaying a video in which the object found by the search means appears in at least one frame.

2. The monitoring system according to claim 1, wherein the object information includes at least one of spatiotemporal region information describing the shape and motion of the object, representative image information describing image features of the object such as color and texture, and attribute information describing the classification and features of the object, and
the search means searches for the object using at least one of the spatiotemporal region information, the representative image information, and the attribute information.

3. The monitoring system according to claim 2, wherein the spatiotemporal region information approximates the shape of the object in each frame of the video with a rectangle, an ellipse, or a polygon having an arbitrary number of vertices, represents each vertex as a trajectory in the time direction, and describes the parameters of an approximation function obtained when each vertex trajectory is approximated by a spline function.

4. The monitoring system according to claim 2, wherein the spatiotemporal region information takes, as a reference figure, a rectangle, an ellipse, or a polygon having an arbitrary number of vertices that approximates the shape of the object in an arbitrary frame of the video, represents the movement or shape change of the object by transformation parameters of an affine transformation from the reference figure, and describes the parameters of an approximation function obtained when the trajectory of the transformation parameters in the time direction is approximated by a spline function.

5. The monitoring system according to claim 2, wherein the display means processes and displays the video in which the searched object appears in at least one frame, using at least one of the spatiotemporal region information and the attribute information.

6. The monitoring system according to claim 1, further comprising means for processing and storing the video in which the searched object appears in at least one frame, using at least one of the spatiotemporal region information and the attribute information.

7. A monitoring method for monitoring an object included in a video, comprising:
extracting, from the video, object information for specifying the object;
searching for an object that matches a search condition based on the extracted object information; and
reproducing and displaying a video in which the searched object appears in at least one frame.

8. The monitoring method according to claim 7, wherein the displaying step processes and displays an image in which the searched object is shown in at least one frame by using at least one of the spatiotemporal region information and the attribute information.

9. The monitoring method according to claim 7, further comprising the step of processing and storing the video in which the searched object appears in at least one frame, using at least one of the spatiotemporal region information and the attribute information.