JP2009064162A

JP2009064162A - Image recognition system

Info

Publication number: JP2009064162A
Application number: JP2007230356A
Authority: JP
Inventors: Nobusuke Kasagi; 誠佑笠置; Motoya Ogawa; 原也小川
Original assignee: Fuji Heavy Industries Ltd
Current assignee: Subaru Corp
Priority date: 2007-09-05
Filing date: 2007-09-05
Publication date: 2009-03-26

Abstract

[Problem] To systematically classify, retain, and utilize acquired knowledge within the system, thereby improving real-time recognition processing and recognition accuracy.
[Solution] When teacher data is input, a database management unit 7 extracts feature amounts, clusters them, sorts the data into classes, assigns attribute information, and stores the result in a teacher database DB2. The recognizer currently in use is then evaluated and updated using the teacher data sorted by class. As a result, even as the amount of accumulated knowledge grows, the optimization needed to learn adaptively for diverse environments and targets can be performed efficiently and quickly, and highly accurate, robust recognition can be achieved.
[Selected Figure] Figure 1

Description

The present invention relates to an image recognition system that performs recognition processing by systematically classifying acquired knowledge within the system.

Image recognition technology processes image data from a camera or the like and extracts a specific target from the image, for example an object moving through the environment, or its motion. In recent years, online learning algorithms using various recognizers have been developed for this technology. Their purpose is to ensure and improve recognition accuracy in situations that developers did not anticipate, under the conditions in which users actually operate the system.

For example, Non-Patent Document 1 discloses a technique that uses a recognizer in which various image filters are combined in a tree structure. Genetic programming automatically optimizes this tree-structured image filter, which makes more complex image recognition possible (ACTIT, an automatic construction method for tree-structured image transformations).

Patent Document 1 discloses a technique that extends ACTIT to extract a specific target from moving images, in particular a target that changes or moves over time. In that technique, the processing structure of the tree-structured image filter is acquired automatically by genetic programming once teacher information is provided. The system therefore adapts to the environment in which the user operates it and recognizes targets accurately and robustly.
Patent Document 1: JP 2006-178857 A
Non-Patent Document 1: Shinya Aoki and one other, "Automatic Construction Method of Tree-Structured Image Transformation, ACTIT," The Journal of the Institute of Image Information and Television Engineers, The Institute of Image Information and Television Engineers, 1999, Vol. 53, No. 6, pp. 888-894

In the techniques described above, images learned in the past and the corresponding recognition processes are retained in the system and applied to the current situation, so recognition adapts to the environment. As the system learns more diverse driving environments, however, the store of previously learned recognition processes becomes enormous. Applying that store to the current state then requires an enormous amount of computation.

Merely learning diverse environments and piling the results into a single undifferentiated store is therefore not necessarily effective. It hinders efficient use of the knowledge. It can also cause the system to apply average knowledge, drawn from the vast store of past knowledge (recognition processes), to a scene with distinctive characteristics.

The present invention has been made in view of the above circumstances. Its object is to provide an image recognition system that systematically classifies, retains, and utilizes acquired knowledge within the system, thereby improving real-time recognition processing and recognition accuracy.

To achieve the above object, the image recognition system according to the present invention recognizes image data using recognizers and comprises two units:
- a database unit that sorts into classes, and retains, both the recognition processes acquired through learning and the teacher information used for that learning;
- a learning update unit that evaluates the recognizers using the teacher data of each class and adaptively updates the recognizers through learning.

According to the present invention, acquired knowledge can be systematically classified, retained, and utilized within the system, which improves both real-time recognition processing and recognition accuracy.

Embodiments of the present invention are described below with reference to the drawings. FIGS. 1 to 13 relate to one embodiment of the present invention:
- FIG. 1 is a basic configuration diagram of the image recognition system.
- FIG. 2 is an explanatory diagram showing application to a person extraction problem.
- FIG. 3 is an explanatory diagram showing a tree-structured image filter.
- FIG. 4 is an explanatory diagram showing integration of recognizer outputs.
- FIG. 5 is an explanatory diagram showing the flow of learning processing.
- FIG. 6 is a block diagram showing the configuration of the database management unit.
- FIG. 7 is an explanatory diagram showing image feature amounts after filtering.
- FIG. 8 is an explanatory diagram showing classification by a self-organizing map.
- FIG. 9 is an explanatory diagram showing evaluation of an integrated image.
- FIG. 10 is an explanatory diagram of replacement selection.
- FIG. 11 is an explanatory diagram of sequential learning.
- FIG. 12 is an explanatory diagram showing the overall processing flow.
- FIG. 13 is an explanatory diagram showing a processing example.

The image recognition system of the present invention processes image data input online with its recognizers. At the same time, it adaptively updates the recognizers currently in use to suit the environment, building a more accurate and robust system for diverse environments and targets. This matters because recognizers are limited in size and number by constraints such as processing time and memory space. In addition, the recognizer performance required changes with the weather, the environment, and similar conditions.

Even under these constraints, the system can learn adaptively for diverse environments and targets and achieve accurate, robust recognition. It does so by accumulating learning results and recognition results for previously input image data in its own database, and by updating the recognizers online with the learning data stored there.

However, as the system learns diverse driving environments, the volume of previously learned recognition processes grows enormous. The optimization needed to apply them to the current state grows with it, which can hinder efficient use. The image recognition system of the present invention therefore classifies the acquired knowledge (recognition processes) systematically within the system and retains and uses it in that form, which makes efficient real-time recognition processing possible.

Note that "image data" here covers two kinds of data: visual information captured by an image sensor such as a camera, and pseudo-image data representing a two-dimensional distribution of objects detected by a laser radar or the like.

As shown in FIG. 1, the image recognition system 1 of this embodiment comprises four units:
- a recognition processing unit 2 that processes input image data in parallel with a plurality of recognizers 5;
- an integration unit 3 that integrates the outputs of the recognizers 5;
- a learning unit 4 that updates the recognizers through learning, using teacher data that represents the desired processing result;
- a database unit 6 that systematically classifies and retains acquired knowledge (recognition processes).

The learning unit 4 comprises three parts:
- a recognizer evaluation unit 10 that evaluates individual recognizers;
- a replacement selection unit 11 that finds the optimal combination among all recognizers, both those currently in use and those in stock, and swaps it in for the combination currently in use;
- a sequential learning unit 12 that creates new recognizers from teacher data.

The database unit 6 likewise comprises three parts:
- a recognizer database DB1 that stores recognizers created in the past as well as newly created ones;
- a teacher database DB2 that stores teacher data input in the past as well as newly input teacher data;
- a database management unit 7 that systematically classifies and manages the knowledge stored in databases DB1 and DB2.

The example below mounts the image recognition system 1 on a vehicle such as an automobile, where it processes moving images from an in-vehicle camera and extracts pedestrians. As shown in FIG. 2, this is a person extraction problem. The task is to extract the persons appearing in regions QR1, QR2, and QR3 (indicated by broken lines) from moving images Q1, Q2, and Q3 of different scenes.

The in-vehicle camera that captures input images is, for example, a camera with a CCD or CMOS image sensor, mounted on the inner side of the windshield near the rearview mirror. The camera images the area ahead of the vehicle at a fixed period, for example every 1/30 s. Each frame then goes through video processing such as noise removal, gain adjustment, and γ correction. It is converted into a digital image with a fixed number of gray levels, for example 256-level grayscale, and input to the recognition processing unit 2.

Every M frames, the images at the current time t and at an earlier time (t − k) are read from memory and input to the recognition processing unit 2. The values of k and M can be set as appropriate. The system may also be configured to select and input several different types of input images using another selection method.
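The frame scheduling above can be sketched with a small ring buffer. This is a minimal sketch, assuming frames arrive as an iterable and that a pair is emitted only once frame t − k exists. The function name `frame_pairs` is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def frame_pairs(frames: Iterable[Frame], k: int, m: int) -> Iterator[Tuple[Frame, Frame]]:
    """Yield (image at t, image at t - k) once every m frames (hypothetical helper)."""
    history: Deque[Frame] = deque(maxlen=k + 1)  # holds frames t-k .. t
    for t, frame in enumerate(frames):
        history.append(frame)
        # Emit only on every m-th frame, and only when frame t-k is available.
        if len(history) == k + 1 and t % m == 0:
            yield frame, history[0]
```

For example, `list(frame_pairs(range(10), k=2, m=3))` gives `[(3, 1), (6, 4), (9, 7)]`. Frame 0 produces no pair because frame −2 does not exist.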

The recognition processing unit 2 processes the input images in parallel with the plurality of recognizers 5 and outputs processed images in which the target has been extracted. In this embodiment, the goal is to extract pedestrians from images of the scene ahead of the vehicle. The output is therefore an image in which only the pedestrians remain from the input image.

In this embodiment, each recognizer 5 is a tree-structured image filter, shown in FIG. 3, in which image filters F1, F2, ..., Fn (n = 8 in the figure) are combined in a tree structure. The filters at the nodes of the tree are either existing filters (for example, mean filters, Sobel filters, and binarization filters) or filters whose functions are specialized for the purpose. The optimal combination and total number of these filters are acquired through learning by genetic programming (GP). GP extends the genotype of the genetic algorithm (GA) so that it can handle structural representations such as trees and graphs.
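To illustrate how a tree-structured image filter is evaluated, the sketch below uses images as nested lists. Leaves name input images (here the frames at t and t − k), and each internal node applies a filter to its children's outputs. It is a minimal sketch, assuming a small hypothetical filter set (`mean`, `binarize`, `absdiff`) and a threshold of 128. It is not the filter library of the patent.

```python
from typing import Dict, List, Union

Image = List[List[int]]
Tree = Union[str, tuple]  # leaf: input name; node: (filter_name, child, ...)


def mean3(img: Image) -> Image:
    """3x3 mean filter with edge clamping (integer division)."""
    h, w = len(img), len(img[0])

    def px(r: int, c: int) -> int:
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [[sum(px(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)) // 9
             for c in range(w)] for r in range(h)]


def binarize(img: Image, threshold: int = 128) -> Image:
    """Map each pixel to 255 or 0 depending on the (assumed) threshold."""
    return [[255 if v >= threshold else 0 for v in row] for row in img]


def absdiff(a: Image, b: Image) -> Image:
    """Two-input node: absolute pixel difference, e.g. between frames t and t-k."""
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical node vocabulary; a real system would have many more filters.
FILTERS = {"mean": mean3, "binarize": binarize, "absdiff": absdiff}


def run_tree(tree: Tree, inputs: Dict[str, Image]) -> Image:
    """Evaluate the tree bottom-up: leaves return input images, nodes apply filters."""
    if isinstance(tree, str):
        return inputs[tree]
    name, *children = tree
    return FILTERS[name](*(run_tree(child, inputs) for child in children))
```

For example, `("binarize", ("absdiff", "t", "t-k"))` marks the pixels that changed strongly between the two frames. GP searches over trees of this kind.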

Besides tree-structured image filters, the recognizers 5 may be other types. Examples include recognizers based on neural networks, support vector machines, or fuzzy logic; recognizers that perform matching on stereo images; and recognizers that process images scanned by a laser radar.

Image processing with the tree-structured image filter used in this embodiment is described in detail in Japanese Patent Application Laid-Open No. 2006-178857, filed by the present applicant. Only an outline is given here.

In this embodiment, the tree structure of the image filter is optimized through the following steps: fitness evaluation, selection, crossover, mutation, fitness evaluation, and termination check. GP automatically generates a processing program from this search, and that program realizes an optimal transformation from the original image to the target image.

[Fitness evaluation]
Each tree-structured image filter is treated as an individual, and the fitness of each individual in a randomly generated initial population is evaluated. Fitness is defined as the similarity between the image an individual outputs and the target image, and is calculated with equation (1) below. During evolution, each individual is constrained so that the number of terminal nodes in its tree does not exceed a preset maximum, for example 40.

$$K = 1.0 - \frac{1}{R}\sum_{f}\left(\frac{\sum_{p} W \cdot \lvert O - T \rvert}{\sum_{p} W \cdot V}\right) \qquad (1)$$

where
Σ_f: sum over the frames f
Σ_p: sum over the pixels in one frame
K: fitness
R: number of learning sets (each pair of an input image and a teacher image is one learning set; R is the number of sets used for evaluation)
O: output image
T: target image (the image that the optimized processing should output)
W: weight image (expresses the importance of each region of the target image; a weight depending on the distance between the output image and the target image is defined for each pixel)
V: maximum gray level
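Equation (1) can be computed directly. This is a minimal sketch, assuming grayscale images stored as nested lists, one weight image per frame, V = 255, and R taken as the number of frames passed in. The function name `fitness` is illustrative.

```python
from typing import List, Sequence

Image = List[List[float]]  # 2D grayscale image as nested lists (assumption)


def fitness(outputs: Sequence[Image], targets: Sequence[Image],
            weights: Sequence[Image], max_level: float = 255.0) -> float:
    """Fitness K of equation (1): 1.0 minus the mean weighted, normalized error."""
    if not outputs or not (len(outputs) == len(targets) == len(weights)):
        raise ValueError("need equally many output, target and weight frames")
    r = len(outputs)  # assumption: R equals the number of frames/sets passed in
    total = 0.0
    for o_img, t_img, w_img in zip(outputs, targets, weights):
        num = 0.0  # sum_p W * |O - T|
        den = 0.0  # sum_p W * V
        for o_row, t_row, w_row in zip(o_img, t_img, w_img):
            for o, t, w in zip(o_row, t_row, w_row):
                num += w * abs(o - t)
                den += w * max_level
        total += num / den
    return 1.0 - total / r
```

A perfect output gives K = 1.0, and an output that differs from the target by the full gray range at every pixel gives K = 0.0.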

[Selection]
This step selects a parent population for reproduction. Based on fitness K, the individuals that survive into the next generation are selected and multiplied, using a method such as roulette selection, expected-value selection, ranking selection, or tournament selection. The tree-structured image filter of this embodiment selects a preset number of individuals by tournament selection. At the same time, it preserves the individual with the highest fitness K as an elite.
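Tournament selection combined with elite preservation might look like the sketch below. It assumes the elite is kept in addition to the preset number of tournament winners, and it uses a tournament size of 3, which is a hypothetical parameter not given in the text.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

Individual = TypeVar("Individual")


def tournament_with_elite(population: Sequence[Individual],
                          fitness: Callable[[Individual], float],
                          n_select: int,
                          tournament_size: int = 3,
                          rng: Optional[random.Random] = None) -> List[Individual]:
    """Return [elite] + n_select tournament winners (sketch under stated assumptions)."""
    if not population:
        raise ValueError("population must not be empty")
    rng = rng or random.Random()
    elite = max(population, key=fitness)  # elite preservation: best K always survives
    pool = list(population)
    size = min(tournament_size, len(pool))
    # Each tournament draws `size` distinct individuals and keeps the fittest one.
    winners = [max(rng.sample(pool, size), key=fitness) for _ in range(n_select)]
    return [elite] + winners
```

Larger tournaments increase selection pressure. The elite copy ensures that the best fitness never decreases from one generation to the next.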

[Crossover and mutation]
This step generates a child population from the parent population by crossover and mutation. The selected individuals are paired, and a crossover point is chosen at random in each. The subtrees at the crossover points are then exchanged by one-point, multi-point, or uniform crossover to produce the child population. Finally, node mutation, insertion, deletion, and similar operations are applied to each child at a preset rate, producing the mutated child population.
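The following sketch shows one-point subtree crossover and node-level point mutation on trees encoded as nested tuples (`(filter_name, child, ...)`, with string leaves). The tuple encoding and the `FILTERS` table grouped by arity are hypothetical. Insertion and deletion mutations are omitted for brevity.

```python
import random
from typing import List, Tuple, Union

Tree = Union[str, tuple]  # hypothetical encoding: leaf = input name, node = (name, child, ...)
Path = Tuple[int, ...]


def paths(tree: Tree, prefix: Path = ()) -> List[Path]:
    """All node positions in the tree, as index paths from the root."""
    result = [prefix]
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            result.extend(paths(child, prefix + (i,)))
    return result


def get(tree: Tree, path: Path) -> Tree:
    for i in path:
        tree = tree[i]
    return tree


def put(tree: Tree, path: Path, sub: Tree) -> Tree:
    """Return a copy of tree with the node at path replaced by sub."""
    if not path:
        return sub
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], sub),) + tree[i + 1:]


def one_point_crossover(a: Tree, b: Tree, rng: random.Random) -> Tuple[Tree, Tree]:
    """Pick a random node in each parent and swap the subtrees rooted there."""
    pa, pb = rng.choice(paths(a)), rng.choice(paths(b))
    return put(a, pa, get(b, pb)), put(b, pb, get(a, pa))


# Hypothetical filter names grouped by arity, so mutation keeps trees valid.
FILTERS = {1: ["mean", "sobel", "binarize"], 2: ["absdiff", "min", "max"]}


def point_mutation(tree: Tree, rate: float, rng: random.Random) -> Tree:
    """With probability `rate`, swap each node's filter for another of equal arity."""
    if not isinstance(tree, tuple):
        return tree
    name, children = tree[0], tree[1:]
    if rng.random() < rate:
        name = rng.choice(FILTERS[len(children)])
    return (name,) + tuple(point_mutation(c, rate, rng) for c in children)
```

Subtree swapping conserves the total node count of the two parents. In practice, the terminal-node limit from the fitness step (for example, 40) would be checked on the children.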

[Fitness evaluation and termination check]
The fitness of each individual generated by mutation is evaluated as described above. The system then decides whether to end the optimization, taking into account the elite-preserved individual with the highest fitness from the previous generation. Termination is decided by criteria such as whether the maximum number of generations has been reached, or whether some individual has reached a preset target fitness (that is, whether the desired individual has been obtained).

If the generation count has not reached the final generation count, processing returns to parent selection and the above steps are repeated. The optimization stops at the current generation in either of two cases: the generation count reaches the final count, or the maximum fitness does not change over a preset number of generations (that is, it stagnates). The individual with the highest fitness is then output as the solution.
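The three stopping criteria can be expressed as a single check over the history of best fitness values. This is a minimal sketch, assuming one best-fitness value is recorded per generation. `patience` is a hypothetical name for the preset number of stagnant generations.

```python
from typing import Sequence


def should_stop(best_per_generation: Sequence[float], max_generations: int,
                target_fitness: float, patience: int) -> bool:
    """Stop on the generation limit, on reaching the target fitness, or on stagnation."""
    generation = len(best_per_generation)
    if generation >= max_generations:
        return True
    if best_per_generation and best_per_generation[-1] >= target_fitness:
        return True
    # Stagnation: the best fitness has not changed over the last `patience` generations.
    if generation > patience:
        recent = best_per_generation[-(patience + 1):]
        if max(recent) == min(recent):
            return True
    return False
```

Because of elite preservation, the best fitness never decreases. Equal maximum and minimum over the window therefore means that no progress was made.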

To cover a range of scenes, the tree-structure optimization above is also run beforehand as offline pre-learning. The resulting recognizers, each specialized for a typical scene, are stocked in the recognizer database DB1 under the classes described later. Examples of such scenes include day, night, particular weather, and particular environments such as expressways, arterial roads, and urban areas.

In the following, the tree-structured image filter is also referred to as a "tree-structured filter sequence" or simply a "tree."

In the image recognition system 1, normal processing of input images is carried out by the recognition processing unit 2 and the integration unit 3, which extract the target from the input images arriving continuously online. The multiple tree-structured filter sequences of the recognition processing unit 2 process each input image in parallel. The integration unit 3 then averages these parallel outputs into an integrated image, which is output as the recognition result.

For example, FIG. 4 shows an original image (the input data) processed by four tree-structured filter sequences TR1, TR2, TR3, and TR4. An output weight Wi (i = 1, 2, 3, 4) is set for each of the resulting output images, and the image integrated with these weights is output.

The n-th pixel value Pn of the integrated image is given by equation (2) below. It is the weighted average of the corresponding pixel values PAn, PBn, PCn, and PDn in the output images of the tree-structured filter sequences FA, FB, FC, and FD, using output weights W1, W2, W3, and W4. The output weights Wi are described in detail later, as part of the recognizer replacement selection process in the learning unit 4.

$$P_n = \frac{PA_n \times W_1 + PB_n \times W_2 + PC_n \times W_3 + PD_n \times W_4}{4} \qquad (2)$$
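Equation (2) can be applied pixel by pixel to any number of recognizer outputs. The sketch below follows the equation literally: it divides by the number of recognizers (4 in the example), not by the sum of the weights. It assumes the output images are equally sized nested lists.

```python
from typing import List, Sequence

Image = List[List[float]]


def integrate(outputs: Sequence[Image], weights: Sequence[float]) -> Image:
    """Pixel-wise integration per equation (2): weighted sum / number of recognizers."""
    if not outputs or len(outputs) != len(weights):
        raise ValueError("one weight per recognizer output is required")
    n = len(outputs)
    rows, cols = len(outputs[0]), len(outputs[0][0])
    return [[sum(w * img[r][c] for img, w in zip(outputs, weights)) / n
             for c in range(cols)] for r in range(rows)]
```

With all weights equal to 1, this reduces to a plain average of the recognizer outputs.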

The learning unit 4 works separately from the recognition processing unit 2 and the integration unit 3, which keep recognizing the target in the input images arriving online. As shown in FIG. 5, each input of teacher data triggers a background process in the learning unit 4. That process adaptively updates the recognizers currently in use to suit the environment. In FIG. 5, thick arrows show the flow of the learning process. Broken arrows show the flow of learning images, and thin arrows show the flow of recognizers.

In outline, when teacher data is created from input data, the database management unit 7 sorts it into classes and stocks it in the teacher database DB2. The recognizer evaluation unit 10 then uses the class-sorted teacher data to evaluate, class by class, two groups of tree-structured filter sequences: those currently in use and those stocked in the recognizer database DB1.

The replacement selection unit 11 refers to the per-class evaluation results and determines the optimal combination of tree-structured filter sequences. The optimal combination is expected to score better than the integrated result of the filter sequences currently forming the recognition processing unit 2. As an absolute condition, it must not score worse than the current combination.

If no candidate tree-structured filter sequence is available, the sequential learning unit 12 creates a new one by learning with GP, the evolutionary optimization method described above (sequential learning). Combinations that include the sequences added step by step through sequential learning are evaluated repeatedly. The combination finally judged optimal then replaces, partially or entirely, the tree-structured filter sequences of the current recognition processing unit 2.
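The "never worse than the current combination" rule can be sketched as an exhaustive search over fixed-size combinations drawn from the recognizers in use and those in stock. This is a sketch under stated assumptions. `evaluate` stands in for whatever per-class score the system uses (for example, a mean of equation (1) over classes). The `improved` flag is a hypothetical signal that could trigger sequential learning when no better combination exists. A real system with a large stock would likely need a heuristic search instead.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, Iterable, Tuple

Combo = FrozenSet[Hashable]


def select_combination(current: Combo, pool: Iterable[Hashable],
                       evaluate: Callable[[Combo], float],
                       size: int) -> Tuple[Combo, bool]:
    """Return (combination to use, improved?) from an exhaustive search.

    The current combination is kept unless a candidate scores strictly better,
    so the evaluation of the running system never gets worse.
    """
    best, best_score = current, evaluate(current)
    candidates = sorted(set(pool) | set(current), key=repr)  # deterministic order
    for combo in combinations(candidates, size):
        candidate = frozenset(combo)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best != current
```

If `improved` is `False`, the system could fall back to creating a new tree by GP and then repeat the search with that tree added to the pool.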

Before the processing of the learning unit 4 is described in detail, the following explains how the database management unit 7 manages teacher data and knowledge data.

The database management unit 7 takes as input the images captured while driving, along with information such as vehicle operations and vehicle states obtained through the in-vehicle network. From this input, it manages the previously learned images in databases DB1 and DB2 and the recognition processes obtained by learning. It also sends the information needed to control the replacement selection unit 11 and the sequential learning unit 12 efficiently.

As noted above, databases DB1 and DB2 accumulate a huge number of learning results from diverse driving environments, so using them efficiently in the current state requires suitable measures. For this reason, the database management unit 7 comprises four functional units, shown in FIG. 6: a feature extraction unit 7a, a teacher map creation unit 7b, an attribute setting unit 7c, and a class determination unit 7d. Together these units classify the acquired knowledge (recognition processes) systematically within the system, which makes efficient real-time recognition processing possible.

A teacher image is created from risk information experienced while driving. When such an image is input, the feature extraction unit 7a extracts feature amounts from it, such as edge information, motion information, and color information (brightness, saturation, hue), and holds them as an N-dimensional vector. This vector can also include vehicle information beyond image features, such as changes in vehicle speed or yaw angle.

Feature extraction here produces the data used for subsequent recognition. In general, data uncorrelated with the intended recognition harms recognition. Adding features indiscriminately is therefore unwise, but omitting necessary features also degrades accuracy.

This raises the problem of feature selection: deciding which features to use. If feature selection were learned at this stage, a higher-level learning loop over the subsequent recognition processing would be needed. That loop costs too much computation and memory for online learning. This embodiment therefore treats the feature extraction stage as fixed.

If feature selection is learned, it suffices to evaluate each combination of features using the system's recognition rate as the criterion and to optimize that combination. Existing optimization methods can be used for this, such as exhaustive search over combinations or heuristic search methods such as GA.
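The exhaustive-search option can be sketched as follows. `recognition_rate` is a hypothetical callable that scores a feature subset, for example by running the recognizers on validation data. The number of subsets grows exponentially with the number of features, so for a 240-dimensional vector this search is only practical over a small `max_size`. That limitation is why heuristic methods such as GA are mentioned as an alternative.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence, Tuple


def exhaustive_feature_selection(features: Sequence[str],
                                 recognition_rate: Callable[[FrozenSet[str]], float],
                                 max_size: int) -> Tuple[FrozenSet[str], float]:
    """Try every subset of up to max_size features; keep the best-scoring one."""
    best: FrozenSet[str] = frozenset()
    best_rate = float("-inf")
    for k in range(1, max_size + 1):
        for subset in combinations(features, k):
            rate = recognition_rate(frozenset(subset))
            if rate > best_rate:
                best, best_rate = frozenset(subset), rate
    return best, best_rate
```

For example, with a toy scorer that rewards features `a` and `b` and penalizes `c`, the search returns `{a, b}`.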

In this embodiment, features of preset types are extracted. The processing is divided into several elements, and the features defined for each element are extracted. Preprocessing, feature calculation, and region setting can serve as these elements. As described below, preprocessing yields 6 types of data, feature calculation 10 types, and region setting 4 types. Combining them gives data with 240 (6 × 10 × 4) dimensions in total.

<Preprocessing>
Six types of filtering are applied to the input image: Sobel, vertical Sobel, horizontal Sobel, inter-frame difference, luminance, and saturation. This yields six dimensions of feature data.

<Features>
Ten types of calculation are performed on the pixel values of each filtered image: mean, variance, maximum, minimum, horizontal centroid, vertical centroid, contrast, uniformity, entropy, and fractal dimension. This yields ten dimensions of feature data.

<Regions>
A region is set in the image, and features are extracted for four parts of it: the whole set region and its left, right, and center portions. This yields four dimensions of feature data.
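The 6 × 10 × 4 layout can be made concrete by enumerating every (preprocessing, statistic, region) triple. The sketch below builds that index and computes six of the ten statistics for one region. It assumes the left and right regions are the image halves and the center region is the middle half of the columns; both splits are hypothetical. Contrast, uniformity, entropy, and fractal dimension are omitted because the text does not define how they are computed.

```python
from itertools import product
from typing import Dict, List, Tuple

PREPROCESS = ["sobel", "sobel_v", "sobel_h", "frame_diff", "luminance", "saturation"]
STATISTICS = ["mean", "variance", "max", "min", "centroid_x", "centroid_y",
              "contrast", "uniformity", "entropy", "fractal_dim"]
REGIONS = ["whole", "left", "right", "center"]

# Each (preprocess, statistic, region) triple is one dimension: 6 x 10 x 4 = 240.
FEATURE_LAYOUT: List[Tuple[str, str, str]] = list(product(PREPROCESS, STATISTICS, REGIONS))


def region_columns(width: int, region: str) -> range:
    """Hypothetical split: halves for left/right, middle half of columns for center."""
    if region == "whole":
        return range(0, width)
    if region == "left":
        return range(0, width // 2)
    if region == "right":
        return range(width // 2, width)
    if region == "center":
        return range(width // 4, width - width // 4)
    raise ValueError(region)


def basic_statistics(img: List[List[float]], cols: range) -> Dict[str, float]:
    """Mean, variance, max, min and intensity-weighted centroids over a column range."""
    values = [(r, c, img[r][c]) for r in range(len(img)) for c in cols]
    pixels = [v for _, _, v in values]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((v - mean) ** 2 for v in pixels) / n
    mass = sum(pixels)  # assumes non-negative filtered pixel values
    cx = sum(c * v for _, c, v in values) / mass if mass else (cols.start + cols.stop - 1) / 2
    cy = sum(r * v for r, _, v in values) / mass if mass else (len(img) - 1) / 2
    return {"mean": mean, "variance": variance, "max": max(pixels), "min": min(pixels),
            "centroid_x": cx, "centroid_y": cy}
```

The position of a triple in `FEATURE_LAYOUT` gives its index in the 240-dimensional vector. Reducing dimensions for a slower onboard computer then amounts to keeping a subset of those indices.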

The 240 dimensions above may be reduced to suit the computational performance of the online system. Vehicle data may also be used in addition to images. One example is a six-dimensional feature made up of the mean and variance of the Sobel response over the whole screen, the mean and variance of the inter-frame difference over the whole screen, the vehicle speed, and the steering angle.

Each feature quantity is normalized in this extraction process. Normalizing over the theoretical range of each feature is inefficient. The distribution of each feature is therefore evaluated in advance, maximum and minimum values are set from that evaluation, and the feature is normalized to a value between 0 and 1.

The maximum and minimum may also be changed dynamically. When a value above the maximum or below the minimum is input, the corresponding bound is moved to widen the range. Conversely, if no data near the minimum or maximum has arrived for some time, the range is narrowed.
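A minimal sketch of such an adaptive normalizer follows. Range widening is immediate, as the text describes.

The narrowing rule is an assumption: after every `patience` samples, each bound moves a fraction `shrink` of the way toward the most extreme value actually seen in that window. The class and parameter names are hypothetical.

```python
class AdaptiveMinMax:
    """Min-max normaliser to [0, 1] with a dynamically adjusted range.

    lo, hi: initial bounds, e.g. from an offline survey of the feature's
    distribution. patience and shrink implement an assumed narrowing rule.
    """

    def __init__(self, lo, hi, patience=100, shrink=0.1):
        self.lo, self.hi = lo, hi
        self.patience, self.shrink = patience, shrink
        self.seen_lo, self.seen_hi = float("inf"), float("-inf")
        self.count = 0

    def __call__(self, x):
        # Widen immediately when a value falls outside the current range.
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)
        self.seen_lo = min(self.seen_lo, x)
        self.seen_hi = max(self.seen_hi, x)
        self.count += 1
        if self.count >= self.patience:
            # Narrow each bound toward the extremes observed in this window.
            self.lo += self.shrink * (self.seen_lo - self.lo)
            self.hi -= self.shrink * (self.hi - self.seen_hi)
            self.seen_lo, self.seen_hi = float("inf"), float("-inf")
            self.count = 0
        span = self.hi - self.lo
        return 0.0 if span == 0 else (x - self.lo) / span
```

Because the new bounds never move past the values seen in the window, the returned value always stays within [0, 1].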

Basic feature quantities are used here, but time-series variations of the features can also be computed and used as features, for example motion information calculated from past frames. To improve the accuracy of risk recognition as a whole, high-precision image processing can also be included in the feature extraction. For example, past pedestrian recognition results, road white-line recognition results, and obstacle recognition results can be incorporated into the extracted data.

Once features have been extracted from the teacher image, the teacher map creation unit 7b clusters the feature space together with the teacher images obtained in the past. It classifies them into classes and creates a map of the teacher images for each class. The clustering uses a self-organizing map (SOM), a type of neural network modeled on the visual cortex of the cerebral cortex.

In a SOM, units arranged in M dimensions (usually two) each hold a vector value, usually called the weights of their connections to the input. For each input, the winner unit is determined by vector distance. The reference vectors of the winner and its neighboring units are then updated to move closer to the input vector. By repeating this, the map learns to represent the distribution of the input data as a whole. Regions of high data density, each represented by a representative vector, are then classified as classes.
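The winner-and-neighbourhood update described above can be sketched as a small two-dimensional SOM in pure Python. The learning-rate and neighbourhood schedules, the Gaussian neighbourhood, and initialization from random samples are common textbook choices assumed here; the source does not specify them. Function names are hypothetical.

```python
import math
import random

def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def winner(weights, x):
    """Index of the unit whose reference vector is nearest to input x."""
    return min(range(len(weights)), key=lambda i: _sq_dist(weights[i], x))

def train_som(data, rows=2, cols=2, epochs=500, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols SOM on `data` (list of equal-length vectors).

    Returns one reference vector per unit, in row-major grid order.
    """
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [list(rng.choice(data)) for _ in grid]  # init from samples
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)                    # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.1)    # shrinking neighbourhood
        x = rng.choice(data)
        wr, wc = grid[winner(weights, x)]
        for i, (r, c) in enumerate(grid):
            # Gaussian neighbourhood measured on the unit grid, not in feature space.
            h = math.exp(-((r - wr) ** 2 + (c - wc) ** 2) / (2 * sigma ** 2))
            weights[i] = [w + lr * h * (v - w) for w, v in zip(weights[i], x)]
    return weights
```

With (mean, variance) pairs such as those in the FIG. 7 example as `data`, each trained unit plays the role of a class center. `winner` then assigns a new sample to a class.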

For example, suppose a teacher image is filtered with a Sobel filter, and two image features (1 × 2), the mean and variance of the pixel values, are extracted from each image. These features are extracted from 34 consecutive frames, as shown in FIG. 7. When they are clustered with the SOM, they fall into classes A, B, C, and D, as shown in FIG. 8.

This SOM-based classification of teacher images is not fixed; it must be updated adaptively to suit varied driving environments. The teacher map creation unit 7b therefore updates the clustering autonomously while the vehicle is driving. The classes thus adapt to changes in the features caused by changes in environment and over time.

Once the class of the input teacher image has been determined, the attribute setting unit 7c adds attribute information to the teacher image and transfers it to the teacher database DB2. The attribute information consists mainly of the class to which the teacher image belongs and its distance from the center of that class. In addition, it includes the distances from the centers of other classes in the feature space and the average of the class attributes of the learning set used.
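A minimal sketch of the distance-based part of this attribute record follows. It assumes class centers are available as vectors, for example SOM reference vectors, and that the class is the one with the nearest center. The field names are hypothetical, and the learning-set average is omitted because its contents are not specified.

```python
import math

def attributes(feature, centers):
    """Distance-based attribute record for one teacher sample.

    feature: the sample's feature vector.
    centers: dict mapping class label -> class-center vector (assumed available).
    """
    dist = {label: math.dist(feature, c) for label, c in centers.items()}
    own = min(dist, key=dist.get)  # class with the nearest center
    return {
        "class": own,
        "distance_to_own_center": dist[own],
        "distance_to_other_centers": {k: v for k, v in dist.items() if k != own},
    }
```

`math.dist` (Python 3.8+) returns the Euclidean distance between two points of equal dimension.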

When the SOM clustering uses a probabilistic model, probabilistic information according to the distance from the class center may be added as attribute information.

Using the teacher images classified by class in this way, the recognizers are organized class by class. A database of recognizers corresponding to each class's teacher images is built in the recognizer database DB1. The real-time replacement of recognizers is described next.

Recognizer replacement starts when input data is supplied to the database management unit 7. First, the class determination unit 7d of the database management unit 7 determines which class the input data belongs to. The input data is projected into the feature space obtained by the feature extraction unit 7a, and the corresponding class is determined from the teacher image map. A recognizer with the attribute of the determined class is then selected, and the replacement is performed in real time.

For example, the features of the input data may be closest to the center of class A among the classes A, B, C, and D shown in FIG. 8. In that case, the input data is assigned to class A. The recognizers currently in use (tree-structured filter sequences) are then replaced using the class A recognizers (tree-structured filter sequences) stocked in the recognizer database DB1.

The selection need not be limited to a single class. Learning samples may be drawn from several classes that lie close together in the feature space, so that recognition performance is averaged over several scenes.
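Both variants, a single nearest class and several nearby classes, can be sketched as a ranking of class centers by distance. Euclidean distance to assumed center vectors stands in for the teacher-map lookup, and `select_classes` is a hypothetical name.

```python
import math

def select_classes(feature, centers, k=1):
    """Return the k class labels whose centers lie nearest to `feature`.

    k=1 gives the single-class case; k>1 gives the variant that draws
    learning samples from several neighbouring classes.
    """
    return sorted(centers, key=lambda label: math.dist(feature, centers[label]))[:k]

centers = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0), "D": (5.0, 5.0)}
```

With these illustrative centers, `select_classes((0.1, 0.0), centers)` returns `["A"]`, and `k=2` adds `"B"`.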

In the recognizer replacement process, the recognizer evaluation unit 10 of the learning unit 4 first evaluates each tree-structured filter sequence individually. This covers the sequences currently in use and those of the corresponding class in the recognizer database DB1. During this evaluation, the tree-structured filter sequences in the target class of the recognizer database DB1 may also be pruned by deleting those with low evaluations.

Specifically, an image evaluation value is obtained for each tree-structured filter sequence using the teacher data. The evaluation additionally takes the following conditions (a) to (d) into account, either additively or selectively. A value based on the fitness K of equation (1) can be used as the image evaluation value of a tree-structured filter sequence.

(a) Lifespan
The tree's lifespan is (current time − creation time); the more recently a tree was created, the higher its evaluation value.
(b) Number of uses
Trees that have been used more often in the past receive higher evaluation values.
(c) Size
Smaller trees receive higher evaluation values.
(d) Usage state
Trees currently in use are rated higher than trees used in the past.

For example, a tree may be evaluated by additively combining the image evaluation value G, lifespan L, number of uses S, and usage state TS. Its evaluation value F is then given by equation (3):
F = G × α + L × β + S × γ + TS × δ … (3)
where α, β, γ, and δ are constants.

The evaluation values obtained are accumulated over past evaluations, and the accumulated value becomes the current evaluation value. Once all the tree-structured filter sequences in the class have been evaluated, processing moves to the replacement selection unit 11.
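Equation (3) and the cumulative update can be sketched as follows.

All constant values are placeholders. β is given a negative default on the assumption that L is measured directly as age, so that younger trees score higher as criterion (a) requires. TS is assumed to be 1 for a tree in use and 0 otherwise. `tree_score` and `TreeRecord` are hypothetical names.

```python
def tree_score(G, L, S, TS, alpha=1.0, beta=-0.01, gamma=0.1, delta=0.5):
    """Evaluation value F of equation (3): F = G*alpha + L*beta + S*gamma + TS*delta.

    beta < 0 is an assumption so that an older tree (larger L) scores lower.
    TS is assumed to be 1 for a tree currently in use, else 0.
    """
    return G * alpha + L * beta + S * gamma + TS * delta

class TreeRecord:
    """Keeps a tree's accumulated evaluation value across evaluations."""

    def __init__(self):
        self.cumulative = 0.0

    def update(self, **terms):
        # Add this round's F to the running total; the total is the current value.
        self.cumulative += tree_score(**terms)
        return self.cumulative
```

With these placeholder constants, `tree_score(G=0.8, L=10, S=3, TS=1)` is approximately 1.5, and two identical updates accumulate to about 3.0.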

The replacement selection unit 11 searches all trees, including those currently in use and those stocked in the class, for the combination of N trees with the highest evaluation. If the number N of trees in the combination is below a fixed number M, new trees are created by sequential learning and added. Once N = M, the tree group that has been processing the input data is replaced with the new tree group.

The fixed number M is the number of tree-structured filter sequences that form the recognition processing unit 2, that is, the number in constant use. For example, suppose a total of 20 tree-structured filter sequences are stocked in the recognizer database DB1. The candidates are then limited to those in the target class, and the optimal combination of up to 10 trees for constant use is sought. This keeps replacement selection efficient while building high-accuracy recognition suited to the current driving situation.

When the tree group is replaced, the evaluation of the integrated image produced by the current tree combination serves as the baseline. As shown in FIG. 9, the original image supplied as new teacher data is processed in parallel by the current tree group TR, and the outputs are integrated. The integrated image is then compared with the target image and evaluated. This result is the reference for deciding whether to switch to the new combination of trees.

To find the optimal combination, each candidate tree group is evaluated through its integrated image. For example, as shown in FIG. 10, suppose the corresponding class in the recognizer database DB1 contains trees TR1, TR2, TR3, and TR4, and the two trees TR1 and TR2 are selected. The integrated image produced by TR1 and TR2 is compared with the target image to compute an evaluation value. If that value is higher than those of the other combinations, TR1 and TR2 are selected; otherwise, other trees are chosen and evaluated in the same way. Repeating this over all combinations yields the combination with the highest evaluation.
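The exhaustive search can be sketched with `itertools.combinations`. Integration is assumed to be an unweighted pixel-wise average of the tree outputs. `evaluate` stands for any higher-is-better score, such as the K of equation (1)'. Images are flat pixel lists, and all names are hypothetical.

```python
from itertools import combinations

def integrate(outputs):
    """Pixel-wise average of the tree output images (flat pixel lists)."""
    n = len(outputs)
    return [sum(px) / n for px in zip(*outputs)]

def best_combination(tree_outputs, size, evaluate):
    """Score every `size`-tree subset and return (best subset, best score).

    tree_outputs: dict tree name -> output image for the current teacher frame.
    evaluate: callable(integrated image) -> evaluation value (higher is better).
    """
    best, best_score = None, float("-inf")
    for combo in combinations(sorted(tree_outputs), size):
        score = evaluate(integrate([tree_outputs[t] for t in combo]))
        if score > best_score:
            best, best_score = combo, score
    return best, best_score

target = [255, 0, 0, 255]
outputs = {"TR1": [255, 0, 0, 0], "TR2": [0, 0, 0, 255],
           "TR3": [255, 255, 255, 255], "TR4": [0, 0, 0, 0]}
neg_error = lambda img: -sum(abs(o - t) for o, t in zip(img, target))
```

Here `best_combination(outputs, 2, neg_error)` selects `("TR1", "TR2")` with score -255.0. Under the baseline rule above, the new group would replace the current one only if this score beats the current group's score. The number of subsets grows combinatorially, which is one reason the candidates are restricted to a single class.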

The evaluation value is calculated with the formula defined below.
[Evaluation method]
The evaluation value is defined as the similarity of the integrated image produced by the new tree group to the target image. It is calculated with equation (1)':
$$K = 1.0 - \sum_{f}\left(\frac{\sum_{p} W \cdot \lvert O - T \rvert}{\sum_{p} W \cdot V}\right) \qquad (1)'$$
where
Σ_f: sum over the frames f
Σ_p: sum over the pixels in one frame
K: evaluation value
O: integrated image
T: target image (the image that the optimized processing should output)
W: weight image (represents the importance of each area in the target image; an image in which a weight corresponding to the distance between the integrated image and the target image is defined for each pixel)
V: maximum gray level
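Equation (1)' translates directly into a short function. Each frame is assumed to be given as a tuple of flat pixel lists (O, T, W), and V defaults to 255 on the assumption of 8-bit images. The name `fitness_K` is hypothetical.

```python
def fitness_K(frames, V=255):
    """Evaluation value K of equation (1)'.

    frames: iterable of (O, T, W) tuples, each a flat list of pixel values:
    integrated image, target image, and weight image for one frame.
    """
    total = 0.0
    for O, T, W in frames:
        num = sum(w * abs(o - t) for o, t, w in zip(O, T, W))  # weighted error
        den = sum(w * V for w in W)                             # worst-case error
        total += num / den
    return 1.0 - total
```

For a single frame, K lies between 0 and 1. Because the extracted formula sums over frames rather than averaging, K can fall below 0 when several frames are used.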

Besides choosing the optimal combination of trees, the output strength of each tree may be optimized at the same time. This is done by determining the output weight Wi described for equation (2) with reference to the evaluation value of each tree. For example, suppose the output weight for the output image (pixel value) PAn of tree TR1 is 0.3, and that for the output image (pixel value) PBn of tree TR2 is 0.8. The n-th pixel value Pn of the integrated image is then given by equation (2)'. An evaluation value can be computed from this weighted integrated image in the same way as above.
Pn = (PAn × 0.3 + PBn × 0.8) / 2 … (2)'

In this case, the number of combinations of output weights and trees equals the number of weight levels raised to the power of the number of trees. For example, with four candidate output weights (0, 0.3, 0.8, and 1.0) and two trees, there are 4² = 16 combinations. The evaluation value is computed for each of these 16 combinations, and the one with the maximum value is chosen. In practice, ten output-weight levels in steps of 0.1 from 0 to 1 are set.
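The weighted integration of equation (2)' and the exhaustive weight search can be sketched with `itertools.product`. `evaluate` is any higher-is-better score on the integrated image, and the function names are hypothetical.

```python
from itertools import product

def weighted_integrate(outputs, weights):
    """Equation (2)': weighted pixel sum divided by the number of trees."""
    n = len(outputs)
    return [sum(w * p for w, p in zip(weights, px)) / n for px in zip(*outputs)]

def best_weights(outputs, candidates, evaluate):
    """Try every weight assignment: len(candidates) ** len(outputs) cases.

    Returns (best weight tuple, best score, number of cases tried).
    """
    best, best_score, tried = None, float("-inf"), 0
    for ws in product(candidates, repeat=len(outputs)):
        tried += 1
        score = evaluate(weighted_integrate(outputs, ws))
        if score > best_score:
            best, best_score = ws, score
    return best, best_score, tried
```

With the four candidates (0, 0.3, 0.8, 1.0) and two single-pixel outputs, 200 and 100, scored against a target of 100, the search tries 16 cases. Only (1.0, 0) reproduces the target exactly.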

The replacement selection unit 11 evaluates all combinations of tree-structured filter sequences. If the number N of trees in the optimal combination is below the fixed number M, sequential learning is performed by the sequential learning unit 12.

The sequential learning unit 12 further corrects the output of the optimal combination of N trees selected by the replacement selection unit 11. It learns and adds trees one at a time until the number N of trees in the combination reaches the fixed number M.

As an example of the learning flow, suppose the combination selected by the replacement selection unit 11 is TR1 and TR2, as shown in FIG. 11. The difference between the integrated image of TR1 and TR2 and the target image reveals where TR1 and TR2 are wrong. Those locations are weighted as correction points, producing a correction weight image.

For example, the correction weight image is built with three luminance levels:
- 255 (most important) for the region taught as a person in the target image;
- 127 (important) for the parts where the integrated image differs from the target image;
- 1 (somewhat important) for all other regions.
One new tree TR3' is then created using this correction weight image and added to the tree-structure buffer.
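A minimal sketch of building the correction weight image from flat pixel lists follows.

Two details are assumptions. First, a pixel counts as "wrong" when the integrated output and the target disagree after binarization at `threshold`. Second, the person region takes priority over the "wrong" label when both apply. The function name is hypothetical.

```python
def correction_weight_image(target, integrated, person=255, threshold=127):
    """Correction weight image with the three luminance levels from the text.

    target, integrated: flat lists of pixel values.
    A target pixel equal to `person` marks the region taught as a person.
    The binarization threshold used to decide "wrong" pixels is assumed.
    """
    weights = []
    for t, o in zip(target, integrated):
        if t == person:
            weights.append(255)   # taught person region: most important
        elif (o > threshold) != (t > threshold):
            weights.append(127)   # integrated output disagrees with target
        else:
            weights.append(1)     # everything else: somewhat important
    return weights
```

For a target `[255, 0, 0]` and integrated output `[0, 200, 0]`, the result is `[255, 127, 1]`.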

Next, the integrated image of TR1, TR2, and TR3' is obtained. Whether to add the new tree TR3' is decided from the evaluation value of this integrated image against the target image. If the evaluation value exceeds a threshold, TR3' is added to form the new tree group TR1, TR2, TR3', as shown in FIG. 11.

If the value is at or below the threshold, TR3' is not added and learning is repeated. A correction weight image is created again, another new tree TR4 is created, and the integrated image of TR1, TR2, and TR4 is evaluated. Trees are added in this way until the number N of trees reaches the fixed number M.
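The accept-or-retry loop can be sketched independently of how individual trees are learned.

`make_tree` is a hypothetical stand-in for creating one tree from the current group's correction weight image, for example by genetic programming. `evaluate_group` scores the group's integrated image. `max_attempts` is a safeguard added here and is not part of the described method.

```python
def sequential_learning(selected, make_tree, evaluate_group, M, threshold, max_attempts=100):
    """Grow the selected tree group until it holds M trees.

    make_tree(group): hypothetical learner returning one new candidate tree.
    evaluate_group(group): evaluation value of the group's integrated image.
    A candidate is kept only if the enlarged group scores above `threshold`;
    otherwise it is discarded and another candidate is learned.
    """
    group = list(selected)
    attempts = 0
    while len(group) < M and attempts < max_attempts:
        attempts += 1
        candidate = make_tree(group)
        if evaluate_group(group + [candidate]) > threshold:
            group.append(candidate)
    return group
```

In a toy run, candidates are plain numbers and the group score is simply the last member. A group of two then grows to four by accepting 0.9 and 0.8 and rejecting 0.1 and 0.2.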

In practice, M is set to 10, and trees are added until the trees chosen by replacement selection number 10. When N reaches M, sequential learning ends. The tree group that has been processing the input data is then replaced with the newly created tree group.

A new tree may be evolved by the GP (genetic programming) described earlier, using either stocked trees or the trees currently in use as initial individuals. Each tree's probability of being selected during computation is set from the class attribute information. The search therefore concentrates on scenes corresponding to the input teacher image, which makes replacement selection efficient.

The overall processing flow is described mainly with reference to FIG. 12, together with FIG. 13. As shown in FIG. 12, when an original image is input as new teacher data, the recognition processing unit 2 processes it in parallel with the M recognizers (tree-structured filter sequences) of the current combination. Their outputs are then integrated. Q1' in FIG. 13 is an example original image, and Q2' is the image obtained by processing Q1' with the recognizers and integrating the results. Q2' shows that the recognizers currently in use extract no person at all from the new teacher data.

At this point, the database management unit 7 determines the class of the original image. The recognizer evaluation unit 10 then evaluates the recognizers of the corresponding class in the recognizer database DB1 together with the recognizers currently in use. Next, the replacement selection unit 11 determines a new combination of recognizers, selecting N recognizers and evaluating their integrated image. Q3' in FIG. 13 is the integrated image for a newly selected combination of three tree-structured filter sequences. It extracts the person but also contains false extractions in the background.

The sequential learning unit 12 learns to correct these background false extractions. The result is an integrated image such as Q4' in FIG. 13, which still extracts the person but shows fewer background false extractions. Sequential learning is repeated until the final combination reaches M recognizers. The current recognition processing unit 2 is then updated with the new combination, and the background false extractions can be eliminated.

As described above, the image recognition system of this embodiment systematically classifies the learning results and recognition results for previously input image data and accumulates them in a database. It then uses the stored learning data to update the recognizers online. As a result, even as the amount of accumulated knowledge grows, adaptive learning for varied environments and targets can be optimized efficiently and quickly. This achieves highly accurate and robust recognition.

FIG. 1: Basic configuration of the image recognition system
FIG. 2: Application example to the person extraction problem
FIG. 3: Tree-structured image filter
FIG. 4: Integration of recognizer outputs
FIG. 5: Flow of the learning process
FIG. 6: Block diagram of the database management unit
FIG. 7: Image features after filtering
FIG. 8: Classification by the self-organizing map
FIG. 9: Evaluation of the integrated image
FIG. 10: Replacement selection
FIG. 11: Sequential learning
FIG. 12: Overall processing flow
FIG. 13: Processing example

Explanation of symbols

1 Image recognition system
2 Recognition processing unit
3 Integration unit
4 Learning unit
5 Recognizer
6 Database unit
7 Database management unit
10 Recognizer evaluation unit
11 Replacement selection unit
12 Sequential learning unit
DB1 Recognizer database
DB2 Teacher database

Claims

An image recognition system for recognizing image data using a recognizer, comprising:
a database unit that classifies by class and holds the recognition processing acquired through learning and the teacher information used for that learning; and
a learning update unit that evaluates the recognizer using the teacher data of each class and adaptively learns and updates the recognizer.

The image recognition system according to claim 1, wherein the database unit comprises a management unit that clusters learning images in a multidimensional feature space obtained from input data, classifies the learning images by class, and sets attribute information for each classified class.

The image recognition system according to claim 2, wherein the clustering is updated online by autonomous learning.

The image recognition system according to claim 2, wherein the clustering is performed using a self-organizing map.

The image recognition system according to claim 2, wherein the attribute information includes information corresponding to a distance from the class in the feature amount space.

The image recognition system according to claim 2, wherein learning samples for recognition processing are extracted from a plurality of classes that are close to one another in the feature space, and recognition processing averaged over a plurality of scenes is performed.

The image recognition system according to any one of claims 2 to 6, wherein a plurality of the recognizers are provided, and the learning update unit comprises:
a sequential learning unit that sequentially learns from the integrated result of the plurality of recognizers and creates a new recognizer; and
a replacement selection unit that, based on the attribute information, obtains an optimal combination from the recognizers of the class, including the recognizer created by the sequential learning, and selectively replaces the plurality of recognizers currently in use.