JP2005032210A

JP2005032210A - Method for effectively using spatio-temporal image recomposition to improve scene classification

Info

Publication number: JP2005032210A
Application number: JP2003358021A
Authority: JP
Inventors: Jiebo Luo; Robert T Gray; Matthew R Boutell
Original assignee: Eastman Kodak Co
Current assignee: Eastman Kodak Co
Priority date: 2002-10-31
Filing date: 2003-10-17
Publication date: 2005-02-03

Abstract

PROBLEM TO BE SOLVED: To provide a method of using multiple recomposed versions of an input digital image to improve scene classification.

SOLUTION: A method for improving scene classification of a digital image comprises the steps of: (a) providing an image; (b) systematically recomposing the image to generate an expanded set of images; and (c) using a classifier and the expanded set of images to determine a scene classification for the image, whereby the expanded set of images provides at least one of an improved classifier and an improved classification result.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates generally to the field of digital image processing and, more particularly, to a method of using multiple recomposed versions of an input digital image to improve scene classification.

Automatically determining the semantic classification (e.g., sunset, picnic, beach) of an arbitrary image is a difficult problem. Much research has been done recently, and a variety of classifiers and feature sets have been proposed. The most common design for such systems uses low-level features (e.g., color, texture) and statistical pattern recognition techniques. Such systems are exemplar-based: they rely on learning patterns from a training set (see, for example, Non-Patent Document 1). Exemplar-based systems stand in contrast to model-based systems, in which the characteristics of each class are specified directly using human knowledge, and to hybrid systems, in which the model is learned.

Semantic scene classification can improve the performance of content-based image organization and retrieval (CBIR). Many current CBIR systems allow a user to specify an image and search for images similar to it, where similarity is often defined only by color or texture properties. This so-called "query by example" has often proven inadequate. Knowing the category of a scene in advance helps narrow the search space dramatically. For example, knowing what constitutes a party scene allows only party scenes to be considered in a search that answers the query "Find pictures of Mary's birthday party". In this way, the search time is reduced, the hit rate is higher, and the false alarm rate is expected to be lower.

Current scene classification systems enjoy only limited success on unconstrained image sets. The primary reason appears to be the incredible variety of images found within most semantic classes. Exemplar-based systems must account for such variation in their training sets, yet even hundreds of exemplars do not necessarily capture all of the variability inherent in some classes. Take the class of sunset images as an example. Sunset images captured at various stages of the sunset can differ greatly in color, because the colors tend to become more brilliant as the sun approaches the horizon and then fade as time progresses. The composition also varies, depending in part on the camera's field of view: does it encompass the horizon or only the sky? Where is the sun relative to the horizon? Is the sun centered or offset to one side?

A second reason for the limited success of exemplar-based classification is that images often contain excessive or distracting foreground regions, which make the scene look less prototypical so that it does not match any of the training exemplars well. For example, FIG. 1 shows four scenes (a) through (d) with distracting foreground regions. This is especially true of consumer images, because a typical camera user pays less attention to composition and lighting than a professional photographer does. Consumer images therefore contain greater variability, which causes the high performance that many existing systems achieve on professionally photographed stock photo libraries (such as the Corel database) to decline when the systems are used in this domain.

Consequently, there is a need for a method that overcomes the image classification problems described above. These problems are addressed by introducing the concept of spatial image recomposition, designed to minimize the impact of undesirable composition (e.g., foreground objects), and of simulated, or effective, temporal image recomposition, designed to minimize the effect of color changes that occur over time.

This approach is supported by past successes in other domains. In face recognition and detection, researchers have used perturbed versions of faces in training to handle geometric variation (see, for example, Non-Patent Document 2). This is related to resampling, or bootstrapping. In addition, bagging (bootstrap aggregating) uses multiple versions of a training set to train different component classifiers, and the final classification decision is based on the vote of the individual component classifiers (see, for example, Non-Patent Document 3).

Non-Patent Document 1: A. Vailaya, M. Figueiredo, A. Jain, and H. J. Zhang, "Content-based hierarchical classification of vacation images", Proceedings of IEEE International Conference on Multimedia Computing and Systems, 1999.
Non-Patent Document 2: H. Rowley, S. Baluja, and T. Kanade, "Rotation invariant neural network-based face detection", Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, 1998.
Non-Patent Document 3: R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, New York, 2001, pp. 475-476.
Non-Patent Document 4: R. P. W. Duin, "The combining classifier: To train or not to train?", Proceedings of International Conference on Pattern Recognition, 2002.
Non-Patent Document 5: B. Scholkopf, C. Burges, and A. Smola, Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, 1999, pp. 263-266.
Non-Patent Document 6: Y. Wang and H. Zhang, "Content-based image orientation detection with support vector machines", Proceedings of IEEE Workshop on Content-Based Access of Image and Video Libraries, 2001.

It is an object of the present invention to overcome one or more of the problems set forth above by providing a method that uses multiple recomposed versions of an input digital image to improve scene classification.

According to one aspect of the present invention, there is provided a method for improving scene classification of a digital image, comprising the steps of: (a) providing an image; (b) systematically recomposing the image to generate an expanded set of images; and (c) using a classifier and the expanded set of images to determine a scene classification for the image, whereby the expanded set of images provides at least one of an improved classifier and an improved classification result.

The present invention provides either or both of the following: a method of systematically generating recomposed versions of an exemplar image to produce an expanded set of training exemplars from which a robust classifier is derived, and a method of systematically generating recomposed versions of a test input digital image to produce an expanded set of test images, sharing the same salient characteristics, from which a robust image classification result is derived. This has the advantage of increasing the diversity of the training exemplars, allowing a better match between an image and the exemplars, and yielding more robust image classification.

The present invention can provide a method that uses multiple recomposed versions of an input digital image to improve scene classification.

The present invention will be described as implemented on a programmed digital computer. A person of ordinary skill in the art of digital image processing and software programming will be able to program a computer to practice the invention from the description given below. The invention may be embodied in a computer program product having a computer-readable storage medium, such as a magnetic or optical storage medium, that holds machine-readable computer code. Alternatively, the invention may be implemented in hardware or firmware.

A major obstacle to high performance in semantic scene classification is the immense variety, in both color and composition, of the images in each class. Obtaining enough training data for an exemplar-based system can be a daunting task, especially when the classes contain many variations. Manually collecting large numbers of high-quality, prototypical images is time-consuming, even with the help of themed stock photo libraries. It is therefore important to make efficient use of all available training data.

Furthermore, the best match between a test image and a set of training exemplars occurs when the image matches an exemplar of its class in both its color and its composition. However, the test image may contain variations that are not present in the training set. The degree of match is affected by the photographer's choice of what to capture (which affects the image composition) and when to capture it (which potentially affects the image color, because the scene illuminant changes over time). If it were possible to "relive" the scene, one could attempt to obtain an image with color and composition more prototypical of its class. For example, referring to FIGS. 2(a) to 2(d), the original scene (FIG. 2(a)) contains a salient sub-region, which is cropped and resized (FIG. 2(c)). Finally, in FIG. 2(d), an illuminant shift is applied to simulate a sunset occurring later in time. How can a scene be "relived"? In other words, how can an arbitrary image be transformed into one that will match the prototypical exemplars well?

According to the present invention, a concept called effective spatial and temporal recomposition is used to address the above problems. In general, image recomposition is defined as a process that systematically creates altered versions of the same image, and it includes spatial recomposition and color recomposition. The different types and uses of spatial recomposition (mirroring and cropping images) and of effective (simulated) temporal recomposition (shifting image colors) are shown in Table 1 and explained in more detail below. They are categorized as recomposition in training, in testing, and in both. Some combinations of type and use require visual inspection to ensure that the recomposition does not destroy the integrity of the training exemplars (e.g., aggressive cropping may result in the loss of the main subject of the picture).

Recomposition in training
Using recomposition on a limited-size set of training data can yield a richer, more diverse set of exemplars. The goal is to obtain these exemplars without having to inspect each image visually. One technique is to reflect each image about the vertical axis, thereby doubling the number of exemplars. For example, as shown in FIGS. 3(a) to 3(c), the original image (FIG. 3(b)) is transformed by horizontal mirroring (FIG. 3(a)) or by cropping (20% from the bottom, as shown in FIG. 3(c)). Clearly, the classification of the new image is unchanged: mirroring a sunset image with the sun on the left side moves the sun to the right side, but the image remains a valid sunset image.

Another technique is to crop the edges of an image. It is assumed that the salient portion of the image is in the center and that imperfect composition is caused by distractions in the periphery. Cropping from each side of the image in turn produces four new images of the same classification. Of course, one does not want to lose a salient part of the image. With a small, conservative crop of, for example, 10%, the classification assigned by the algorithm may change, but the semantic classification of the scene is highly unlikely to change.
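The two training-time recompositions just described (mirroring about the vertical axis and a conservative crop from each side in turn) can be sketched as follows. This is a minimal illustration, not the patented implementation: the image is assumed to be a plain list of pixel rows, the names `mirror_horizontal`, `crop_side`, and `recompose_for_training` are hypothetical, and any resizing after the crop is left out.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values; a stand-in for real image data


def mirror_horizontal(image: Image) -> Image:
    """Reflect the image about its vertical axis (left-right flip)."""
    return [list(reversed(row)) for row in image]


def crop_side(image: Image, side: str, fraction: float = 0.10) -> Image:
    """Remove `fraction` of the rows or columns from one side of the image."""
    height, width = len(image), len(image[0])
    dy = int(round(height * fraction))
    dx = int(round(width * fraction))
    if side == "top":
        return [row[:] for row in image[dy:]]
    if side == "bottom":
        return [row[:] for row in image[:height - dy]]
    if side == "left":
        return [row[dx:] for row in image]
    if side == "right":
        return [row[:width - dx] for row in image]
    raise ValueError(f"unknown side: {side}")


def recompose_for_training(image: Image, fraction: float = 0.10) -> List[Image]:
    """Return the original, its mirror image, and one crop per side."""
    variants = [image, mirror_horizontal(image)]
    for side in ("top", "bottom", "left", "right"):
        variants.append(crop_side(image, side, fraction))
    return variants
```

Each returned variant would inherit the class label of the original exemplar, which is what makes the conservative setting usable without visual inspection.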

Recomposition in testing
While recomposing the training set yields more exemplars, recomposing a test image and classifying each new, recomposed image yields multiple classifications of the original image. For spatial recomposition, the edges of the image can be cropped in an effort to better match the features of the test image against the exemplars. It may be necessary to crop more aggressively (as shown in FIG. 2) to obtain such a match. However, if the classifier has been trained using mirrored images, there is no need to mirror the test image, because of the symmetry already built into the classifier. For example, when a 1-NN (nearest-neighbor) classifier is used, the feature vector T of the test image lies at some distance d(E, T) from the nearest exemplar vector E. Call the feature vectors of the mirrored images of E and T E' and T', respectively. By the symmetry of the features, d(E, T) = d(E', T'), which makes T' redundant.
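The symmetry argument can be written out under two assumptions that the text does not state explicitly: the features are computed per image block, so that mirroring the image only permutes the components of the feature vector (a permutation matrix P), and d is the Euclidean distance.

```latex
E' = PE,\qquad T' = PT,\qquad P^{\top}P = I
\;\Longrightarrow\;
d(E', T') = \lVert PE - PT \rVert_2
          = \lVert P\,(E - T) \rVert_2
          = \lVert E - T \rVert_2
          = d(E, T).
```

Because mirroring twice restores the original image (P applied twice is the identity), a training set that contains every exemplar together with its mirror image is closed under P. Under these assumptions, the nearest neighbor of T' is the mirror image of the nearest neighbor of T, at the same distance and with the same label.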

Some classes of images contain large variation in their global color distribution, and changing the overall color of a test image can yield a better match with a training exemplar. Using the class of sunset images as an example, early and late sunsets may have the same spatial distribution of color (bright sky over dark foreground), but the overall appearance of the early sunset is cooler because of color changes in the scene illuminant. By artificially changing the color along the illuminant (red-blue) axis toward the warmer side, one can simulate the appearance of an image captured later in time. This is referred to as effective temporal recomposition by an illuminant shift. For example, as shown in FIGS. 4(a) to 4(f), temporal recomposition consists of a series of illuminant shifts in 3-button increments, starting at -6 buttons (FIG. 4(a)) and ending at +9 buttons (FIG. 4(f)), where one button equals 0.4 of a photographic stop. Similarly, variation in the amount of light in the scene can be handled using changes along the luminance axis. Color changes along other axes may be applicable in other problem domains.
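The shift series in the example (-6 to +9 buttons in steps of 3, at 0.4 stop per button) can be sketched as below. The series and the stop conversion come from the text. How a shift is applied to pixel values is not specified in the source, so the color model here is purely an assumption: pixels are taken to be linear RGB in [0, 1], and a shift of s stops is split evenly between a red gain and a blue attenuation. All function names are hypothetical.

```python
STOPS_PER_BUTTON = 0.4  # from the text: one button equals 0.4 photographic stop


def button_series(start=-6, stop=9, step=3):
    """Button values for the shift series: -6, -3, 0, 3, 6, 9 by default."""
    return list(range(start, stop + 1, step))


def _clamp(value, low=0.0, high=1.0):
    return max(low, min(high, value))


def shift_illuminant(pixel, buttons):
    """Warm (buttons > 0) or cool (buttons < 0) one linear-RGB pixel.

    Assumption: a shift of s stops multiplies red by 2**(s/2) and divides
    blue by the same factor, leaving green untouched.
    """
    stops = buttons * STOPS_PER_BUTTON
    gain = 2.0 ** (stops / 2.0)
    red, green, blue = pixel
    return (_clamp(red * gain), green, _clamp(blue / gain))


def temporal_recompositions(pixel):
    """Apply every shift in the series to one pixel."""
    return [shift_illuminant(pixel, b) for b in button_series()]
```

Applying `shift_illuminant` to every pixel of an image with each value from `button_series()` gives six recomposed versions, one per panel of the figure described above.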

With either spatial or temporal recomposition, the classifier may or may not label a new, recomposed image with the same class as the original image. How should one adjudicate when the classifications of the recomposed images differ? Duin discusses two types of combiners, fixed and trained (see R. P. W. Duin, "The combining classifier: To train or not to train?", Proceedings of International Conference on Pattern Recognition, 2002). Fixed combining rules include voting schemes and using the sum or average of the scores. A trained combiner is a second classifier that maps the set of scores to a single score. Two considerations affect the choice between them: the availability of training data and the degree to which the base classifier has been trained. Duin suggests that undertrained classifiers can benefit from trained combiners, whereas overtrained classifiers (e.g., support vector machines (SVMs)) cannot. In the present study, this was found to be the case (e.g., a second-stage SVM did not help).

In two-class problems, a fixed combiner of interest for r recompositions uses the m-th order statistic of the scores, e.g., the maximum (m = 1), the second largest (m = 2), or the median (m = r/2). Varying the parameter m moves the classifier's position on its operating curve. A small m classifies images as positive aggressively, giving greater recall at the expense of more false positives. The choice of m will clearly depend on the application.
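A minimal sketch of the m-th order statistic combiner follows. It assumes each recomposed image has already received a real-valued score from a two-class classifier, with positive scores meaning the positive class; the function names and the zero threshold are illustrative choices, not taken from the source.

```python
def order_statistic_score(scores, m):
    """Return the m-th largest score (m = 1 is the maximum)."""
    if not 1 <= m <= len(scores):
        raise ValueError("m must be between 1 and the number of scores")
    return sorted(scores, reverse=True)[m - 1]


def classify_positive(scores, m, threshold=0.0):
    """Label the original image positive if the m-th largest score passes."""
    return order_statistic_score(scores, m) > threshold
```

For the six scores `[-0.8, 0.3, -0.1, 1.2, -0.4, 0.05]`, m = 1 selects 1.2 and m = 3 (that is, r/2) selects 0.05, so both settings call the image positive, while m = 4 selects -0.1 and does not. This illustrates how a larger m trades recall for fewer false positives.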

In addition, the scores can be combined in such a way as to find the most consistent image classification. For example, a voting scheme can be used for the combination. This is desirable because a classification based on a number of slightly altered, recomposed images with the same salient scene content should be more robust than a classification based on the original image alone. If the single classification based on the original image is incorrect because of some statistical anomaly (e.g., a distracting foreground or poor spatial registration with the exemplar set), but many of the recomposed images are classified correctly, a majority rule will correct the anomaly.
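The majority rule can be sketched in a few lines, assuming each recomposed image has already been assigned a class label. The tie-breaking behavior (the label that appears first among the tied labels wins) is a property of this sketch, not something the source specifies.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label among the recomposed images.

    Ties are broken in favor of the label that appears first in `labels`.
    """
    if not labels:
        raise ValueError("at least one label is required")
    return Counter(labels).most_common(1)[0][0]
```

If the original image is misclassified as "field" but three of its four recompositions are labeled "sunset", `majority_vote(["field", "sunset", "sunset", "sunset", "beach"])` returns "sunset".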

Recomposition in both training and testing
In some applications, recomposition may be used on both the training data and the test data. Because each serves a different purpose, the two can be readily combined. One may question the need for both types of recomposition: if a sufficiently rich set of training exemplars is available, why would recomposing the test image be necessary? The need for recomposition in both training and testing is a practical one. There is no guarantee that the training data are diverse enough to begin with, or that recomposing the training exemplars has exhaustively created every possible variation and completely filled the image space.

A related question is the choice between recomposing training images and obtaining additional unique exemplars. Aside from the earlier argument about the lack of good training data and the time needed to collect it, there is also the issue of data quality. Recomposing a small set of prototypical exemplars may be more desirable than using a larger number of lower-quality exemplars.

In addition, using recomposition in testing on top of recomposition in training is a sure way to boost recall, if so desired, albeit at the expense of a potentially higher false alarm rate.

A final question is whether a more aggressive approach can be used to recompose the training data in order to minimize the need to recompose the test data. Because aggressive recomposition can remove salient content from an image, one must ensure that the integrity of the expanded training set is not compromised. The following discussion describes a technique for doing exactly this.

Semi-supervised recomposition in training
The goal of using conservative recomposition on the training set is to make the process completely unsupervised. However, if more training data are desired and aggressive recomposition is used, such as a large crop or a pronounced color shift, a training methodology is needed that avoids going to the opposite extreme of inspecting every recomposed image.

Clearly, because some aggressive recompositions can remove scene content that is characteristic of the image's class, the strictly correct approach to adding these images to the training data would be to inspect each recomposed training image visually. Doing so can be tedious and cumbersome. It is more efficient to inspect only the subset of recomposed images that needs attention. To screen the recomposed images, a classifier can be trained using the original training images and then used to classify the recomposed versions of the training images. Only those recomposed images that fail the classifier (or pass with low confidence) need to be evaluated visually to determine whether the recomposition has caused salient scene content to be lost from the image. Such recomposed images are then discarded, while the remaining recomposed images are added to the expanded training set to improve its richness. This is a favorable trade-off between generating a small number of recomposed images in an unsupervised manner and generating a large number of recomposed images in a fully supervised manner.
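The screening step can be sketched as follows, assuming a two-class setting in which a classifier already trained on the original images returns a signed score (positive for the positive class). The function name, the `min_confidence` threshold, and the data layout are hypothetical. The sketch only separates images that can be added automatically from those that need visual review; the review itself remains a manual step.

```python
def screen_recomposed(recomposed, score_fn, min_confidence=0.5):
    """Split recomposed training images into auto-accepted and needs-review.

    recomposed: list of (image, expected_label) pairs, labels in {+1, -1}
    score_fn:   classifier trained on the ORIGINAL images; returns a signed score
    """
    accepted, needs_review = [], []
    for image, label in recomposed:
        score = score_fn(image)
        agrees = (score > 0) == (label > 0)
        if agrees and abs(score) >= min_confidence:
            accepted.append((image, label))      # passes with confidence
        else:
            needs_review.append((image, label))  # fails, or passes weakly
    return accepted, needs_review
```

Images in `needs_review` that turn out to have lost their salient content are discarded. The rest, together with `accepted`, would join the expanded training set.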

Three preferred embodiments of the present invention are described next: sunset detection, outdoor scene classification, and automatic image orientation detection.

Sunset detection
In the aforementioned hierarchical image classification scheme (see Patent Document 1), sunsets were easily separated from mountain/forest scenes. Color was found to be more salient for this problem than other features, such as edge direction alone, which confirms the intuition that sunsets are recognizable by their brilliant, warm colors. Furthermore, spatial information should be incorporated to distinguish sunsets from other scenes containing warm colors, such as desert rock formations. Therefore, spatial color moments may be used: the image is divided into 49 regions using a 7 x 7 grid, and the mean and variance of each band of the Luv-transformed image are computed for each region. This yields 49 x 2 x 3 = 294 features.
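The feature computation described above can be sketched as follows. The sketch assumes the image has already been converted to Luv and is given as rows of (L, u, v) tuples; the color conversion is not shown. The function name is hypothetical, and the way the grid boundaries are rounded for image sizes not divisible by 7 is a choice made here, not one stated in the source.

```python
from statistics import mean, pvariance


def spatial_color_moments(luv_image, grid=7):
    """Mean and variance of each Luv band in every cell of a grid x grid layout.

    luv_image: list of rows, each row a list of (L, u, v) tuples.
    Returns grid * grid * 3 * 2 values (294 for the default 7 x 7 grid).
    """
    height, width = len(luv_image), len(luv_image[0])
    if height < grid or width < grid:
        raise ValueError("image must be at least grid x grid pixels")
    features = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            block = [luv_image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            for band in range(3):
                values = [pixel[band] for pixel in block]
                features.append(mean(values))
                features.append(pvariance(values))
    return features
```

The population variance is used so that a single-pixel cell still yields a value; the source does not say which variance estimator was used.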

A support vector machine (SVM) is preferably used as the classifier, because SVMs have been shown to outperform other classifiers, such as Learning Vector Quantizers (LVQ), on similar problems (see, for example, B. Scholkopf, C. Burges, and A. Smola, Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, 1999, pp. 263-266, and Y. Wang and H. Zhang, "Content-based image orientation detection with support vector machines", Proceedings of IEEE Workshop on Content-Based Access of Image and Video Libraries, 2001). Specifically, a Gaussian kernel was used, which produces an RBF-style classifier (RBF = radial basis function; see Wang and Zhang). An SVM is designed for two-class problems and outputs a real number for each test image. The sign gives the classification, and the magnitude can be used as a loose measure of confidence.
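How a trained Gaussian-kernel SVM turns a feature vector into a signed score can be sketched with the standard kernel decision function, f(x) = b + sum_i c_i * exp(-gamma * ||x_i - x||^2). Training is not shown. The support vectors, coefficients (each the product of a multiplier and a +1/-1 label), bias, and gamma in the usage note are made-up illustrative values, and the function names are hypothetical.

```python
import math


def gaussian_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_score(x, support_vectors, coefficients, bias, gamma=0.5):
    """Signed decision value of a two-class kernel SVM for feature vector x."""
    return bias + sum(
        c * gaussian_kernel(sv, x, gamma)
        for sv, c in zip(support_vectors, coefficients)
    )


def interpret(score):
    """Sign gives the class; magnitude is a loose confidence measure."""
    label = 1 if score >= 0 else -1
    return label, abs(score)
```

With support vectors `[(0.0, 0.0), (2.0, 2.0)]`, coefficients `[1.0, -1.0]`, and zero bias, a point at the first support vector scores positive and a point at the second scores negative, with equal magnitude.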

Using recomposition on the training set increased performance significantly, presumably because the set was much richer. This overcomes some of the effects of having a limited training set. Using recomposition on the test set increased both the number of hits and the number of false positives. Finally, using recomposition in both training and testing gave the best results overall. Note that these results correspond to the optimal operating points on the respective curves.

Using spatial recomposition on the test images achieved the goal of correctly classifying sunset images with significantly distracting foreground regions. For example, the images shown in FIG. 1 were all classified incorrectly by the baseline system but were classified correctly when recomposition was used. The upper-right image (b) is a good example of how recomposition by cropping can help: cropping the large, dark water region in the foreground of the image substantially increases the SVM score. The other images fare similarly. For example, cropping 20% from the bottom of the left image (a) removes the distracting reflection in the water.

However, the number of false positive images also increased, partially offsetting the gain in recall. Typical false positives caused by recomposition are shown in FIGS. 5(a) and 5(b). Each of these images contains a pattern that is not typical of sunsets (e.g., multiple bright regions in a night scene, or sky in a desert scene); when that pattern is cropped out, the resulting image looks more like a sunset.

Many sunset images have a prototypical composition but weak colors, corresponding to an early or late stage of the sunset. Shifting the scene illuminant to "warm up" these images causes them to be classified correctly, but it also introduces a number of false positives; both cases are shown in FIG. 6.

Outdoor scene classification
The system described above is extended to distinguish six types of outdoor scenes: beach, sunset, fall foliage, field, mountain, and urban (defined in FIG. 2). The images used for training and testing included Corel images and consumer photographs. The same features and classifier are used as for the sunset detector, except that the SVM classifier is extended to multiple classes using a one-vs-all approach (see B. Scholkopf, C. Burges, and A. Smola, Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, 1999, pp. 256-258). Because the training set was still limited, spatial recomposition was particularly effective when used in training. Recomposition was not used on the test set.

Image orientation detection
The goal of automatic image orientation detection (see Y. Wang and H. Zhang, "Content-based image orientation detection with support vector machines," Proceedings of IEEE Workshop on Content-Based Access of Image and Video Libraries, 2001) is to classify an arbitrary image into one of the four compass directions (N, S, E, W), depending on which direction the top of the image is facing. Doing so based on image content alone is a difficult problem. In a preferred embodiment, the baseline system uses spatial color moments and a one-vs-all SVM classifier, similar to Wang et al., and achieves similar results.

Recomposition in testing can be expected to improve classification in this domain as well, but the rationale for using it is quite different: cropping the edges of an image should not affect its perceived orientation. A combined classification based on many slightly different versions of the image should therefore be more robust than the classification of a single image. Both fixed (voting) and trained combiners were tested; their performance was found to be comparable, and voting was chosen for its simplicity.

In this application, each image is classified using four scores, one from each of four SVMs tuned to recognize images of a given orientation. The one-vs-all classifier assigns the image the orientation corresponding to the SVM that produces the maximum score. This process is repeated nine times, once for each cropped version of the image. The method then votes among the nine classifications, using the scores to break ties (a tie means that no single orientation dominates, and such an image is a good candidate for rejection, i.e., it has no apparent orientation). An example of the voting scheme is given in FIG. 7.
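The combination step above (a majority vote over the nine per-version decisions, with scores used to break ties) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the SVM scores are assumed to be supplied as plain dictionaries, and the tie-break rule (summing each tied orientation's scores over all versions) is an assumption, since the text says only that the scores are used to break ties.

```python
from collections import Counter

def classify_one(scores):
    """One-vs-all decision for a single image version:
    pick the orientation whose SVM produced the maximum score."""
    return max(scores, key=scores.get)

def vote(per_version_scores):
    """Combine the decisions for all recomposed versions of one image.

    per_version_scores: list of dicts mapping orientation -> SVM score,
    one dict per recomposed (cropped) version of the image.
    Returns (orientation, tied), where `tied` is True if the vote was
    tied and the scores had to be used to decide.
    """
    labels = [classify_one(s) for s in per_version_scores]
    counts = Counter(labels)
    top = max(counts.values())
    tied = [o for o, c in counts.items() if c == top]
    if len(tied) == 1:
        return tied[0], False
    # ASSUMPTION: break the tie by the summed score of each tied
    # orientation across all versions (the source does not give the rule).
    totals = {o: sum(s[o] for s in per_version_scores) for o in tied}
    return max(totals, key=totals.get), True

# Nine versions: five favor N, four favor E, so N wins without a tie.
north = {"N": 0.9, "E": 0.1, "S": -0.5, "W": -0.2}
east = {"N": 0.1, "E": 0.8, "S": -0.5, "W": -0.2}
result = vote([north] * 5 + [east] * 4)
```

Returning the `tied` flag alongside the label lets a caller treat tied images as candidates for rejection, in line with the remark that a tie suggests no apparent orientation.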

Sample Corel images gained by using the recomposition scheme (that is, images classified correctly with it) are shown in FIG. 8. In each of them, some regions at the image borders are confusing. Dark shadows (FIG. 8(c)), dark trees (FIG. 8(b)), and reflections from the sun (FIG. 8(c)) all confuse the classifier; in these images, bright or dark regions appear at the sides of the image rather than at the top or bottom.

Image recomposition is similar in spirit to bootstrapping or bagging methods, with the main distinction that only a single classifier is trained and used in classification. The key to applying this scheme successfully to an image classification problem is that the recomposition should affect only the confusing components of the image, in such a way that they can be ignored in the final classification, while the salient content remains invariant to such perturbations. This is therefore a general approach to boosting classification performance, as long as an appropriate method of recomposition is selected according to the problem domain and the features and classifier used.

The following guidelines are offered to help decide how to use image recomposition in image classification. First, if the training set is sparse, conservative spatial recomposition may help considerably. More aggressive recomposition, both spatial and temporal, should be performed in a semi-supervised manner. In two-class problems, recomposing the test image lets it match exemplars of the same class better and yields an operating-curve parameter that can be used to tune performance to the application. In multi-class problems, voting among the classifications of the recomposed images is more robust. Clearly, in the ideal case where the classes are well separated in the training data and the test images match the exemplars well, recomposition is not expected to help much.

FIG. 9 shows a method for improving scene classification of a digital image according to the invention. First, either an exemplar image 10 or an input test image 12 is provided in an input stage 14 and then applied to a recomposition stage 16, where the image is systematically recomposed according to a spatial recomposition algorithm 18 or a temporal recomposition algorithm 20 (or both), as described in the detailed description of the invention. The result of recomposition is an expanded set of images 22 which, depending on the type of input image (exemplar or test image), is an expanded set of exemplar images 24 or an expanded set of test images 26 (or both). If the expanded set consists of exemplar images, it is used to train the classifier in a training stage 28, thereby providing an improved classifier according to the invention. If the expanded set consists of test images, it is used in a classification stage 30, thereby providing an improved image classification result according to the invention. As indicated by the dotted line 32 connecting the training stage 28 and the classification stage 30, the improved classifier resulting from the expanded set of exemplar images 24 may be used together with the expanded set of test images 26 to provide an overall improved classification result. It is also possible, however, to apply the recomposition stage 16 to only one of the two paths shown in FIG. 9 (that is, either to train an improved classifier or to provide an improved image classification result, but not both).
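As a rough illustration of the recomposition stage 16 with a spatial algorithm 18, the sketch below builds an expanded set of images from one input by cropping and horizontal mirroring. It is a minimal sketch under stated assumptions: the image is a plain list of pixel rows, the function names are hypothetical, and the particular crop fractions (10% and 20% from each edge) are chosen for illustration only; the source mentions a 20% crop from the bottom and a "T10" (10% from the top) crop but does not list the full set.

```python
def mirror_horizontal(image):
    """Return the image mirrored left-to-right."""
    return [row[::-1] for row in image]

def crop(image, top=0.0, bottom=0.0, left=0.0, right=0.0):
    """Remove the given fraction of rows/columns from each edge."""
    h, w = len(image), len(image[0])
    r0, r1 = int(round(h * top)), h - int(round(h * bottom))
    c0, c1 = int(round(w * left)), w - int(round(w * right))
    return [row[c0:c1] for row in image[r0:r1]]

def recompose_spatially(image, fractions=(0.1, 0.2), include_mirror=True):
    """Generate an expanded set of images from a single input image.

    ASSUMPTION: the expanded set is the original plus one crop per
    edge per fraction, optionally doubled by horizontal mirroring.
    """
    versions = [image]
    for f in fractions:
        for edge in ("top", "bottom", "left", "right"):
            versions.append(crop(image, **{edge: f}))
    if include_mirror:
        versions += [mirror_horizontal(v) for v in versions]
    return versions

# A 10x10 dummy "image" whose pixel value encodes its position.
img = [[r * 10 + c for c in range(10)] for r in range(10)]
expanded = recompose_spatially(img, include_mirror=False)
```

Mirroring is kept optional because it suits scene classes such as sunsets but would be a questionable choice where left and right carry meaning, for example when the four orientations must be told apart.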

The subject matter of the present invention relates to digital image understanding technology, which is understood to mean technology that digitally processes a digital image to recognize, and thereby assign useful meaning to, human-understandable objects, attributes, or conditions, and then utilizes the results obtained in the further processing of the digital image.

Scene classification can also find application in image enhancement. Rather than applying generic color balancing and exposure adjustment to all scenes, the adjustment could be customized to the scene, for example by retaining or boosting the brilliant colors in a sunset image while removing the warm color cast from a tungsten-illuminated indoor image.

The recomposition techniques described by the present invention are not limited to photographic images. For example, spatial recomposition can also be applied to medical images for medical image classification (although color recomposition does not apply).

Four views, each showing one of four sunset images with confusing foreground regions.
Four views, each showing one of a series of images illustrating how an arbitrary image (a) is transformed into an image (d) that matches the prototypical exemplars well.
Three views showing an example of spatial recomposition, in which the original image (b) is transformed by horizontal mirroring (a) or by cropping (20% from the bottom, as shown in (c)).
Six views showing an example of temporal recomposition consisting of a series of illuminant shifts.
Two views showing typical examples of false positives induced by using spatial recomposition.
Six views showing how sunsets and false positives are handled when temporal recomposition is used, each showing the original (left image) and the illuminant-shifted (+6 buttons) image (right image); the last two views show one of the intentionally confusing images, a winter scene in which the sun is near the horizon but not setting.
A table showing how independent recomposition decisions are resolved by voting; for example, "T10" means a 10% crop from the top of the image.
Three views showing examples of sample test images gained by using recomposition according to the invention.
A diagram outlining the elements of a method for carrying out the invention.

Explanation of symbols

10 Exemplar image input
12 Test image input
14 Input stage
16 Recomposition stage
18 Spatial recomposition algorithm
20 Temporal recomposition algorithm
22 Expanded set of images
24 Expanded set of exemplar images
26 Expanded set of test images
28 Training stage
30 Classification stage
32 Dotted line

Claims

A method for improving image classification of a digital image, comprising the steps of:
(a) providing an image;
(b) systematically recomposing the image to generate an expanded set of images; and
(c) using a classifier and the expanded set of images to determine an image classification for the image, whereby the expanded set of images provides at least one of an improved classifier and an improved classification result.

The method of claim 1, wherein step (b) includes spatial recomposition of the image to generate an expanded set of spatially recomposed images.

The method of claim 1, wherein step (b) includes temporal recomposition of the image to generate an expanded set of temporally recomposed images, whereby the images of the expanded set simulate the appearance of the image as captured earlier or later in time.