JP2023031373A

JP2023031373A - Image processing device, image processing system, and program

Info

Publication number: JP2023031373A
Application number: JP2021136811A
Authority: JP
Inventors: 達哉森; Tatsuya Mori
Original assignee: Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2021-08-25
Filing date: 2021-08-25
Publication date: 2023-03-09

Abstract

[Problem] To provide an image processing device, an image processing system, and a program capable of distinguishing and separating a target region from a background region even when the boundary between the two is ambiguous. [Solution] An image processing device 10 includes a processor. The processor acquires a representative position of a target region, the representative position being a position input by a user and the target region being the region to be separated in an image; estimates the contour of the target region; and obtains the target region using the representative position and the estimated contour. The contour of the target region is estimated, for example, based on a learning model trained on contours. [Selected Figure] FIG. 1

Description

The present invention relates to an image processing device, an image processing system, and a program.

Conventionally, a required region in an image is separated as a target region and put to use.

Patent Document 1 describes providing in advance a plurality of different algorithms for extracting the contour lines of a main image from image data read by photoelectrically scanning an original image. One algorithm is selected from among them based on feature data of the original image detected from the image data, an algorithm instruction input by an operator, and the like. Contour lines are then extracted according to the selected algorithm, and the image is clipped based on the extracted contour lines to create a mask plate.
Patent Document 2 describes capturing an image to be processed, performing predetermined edge detection processing on the image, and extracting contours based on the detected edges. The outermost contour is then selected from among the extracted contours. Based on the selected outermost contour, a ternary image divided into a foreground region, a background region, and a boundary region is generated, and predetermined image division processing that divides the ternary image into foreground pixels and background pixels is performed.
Patent Document 3 describes the following. An input unit inputs a processing region that includes the boundary of the foreground region of an original image and indicates the rough shape of the foreground region. An initial probability field generation unit generates an initial probability field, using the foreground-region side of the processing region image as teaching input for part of the foreground and the background-region side as teaching input for part of the background. A shape candidate extraction unit extracts the contour line of the foreground region. A graph cut calculation unit then separates the foreground region from the background region by graph cut, using as the cost function the initial probability field weighted based on the contour line.

Patent Document 1: Japanese Patent Laid-Open No. H7-92651
Patent Document 2: Japanese Patent Laid-Open No. 2016-122367
Patent Document 3: Japanese Patent Laid-Open No. 2017-220098

However, when the boundary between the target region and the background region (the region other than the target region) is ambiguous, part of the background region may end up included in the target region. In that case the user has extra work, such as deleting the part of the background region that was captured as the target region.
An object of the present invention is to provide an image processing device, an image processing system, and a program capable of distinguishing and separating a target region from a background region even when the boundary between the target region and the background region is ambiguous.

The invention according to claim 1 is an image processing device comprising a processor, wherein the processor acquires a representative position of a target region, the representative position being a position input by a user and the target region being a region to be separated in an image, estimates a contour of the target region, and obtains the target region using the representative position and the estimated contour.
The invention according to claim 2 is the image processing device according to claim 1, wherein the contour of the target region is estimated based on a learning model trained on contours.
The invention according to claim 3 is the image processing device according to claim 2, wherein the learning model has been trained on ambiguous contours.
The invention according to claim 4 is the image processing device according to claim 2, wherein the processor accepts a correction of the contour made upon confirmation by the user and corrects the estimated contour of the target region.
The invention according to claim 5 is the image processing device according to claim 4, wherein, in the correction, the user fills in a portion where no contour has been estimated.
The invention according to claim 6 is the image processing device according to claim 1, wherein the target region is obtained from the representative position and the estimated contour by a difference calculation on pixel values.
The invention according to claim 7 is the image processing device according to claim 6, wherein the difference calculation is a calculation of a distance function that uses both a pixel value difference obtained based on the pixel value at the representative position and a difference in a boundary feature representing the contour.
The invention according to claim 8 is the image processing device according to claim 1, wherein, when the target region is obtained using the representative position and the estimated contour, a tentative target region is obtained based on the representative position and the tentative target region is corrected by the estimated contour.
The invention according to claim 9 is the image processing device according to claim 8, wherein, in the correction, the part of the tentative target region inside the estimated contour is included in the target region and the part outside the estimated contour is excluded from the target region.
The invention according to claim 10 is the image processing device according to claim 1, wherein the target region is obtained from the representative position and the estimated contour by an element-wise product of the pixel values of the image and a boundary feature representing the contour.
The invention according to claim 11 is the image processing device according to claim 10, wherein, when the target region is obtained, the ratio at which the boundary feature is reflected is adjusted.
The invention according to claim 12 is the image processing device according to claim 1, wherein, when the target region is obtained using the representative position, a label indicating whether a surrounding pixel is included in the target region is determined based on the strength of a pixel included in the representative position and the weight that the pixel exerts on its surroundings, and the determination is repeated with each pixel whose label has been determined as a new starting pixel, thereby predicting the labels of the surrounding pixels and obtaining the target region.
The invention according to claim 13 is an image processing system comprising: a display device that displays an image; and an image processing device having a processor that performs image processing while a user checks the contour of a target region, the target region being a region to be separated in the image displayed on the display device, wherein the processor acquires a representative position of the target region, the representative position being a position input by the user, estimates the contour of the target region, and obtains the target region using the representative position and the estimated contour.
The invention according to claim 14 is a program for causing a computer to realize: a function of acquiring a representative position of a target region, the representative position being a position input by a user and the target region being a region to be separated in an image; a function of estimating the contour of the target region; and a function of obtaining the target region using the representative position and the estimated contour.
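The correction described in claims 8 and 9 can be pictured with a minimal sketch. It assumes that the estimated contour is available as a closed binary mask and that pixels on the contour itself count as inside; neither detail is stated in the claims, and the names `inside_of_contour` and `correct_region` are hypothetical.

```python
from collections import deque

def inside_of_contour(contour):
    """Return a boolean mask of pixels enclosed by (or lying on) a closed contour.

    contour: 2D list of 0/1 where 1 marks an estimated contour pixel.
    Assumption: anything reachable from the image border without crossing
    a contour pixel (4-connectivity) is treated as outside.
    """
    h, w = len(contour), len(contour[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    # Start the flood fill from every non-contour pixel on the image border.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and not contour[y][x]:
                outside[y][x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w
                    and not outside[ny][nx] and not contour[ny][nx]):
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[not outside[y][x] for x in range(w)] for y in range(h)]

def correct_region(tentative, contour):
    """Keep the part of the tentative region inside the contour, drop the rest."""
    inside = inside_of_contour(contour)
    return [[bool(t) and i for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(tentative, inside)]
```

For example, if a tentative region has leaked across a square contour into the background, `correct_region` keeps only the pixels on or within the square.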

According to the invention of claim 1, the target region can be distinguished and separated from the background region even when the boundary between the target region and the background region is ambiguous.
According to the invention of claim 2, the target region can be separated more easily.
According to the invention of claim 3, the contour can be estimated even when it includes ambiguous portions.
According to the invention of claim 4, the contour can be determined even at locations where the learning model cannot estimate it.
According to the invention of claim 5, the contour can be determined even when the learning model cannot estimate it.
According to the invention of claim 6, separation of the target region becomes simpler.
According to the invention of claim 7, the boundary can be determined more easily.
According to the inventions of claims 8 and 9, the contour can be determined with higher accuracy.
According to the invention of claim 10, the target region can be determined more easily.
According to the invention of claim 11, the target region can be determined with higher accuracy.
According to the invention of claim 12, the calculation of the target region can be performed faster.
According to the invention of claim 13, it is possible to provide an image processing system in which the user can separate the target region interactively.
According to the invention of claim 14, a computer can realize a function capable of distinguishing and separating the target region from the background region even when the boundary between the target region and the background region is ambiguous.

FIG. 1 is a diagram showing a configuration example of the image processing system according to the present embodiment.
FIG. 2 is a block diagram showing a functional configuration example of the image processing device according to the present embodiment.
FIG. 3 is a diagram showing an example of a method of designating the target region interactively.
FIGS. 4(a) to 4(c) are conceptual diagrams showing the process by which the contour estimation unit estimates the contour of the target region using a learning model.
FIGS. 5(a) and 5(b) show plastic bags as examples of objects having ambiguous contours.
FIGS. 6(a) and 6(b) show the results of cutting out the target region with a conventional region separation method, without using the contour estimated by the learning model.
FIGS. 7(a) and 7(b) show the results of extracting contours using the deep learning model of the present embodiment.
FIGS. 8(a) and 8(b) are diagrams showing a case where the user inputs a contour.
FIGS. 9(a) to 9(c) show a case where contour estimation by the learning model is combined with contour input by the user.
FIGS. 10(a) to 10(c) show how the target region is cut out of the image shown in FIG. 3 by a region growing method.
FIGS. 11(a) and 11(b) show the target regions determined by the method of the present embodiment for the images of FIGS. 5(a) and 5(b).
FIGS. 12(a) and 12(b) are diagrams explaining the weight.
FIGS. 13(a) and 13(b) are diagrams showing a method of determining the weight.
FIGS. 14(a) to 14(c) are diagrams showing how the target region is cut out by the method of the second embodiment.
FIG. 15 is a flowchart explaining the operation of the image processing device according to the present embodiment.
FIG. 16 is a diagram showing a case where there are two target regions.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

<Description of the entire image processing system>
FIG. 1 is a diagram showing a configuration example of an image processing system 1 according to the present embodiment.
As illustrated, the image processing system 1 of the present embodiment includes: an image processing device 10 that performs image processing on the image information of an image displayed on a display device 20; the display device 20, which receives the image information created by the image processing device 10 and displays an image based on that information; and an input device 30 with which a user inputs various information to the image processing device 10.

The image processing device 10 is, for example, a so-called general-purpose personal computer (PC). It creates image information and performs related tasks by running various application software under the management of an OS (Operating System).
The image processing device 10 includes a CPU (Central Processing Unit) as a computing means, a main memory as a storage means, and storage such as an HDD (Hard Disk Drive) or SSD (Solid State Drive). The CPU executes various programs such as the OS and application software. The main memory is a storage area that holds the programs and the data used in executing them, and the storage is a storage area that holds input data for the programs and output data from them. The image processing device 10 further includes a communication interface for communicating with external devices. Here, the CPU is an example of a processor.

The display device 20 displays an image on a display screen 21. The display device 20 is a device capable of displaying images by additive color mixing, such as a liquid crystal display for a PC, a liquid crystal television, or a projector. The display method of the display device 20 is therefore not limited to liquid crystal. In the example shown in FIG. 1 the display screen 21 is provided within the display device 20, but when a projector, for example, is used as the display device 20, the display screen 21 is a screen or the like provided outside the display device 20.

The input device 30 consists of a keyboard, a mouse, and the like. It is used to start and exit the application software for image processing and, as described in detail later, to let the user input image processing instructions to the image processing device 10 when image processing is performed.

The image processing device 10 and the display device 20 are connected via DVI (Digital Visual Interface). Instead of DVI, they may be connected via HDMI (registered trademark) (High-Definition Multimedia Interface), DisplayPort, or the like.
The image processing device 10 and the input device 30 are connected via, for example, USB (Universal Serial Bus). Instead of USB, they may be connected via IEEE 1394, RS-232C, or the like.

In this image processing system 1, the display device 20 first displays the original image, that is, the image before image processing. When the user uses the input device 30 to input an image processing instruction to the image processing device 10, the image processing device 10 performs image processing on the image information of the original image. The result is reflected in the image shown on the display device 20: the processed image is redrawn and displayed. The user can thus perform image processing interactively while watching the display device 20, which makes the work more intuitive and easier.

The image processing system 1 of the present embodiment is not limited to the form shown in FIG. 1. For example, a tablet terminal can serve as the image processing system 1. In this case the tablet terminal has a touch panel, which both displays images and receives the user's instructions; that is, the touch panel functions as the display device 20 and the input device 30. A touch monitor can likewise be used as a device that integrates the display device 20 and the input device 30. A touch monitor uses a touch panel as the display screen 21 of the display device 20. In this case, image information is created by the image processing device 10, and an image is displayed on the touch monitor based on that image information. The user then inputs image processing instructions by touching the touch monitor or the like.

<Description of the image processing device 10>
In the present embodiment, the image processing device 10 performs, as image processing, a process that separates a portion desired by the user from the other portions. That is, following the user's instruction, it cuts out part of the image as a specific region. Hereinafter this region may be referred to as the "target region". The "target region" is the region that the user wants to separate in the image, that is, the region to be separated in the image.

FIG. 2 is a block diagram showing a functional configuration example of the image processing device 10 according to the present embodiment. Of the various functions of the image processing device 10, FIG. 2 shows only those related to the present embodiment.
As illustrated, the image processing device 10 of the present embodiment includes an image information acquisition unit 11 that acquires the image information of the original image, a user instruction reception unit 12 that accepts user instructions, a seed setting unit 13 that sets seeds, a contour estimation unit 14 that estimates the contour of the target region, an image separation unit 15 that separates the image, and an image information output unit 16 that outputs the image information of the separated image.

The image information acquisition unit 11 acquires the image information of the image to be processed. That is, the image information acquisition unit 11 acquires the image information of the original image before image processing. This image information is, for example, RGB (Red, Green, Blue) video data (RGB data) for display on the display device 20.

The user instruction reception unit 12 accepts the user's image processing instructions input with the input device 30.
Specifically, as described in detail later, when the user cuts out the target region, the user instruction reception unit 12 accepts an instruction designating the position of that region as a user instruction.

In the present embodiment, the target region is designated by the user-interactive method described below.
FIG. 3 is a diagram showing an example of a method of designating the target region interactively.
FIG. 3 shows a case where the image displayed on the display screen 21 of the display device 20 is a photographic image G consisting of a person in the foreground and a background behind the person. It shows the user selecting the person, which is the foreground, as the target region.

The user then gives a representative trajectory to each of the foreground, which is the target region, and the background region, which is everything else. The trajectories can be input with the input device 30. Specifically, when the input device 30 is a mouse, the user operates the mouse to drag over the image G displayed on the display screen 21 of the display device 20 and draw a trajectory. When the input device 30 is a touch panel, the user similarly draws a trajectory by tracing and swiping over the image G with a finger, a touch pen, or the like. A point may be given instead of a trajectory. In other words, the user only needs to provide information indicating a representative position for the region of the person. Put another way, the user inputs position information representing a representative position of the target region. Hereinafter such trajectories and points may be referred to as "seeds".

In the example of FIG. 3, seeds are drawn on the face portion and on the portion other than the face (hereinafter these seeds may be referred to as "seed 1" and "seed 2", respectively).
For convenience of explanation, the original image is divided here into a target region, which is the foreground, and a background region, which is the background. This does not necessarily mean that, in three-dimensional space, the foreground is the image located in front and the background is the image located behind; the foreground may be located behind and the background in front. In short, when the original image is divided into two regions, one is called the foreground and the other the background.

The seed setting unit 13 receives the seed position information accepted by the user instruction reception unit 12. Put another way, the seed setting unit 13 acquires the representative position of the target region, the representative position being a position input by the user and the target region being the region to be separated in the image. The seed setting unit 13 can thereby set the positions of the pixels that serve as seeds.
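As a rough illustration of how seed pixel positions might be set from user input, the sketch below turns drawn trajectories (or single points) into a per-pixel seed map. The representation is an assumption made for illustration: label 0 means "no seed", label 1 marks a target-region seed, and label 2 marks a background seed. The function names `rasterize_stroke` and `make_seed_map` are hypothetical and do not come from the source.

```python
def rasterize_stroke(points):
    """Return the set of (x, y) pixels on a polyline through `points`.

    points: list of (x, y) integer positions sampled from a drag or swipe.
    A single point is allowed, matching the case where a seed is given
    as a point rather than a trajectory.
    """
    if len(points) == 1:
        return {tuple(points[0])}
    pixels = set()
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Sample each segment densely enough that no pixel is skipped.
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            t = i / steps
            pixels.add((round(x0 + t * (x1 - x0)), round(y0 + t * (y1 - y0))))
    return pixels

def make_seed_map(width, height, strokes_by_label):
    """Build a height x width map holding the seed label of each pixel.

    strokes_by_label: dict mapping a label (assumed: 1 = target region,
    2 = background) to a list of strokes, each a list of (x, y) points.
    """
    seed_map = [[0] * width for _ in range(height)]
    for label, strokes in strokes_by_label.items():
        for stroke in strokes:
            for x, y in rasterize_stroke(stroke):
                if 0 <= x < width and 0 <= y < height:  # ignore off-image input
                    seed_map[y][x] = label
    return seed_map
```

A call such as `make_seed_map(4, 3, {1: [[(0, 0), (3, 0)]], 2: [[(1, 2)]]})` marks the top row as a target-region seed and one pixel in the bottom row as a background seed.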

The contour estimation unit 14 estimates the contour of the target region. In the present embodiment, the contour estimation unit 14 estimates the contour of the target region based on a learning model trained on such contours.
FIGS. 4(a) to 4(c) are conceptual diagrams showing the process by which the contour estimation unit 14 estimates the contour of the target region using a learning model.
Of these, FIG. 4(b) shows the deep learning model used here as the learning model. As training data for creating the deep learning model, pairs consisting of an image and an image representing the positions of its contours are used for a plurality of images. The training images are, for example, images of objects having ambiguous contours. A learning model trained on ambiguous contours is created in this way. The deep learning model and the training method used to create it are not particularly limited. For example, it is possible to use a method based on the encoder-decoder structure used in image translation tasks, a method that introduces Dice loss into the loss function to improve training accuracy, or image generation with a GAN (Generative Adversarial Network) such as Pix2Pix.
FIG. 4(a) is an image that contains an object having an ambiguous contour. Here it shows, as an example, an image G containing a transparent cup Cu that has been used as training data.
FIG. 4(c) shows the result of estimating the contour of the transparent cup Cu contained in the image G using the deep learning model.
In this case, the contour Ea of the transparent cup Cu is estimated. Specifically, as objects having ambiguous contours, contours in images of transparent objects such as this cup are learned.
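The text names Dice loss as one option for training the contour model but does not give a formula. A minimal sketch of the commonly used soft Dice loss, $1 - \frac{2\sum p\,t}{\sum p + \sum t}$, is shown below for a predicted contour probability map and a ground-truth contour mask. The smoothing term `eps` and the function name `dice_loss` are assumptions, and an actual training setup would compute this on framework tensors rather than Python lists.

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between a predicted contour map and a ground-truth mask.

    pred:   2D list of probabilities in [0, 1] (model output per pixel).
    target: 2D list of 0/1 values (1 marks a contour pixel).
    eps:    small smoothing constant (an assumption) that avoids division
            by zero when both maps are empty.
    """
    intersection = 0.0
    total = 0.0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            intersection += p * t
            total += p + t
    return 1.0 - (2.0 * intersection + eps) / (total + eps)
```

Because contour pixels are usually a small fraction of the image, an overlap-based loss of this kind is less dominated by the many non-contour pixels than a plain per-pixel loss, which is one plausible reason for the accuracy improvement the text mentions.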

Besides a transparent cup, objects having ambiguous contours include, for example, a transparent plastic bag.
FIGS. 5(a) and 5(b) show a transparent plastic bag Fu as an example of an object having an ambiguous contour.
FIG. 5(a) shows an image G in which a green onion Ne, a vegetable, is in a transparent plastic bag Fu. In this case, the contour of the target region is the contour Eb of the transparent plastic bag Fu. FIG. 5(b) shows an image G in which a carrot Ni, a vegetable, is in a transparent plastic bag Fu. In this case, the contour of the target region is the contour Ec of the plastic bag Fu.

FIGS. 6(a) and 6(b) show the results of cutting out the target region by a conventional region separation method, without using the contour estimated by the learning model.
The conventional cutout technique is, for example, a contour-based region separation technique that uses edge extraction or a differential filter. In this case, the contours Eb and Ec of the transparent plastic bag Fu are not accurately extracted, and portions that are not the contours Eb and Ec are treated as contours. Thus, conventional contour-based cutout techniques may perform erroneous region separation.

FIGS. 7(a) and 7(b) show the results of extracting contours using the deep learning model of the present embodiment.
In FIGS. 7(a) and 7(b), the ambiguous portions of the contours Eb and Ec are extracted. That is, in FIG. 7(a), the contour Eb is ambiguous throughout, so almost all of it is extracted by the learning model described above. In FIG. 7(b), by contrast, the contour Ec is clear rather than ambiguous where the plastic bag Fu is in contact with the carrots Ni, so the learning model does not extract the contour Ec in that portion. Elsewhere the contour Ec is ambiguous, so the learning model extracts it. In other words, using a learning model as in the present embodiment makes it easier to extract ambiguous contours more accurately.

However, contour estimation is not limited to the use of a learning model.
For example, the contour estimation unit 14 can also estimate the contour from a contour input by the user.
FIGS. 8(a) and 8(b) show a case where the user inputs the contour.
Of these, FIG. 8(a) shows the target region in the image G, which here is a transparent cup Cu.
FIG. 8(b) shows a case where the user has input the contour of the target region. The contour Ea can be input in the same manner as a seed. That is, the user inputs it by tracing the contour Ea of the target region with a swipe of a finger, a touch pen, or the like. The contour estimation unit 14 acquires the result via the user instruction reception unit 12.

The method of estimating the contour with the learning model described with reference to FIG. 4 and the contour input by the user described with reference to FIG. 8 can also be used together.
FIGS. 9(a) to 9(c) show a case where contour estimation by the learning model and contour input by the user are used together.
Of these, FIG. 9(a) shows a case where the target region included in the image G is a transparent bowl Ha.
FIG. 9(b) shows the result of estimating the contour Ed using the deep learning model. In this case, the contour Ed of the bowl Ha is not estimated in its entirety, and part of it is missing.
FIG. 9(c) shows a case where the user checks the contour, finds the missing portion, and corrects it. That is, in FIG. 9(c), the user fills in the portion where the contour Ed was not estimated. Here, the two ends of the missing part of the contour Ed are connected by a straight line or a curve. The user may also use this method to correct a contour that the deep learning model estimated incorrectly.
In the case of FIG. 9(c), the contour estimation unit 14 acquires the user's correction instruction via the user instruction reception unit 12. The contour estimation unit 14 accepts the correction of the contour Ed made upon the user's confirmation, and corrects the estimated contour of the target region.
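
The straight-line case of this correction can be sketched as follows. This is only an illustrative sketch: the function name `bridge_gap`, the binary-mask representation of the contour, and the assumption that the two loose ends are already known as (row, column) coordinates are all introduced here and are not specified in the text.

```python
import numpy as np

def bridge_gap(contour, start, end):
    """Return a copy of a contour mask with a straight segment drawn between two points.

    contour    : (H, W) array with 1 on estimated contour pixels and 0 elsewhere
    start, end : (row, col) coordinates of the two ends of the missing part
    """
    patched = np.asarray(contour).copy()
    (y0, x0), (y1, x1) = start, end
    # One sample per pixel step along the longer axis, so the segment has no holes.
    n = max(abs(y1 - y0), abs(x1 - x0)) + 1
    rows = np.rint(np.linspace(y0, y1, n)).astype(int)
    cols = np.rint(np.linspace(x0, x1, n)).astype(int)
    patched[rows, cols] = 1
    return patched
```

A curved correction would replace the linear interpolation with points sampled along the stroke that the user draws.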

The image separation unit 15 obtains the target region using the representative position and the estimated contour. In practice, the image separation unit 15 uses the representative position and the estimated contour to cut the target region out of the original image.

To cut out the target region based on the seed information, the image separation unit 15 first adds labels to the pixels where seeds are drawn. In the example of FIG. 3, "label 1" is added to the pixels corresponding to the trajectory drawn in the region of the person (seed 1), and "label 2" is added to the pixels corresponding to the trajectory drawn in the background region, that is, the region other than the person (seed 2). In the present embodiment, adding labels in this way is referred to as "labeling".

The target region is then cut out by a region growing method, which expands a region by repeatedly comparing the pixel values of a seeded pixel and its surrounding pixels, connecting them if the values are close and leaving them unconnected if they are far apart.

FIGS. 10(a) to 10(c) show how the target region is cut out of the image G shown in FIG. 3 by the region growing method.
Of these, FIG. 10(a) is the image G shown in FIG. 3 with seed 1 and seed 2 drawn on it.
As shown in FIG. 10(b), the region expands from seed 1 into the target region S1, and from seed 2 into the background region S2, which is the region other than the target region. Finally, as shown in FIG. 10(c), the target region S1 and the background region S2 are determined.

With this method, the user can cut out the target region S1 more intuitively and more easily, even if the target region has a complicated shape.

The image information output unit 16 outputs the image information of the target region, that is, the image information of the target region S1 after it has been cut out. The cut-out target region is thereby displayed on the display device 20.

FIGS. 11(a) and 11(b) show the target regions determined by the method of the present embodiment for the images of FIGS. 5(a) and 5(b).
FIGS. 11(a) and 11(b) show that adding the information of the contours Eb and Ec shown in FIGS. 7(a) and 7(b) to the region growing method allows the target region S1 to be determined accurately. In practice, the target region S1 can be cut out by using the images of FIGS. 11(a) and 11(b) as masks and applying them to the original image. For example, the pixels of the determined target region S1 are set to "1" and all other pixels to "0"; this mask image is overlaid on the original image, and the pixels marked "1" are taken as the target region S1.
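
The mask step described above can be sketched in NumPy as follows. The function name `cut_out` and the `fill` value written outside the target region are illustrative assumptions; the text only specifies a mask of "1" and "0" applied to the original image.

```python
import numpy as np

def cut_out(original, mask, fill=0):
    """Keep the pixels where mask == 1 and replace all other pixels with `fill`.

    original : (H, W, 3) image array
    mask     : (H, W) array with 1 inside the target region S1 and 0 elsewhere
    fill     : value written outside the target region (an illustrative choice)
    """
    original = np.asarray(original)
    keep = np.asarray(mask).astype(bool)
    result = np.full_like(original, fill)
    result[keep] = original[keep]
    return result
```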

<Detailed Description of Image Separation Unit 15>
The methods by which the image separation unit 15 obtains the target region by the region growing method are described below as the first to third embodiments.

[First embodiment]
In the first embodiment, the image separation unit 15 uses the representative position and the estimated contour to obtain the target region by a difference calculation on pixel values. This difference calculation is, for example, a Euclidean distance calculation that uses both the difference in pixel values obtained from the pixel values at the representative position and the difference in boundary features representing the contour. Here, a "boundary feature" is a feature quantity indicating a boundary, for example a multi-level numerical value indicating the probability that a pixel is located on the boundary. When pixel values are 8-bit and represented as integers from 0 to 255, the probability of being on the boundary is likewise represented as an integer from 0 to 255. In this case, 0 indicates the lowest probability of being on the boundary and 255 the highest. When pixel values are instead normalized to the range from 0 to 1, the boundary feature can also be represented as a value from 0 to 1.

The region growing method to which the present embodiment is applied is described in detail below.
The image separation unit 15 sets seed 1, a reference pixel, on pixels belonging to the target region, and sets seed 2, also a reference pixel, on pixels belonging to the background region. The seeded pixels are given labels (label 1 and label 2 described above). The seeded pixels are assigned a strength of 1, and the strength is propagated from the seeded pixels to pixels that have not yet been seeded; whenever strengths compete, they are compared and the label with the greater strength is adopted. With this method, the pixels carrying each label expand their region outward from the seeds given to the target region and the background region, and the image is finally separated into the target region and the background region.

Here, a weight is considered as the degree of influence of strength from one pixel on an adjacent pixel. As a basic rule, when strength is propagated from one pixel to an adjacent pixel, the strength of the one pixel is multiplied by the weight, and the product becomes the strength of the adjacent pixel. The "strength" is the strength of belonging to the target region or background region corresponding to a label, and represents how likely a pixel is to belong to that region. The greater the strength, the more likely the pixel belongs to the region corresponding to the label; the smaller the strength, the less likely.
The "weight" can be considered as follows.

FIGS. 12(a) and 12(b) are diagrams explaining the weight.
FIG. 12(a) shows the adjacent pixels R for which weights are determined with respect to the target pixel T. In this case, the adjacent pixels R are the eight pixels adjacent to the target pixel T. The weights are determined using the pixel information of the original image. That is, the weight from the target pixel T to an adjacent pixel R is set larger the closer their pixel values are, and smaller the farther apart they are. Whether pixel values are close can be determined using, for example, the Euclidean distance over the pixel values (for example, RGB values) and the boundary feature.

For example, let C_n be the pixel values and boundary feature value of the target pixel T, and C_c those of the adjacent pixel R. The Euclidean distance d between the target pixel T and the adjacent pixel R can then be defined by Equation 1 below. In Equation 1, (C_n - C_c)_R is the difference between the R value of the target pixel T and the R value of the adjacent pixel R, and (C_n - C_c)_G is the difference between their G values. Likewise, (C_n - C_c)_B is the difference between their B values, and (C_n - C_c)_{Boundary} is the difference between the boundary feature value of the target pixel T and that of the adjacent pixel R.
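
The expression for Equation 1 is not reproduced in this text. From the term definitions above, it presumably has the following form; this is a reconstruction under that assumption, not a copy of the original expression.

```latex
d = \sqrt{(C_n - C_c)_R^2 + (C_n - C_c)_G^2 + (C_n - C_c)_B^2 + (C_n - C_c)_{\mathrm{Boundary}}^2}
```

With this form, two adjacent pixels of identical color still have a nonzero distance when their boundary feature values differ, which is how the estimated contour can slow propagation across an ambiguous edge.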

Instead of the Euclidean distance d of Equation 1, the Euclidean distance d^w using YCbCr values, shown in Equation 2 below, may be considered. In Equation 2, C_n denotes the pixel values and boundary feature value of the target pixel T, and C_c those of the adjacent pixel R. The Euclidean distance d^w between the target pixel T and the adjacent pixel R can then be defined by Equation 2. In Equation 2, (C_n - C_c)_Y is the difference between the Y value of the target pixel T and the Y value of the adjacent pixel R, and (C_n - C_c)_{Cb} is the difference between their Cb values. Likewise, (C_n - C_c)_{Cr} is the difference between their Cr values, and (C_n - C_c)_{Boundary} is the difference between the boundary feature value of the target pixel T and that of the adjacent pixel R.
The Euclidean distance d^w of Equation 2 is a weighted Euclidean distance that uses the weighting coefficients W_Y, W_{Cb}, W_{Cr}, and W_{Boundary}.
L*a*b* values, HSV values, or the like may be used instead of YCbCr values.
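
The expression for Equation 2 is likewise not reproduced in this text. A form consistent with the definitions above would be the following. Whether each weighting coefficient multiplies the squared difference, as assumed here, or the difference before squaring cannot be determined from the text.

```latex
d^{w} = \sqrt{W_Y (C_n - C_c)_Y^2 + W_{Cb} (C_n - C_c)_{Cb}^2 + W_{Cr} (C_n - C_c)_{Cr}^2 + W_{\mathrm{Boundary}} (C_n - C_c)_{\mathrm{Boundary}}^2}
```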

Furthermore, pixel values are not limited to three components. For example, an n-dimensional color space may be used, with the Euclidean distance d^w computed over n color components.
For example, Equation 3 below applies when the color components are X_1, X_2, ..., X_n. In Equation 3, C_n denotes the pixel values and boundary feature value of the target pixel T, and C_c those of the adjacent pixel R. The Euclidean distance d^w between the target pixel T and the adjacent pixel R can then be defined by Equation 3. In Equation 3, (C_n - C_c)_{X1} is the difference between the X_1 value of the target pixel T and the X_1 value of the adjacent pixel R, and (C_n - C_c)_{X2} is the difference between their X_2 values. Likewise, (C_n - C_c)_{Xn} is the difference between their X_n values, and (C_n - C_c)_{Boundary} is the difference between the boundary feature value of the target pixel T and that of the adjacent pixel R. The Euclidean distance d^w of Equation 3 is also a weighted Euclidean distance, using the weighting coefficients W_{X1}, W_{X2}, ..., W_{Xn}, and W_{Boundary}.
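
The expression for Equation 3 is also not reproduced in this text. Assuming the weighting coefficients multiply the squared differences, it would generalize to n color components as follows. The summation index i is introduced here for compactness and does not appear in the text.

```latex
d^{w} = \sqrt{\sum_{i=1}^{n} W_{X_i} (C_n - C_c)_{X_i}^2 + W_{\mathrm{Boundary}} (C_n - C_c)_{\mathrm{Boundary}}^2}
```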

FIG. 12(b) illustrates the magnitudes of the weights determined for the target pixel T. Adjacent pixels R with a larger weight relative to the target pixel T are indicated by thicker lines LF, and adjacent pixels R with a smaller weight by thinner lines LH.

Specifically, the weight is determined from the Euclidean distance d as follows.
FIGS. 13(a) and 13(b) show methods of determining the weight. In FIGS. 13(a) and 13(b), the horizontal axis represents the Euclidean distance d and the vertical axis represents the weight.
This Euclidean distance d is the Euclidean distance between the pixel values of a pixel that has been given a strength and those of a pixel located around it. For example, a nonlinear monotonically decreasing function f1 is defined as shown in FIG. 13(a), and the value that f1 yields for the Euclidean distance d is used as the weight.
That is, the smaller the Euclidean distance d, the larger the weight, and the larger the Euclidean distance d, the smaller the weight.
The monotonically decreasing function is not limited to the shape shown in FIG. 13(a); any monotonically decreasing function may be used. It may therefore be a linear monotonically decreasing function f2 as shown in FIG. 13(b). It may also be a piecewise monotonically decreasing function that is linear over a specific range of the Euclidean distance d and nonlinear over other ranges. In FIGS. 13(a) and 13(b), the horizontal axis may of course be the weighted Euclidean distance d^w.
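
Two candidate weight functions of the kind described above can be sketched as follows. The exact shapes of f1 and f2 are not given in the text, so the exponential form, the scale `sigma`, and the cutoff `d_max` are assumptions chosen only to illustrate a nonlinear and a linear decreasing function.

```python
import numpy as np

def weight_nonlinear(d, sigma=30.0):
    """Nonlinear decreasing weight: 1 at d = 0 and approaching 0 as d grows.

    The exponential form and `sigma` are assumptions, not the f1 of FIG. 13(a).
    """
    return np.exp(-np.asarray(d, dtype=np.float64) / sigma)

def weight_linear(d, d_max=255.0):
    """Linear decreasing weight: 1 at d = 0, falling to 0 at d = d_max.

    Distances beyond `d_max` (an assumed cutoff) are clipped to a weight of 0.
    """
    return np.clip(1.0 - np.asarray(d, dtype=np.float64) / d_max, 0.0, 1.0)
```

Both functions return values between 0 and 1, so multiplying a strength by the weight can never increase it as it propagates away from a seed.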

As described above, propagating and comparing strengths results in the "labels" being propagated, which separates the regions. In this case, the labels and strengths can be regarded as propagating together to divide the image into regions.
This can also be described as follows: when the target region is obtained using a seed, which is a representative position, a label indicating whether a surrounding pixel is included in the target region is determined based on the strength of the pixels included in the representative position and the weight that those pixels exert on their surroundings; each pixel whose label has been determined then serves as a new starting pixel, and repeating this determination predicts the labels of the surrounding pixels and yields the target region.
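
The propagation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `grow_regions`, the synchronous update order, the stopping rule, and the weight function exp(-d / sigma) are all choices made here. The feature channels may hold the color values plus a boundary feature channel, as in the first embodiment.

```python
import numpy as np

def grow_regions(features, seeds, sigma=30.0, max_iter=1000):
    """Propagate seed labels by multiplying strengths by weights and keeping the strongest.

    features : (H, W, C) array, e.g. R, G, B plus a boundary feature channel
    seeds    : (H, W) int array; 0 = unlabeled, 1 = target seed, 2 = background seed
    sigma    : scale of the assumed weight function exp(-d / sigma)
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(seeds).copy()
    strength = (labels > 0).astype(np.float64)  # seeded pixels start with strength 1
    h, w = labels.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for _ in range(max_iter):
        changed = False
        new_labels, new_strength = labels.copy(), strength.copy()
        for y in range(h):
            for x in range(w):
                for dy, dx in offsets:  # the eight adjacent pixels
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w) or labels[ny, nx] == 0:
                        continue
                    d = np.linalg.norm(features[y, x] - features[ny, nx])
                    candidate = np.exp(-d / sigma) * strength[ny, nx]
                    # Adopt the neighbor's label only if it arrives with greater strength.
                    if candidate > new_strength[y, x]:
                        new_strength[y, x] = candidate
                        new_labels[y, x] = labels[ny, nx]
                        changed = True
        labels, strength = new_labels, new_strength
        if not changed:
            break
    return labels
```

For example, on a one-row image whose left half has value 0 and right half has value 100, with a target seed at the left end and a background seed at the right end, the two labels meet at the step in pixel value, because crossing it multiplies the strength by a small weight.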

[Second embodiment]
In the second embodiment, when obtaining the target region by the region growing method, the image separation unit 15 first obtains a provisional target region based on the representative position, and then corrects the provisional target region using the estimated contour. The image separation unit 15 thereby obtains the final target region.
The provisional target region is obtained using the region growing method described in the first embodiment. In the second embodiment, however, the boundary feature is not used; only the Euclidean distance of pixel values is used.

For example, let C_n be the pixel values and boundary feature value of the target pixel T, and C_c those of the adjacent pixel R. The Euclidean distance d between the target pixel T and the adjacent pixel R can then be defined by Equation 4 below. In Equation 4, (C_n - C_c)_R is the difference between the R value of the target pixel T and the R value of the adjacent pixel R, and (C_n - C_c)_G is the difference between their G values. Likewise, (C_n - C_c)_B is the difference between their B values. That is, compared with Equation 1 above, Equation 4 has no boundary feature term.

Alternatively, the Euclidean distance d^w using YCbCr values may be considered, as in Equation 5 below, or an n-dimensional color space may be used with the Euclidean distance d^w over n color components, as in Equation 6 below. These likewise lack the boundary feature term when compared with Equations 2 and 3 above, respectively. The method of calculating the closeness of pixel values is not limited to the Euclidean distance; any distance function, such as the Manhattan distance, may be used.

FIGS. 14(a) to 14(c) show how the target region S1 is cut out by the method of the second embodiment.
Of these, FIG. 14(a) shows the original image. Here, the original image is an image of a transparent cup Cu, and the portion occupied by the cup Cu is to be cut out as the target region S1.
FIG. 14(b) shows the provisional target region S1' obtained based on the representative position. In FIG. 14(b), the provisional target region S1' covers not only the transparent cup Cu but also a portion extending beyond it.
FIG. 14(c) shows the result of correcting the provisional target region S1' obtained in FIG. 14(b).
That is, the estimated contour Ea is applied to the provisional target region S1' cut out in FIG. 14(b). Of the provisional target region S1', the area inside the estimated contour is included in the target region S1, and the area outside the estimated contour is excluded from the target region S1. As a result, the region of the transparent cup Cu is cut out as the target region S1.
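
The correction step can be sketched as follows, assuming the estimated contour is available as a closed binary mask. The function names `inside_of_contour` and `correct_provisional`, the flood fill from the image border used to decide what lies outside the contour, and the choice to count contour pixels as inside are assumptions made for illustration; the text does not specify how inside and outside are determined.

```python
from collections import deque
import numpy as np

def inside_of_contour(contour):
    """Return a boolean mask of the pixels on or enclosed by a closed contour.

    Pixels reachable from the image border without crossing the contour
    (4-connected flood fill) are treated as outside.
    """
    contour = np.asarray(contour).astype(bool)
    h, w = contour.shape
    outside = np.zeros((h, w), dtype=bool)
    queue = deque((y, x) for y in range(h) for x in range(w)
                  if (y in (0, h - 1) or x in (0, w - 1)) and not contour[y, x])
    for y, x in queue:
        outside[y, x] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not contour[ny, nx] and not outside[ny, nx]:
                outside[ny, nx] = True
                queue.append((ny, nx))
    return ~outside

def correct_provisional(provisional, contour):
    """Keep only the part of the provisional target region inside the estimated contour."""
    return np.asarray(provisional).astype(bool) & inside_of_contour(contour)
```

If the estimated contour has a gap, the flood fill leaks through it and the whole image is treated as outside, so in this sketch a contour with missing parts would first need to be closed, for example by the user correction described with reference to FIG. 9(c).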

[Third embodiment]
In the third embodiment, when obtaining the target region by the region growing method, the image separation unit 15 uses the element-wise product of the pixel values of the image and the boundary feature representing the contour. The element-wise product is, for example, an inner product.
Specifically, let I be the pixel value of the target pixel T or the adjacent pixel R and B the value of the boundary feature, and consider the element-wise product given by Equation 7 below. Here, "∘" denotes the element-wise product, and λ is a constant parameter indicating how strongly the boundary feature is reflected. That is, when the target region is obtained, λ adjusts the proportion in which the boundary feature is reflected. The boundary feature is normalized in advance to a value from 0 to 1.

The image separation unit 15 then treats this result as the pixel value of the target pixel T or the adjacent pixel R, and obtains the target region using, for example, Equations 4 to 6 described in the second embodiment.

Instead of the element-wise product shown in Equation 7, the image separation unit 15 may use the sum of the pixel values of the image and the boundary feature representing the contour. In this case, let I be the pixel value of the target pixel T or the adjacent pixel R and B the value of the boundary feature, and consider the sum given by Equation 8 below. As in Equation 7, λ is a parameter indicating how strongly the boundary feature is reflected.

<Description of the Operation of the Image Processing System 1>
Next, the operation of the image processing system 1 is described.
FIG. 15 is a flowchart describing the operation of the image processing device in the present embodiment.
First, the image information acquisition unit 11 acquires RGB data as the image information of the original image to be processed (step 101).
Next, as a user instruction, seeds indicating the representative positions of the target region and the background region are input (step 102). As described above, the user does this by operating the input device 30 to draw lines or points while viewing the original image displayed on the display device 20. The user instruction reception unit 12 acquires the positions of the seeds in the original image, and the seed setting unit 13 sets these positions (step 103).
The contour estimation unit 14 then estimates the contour of the target region using the deep learning model (step 104).
The image separation unit 15 obtains the target region using the seeds and the estimated contour (step 105). This is done with the region growing method, as in the first to third embodiments described above.
The obtained target region is output from the image information output unit 16 and displayed on the display device 20 (step 106).

According to the embodiments described above, the target region can be distinguished and separated from the background region even when the boundary between them is ambiguous. That is, the region growing method can cut out the target region accurately when the contour of the target region is clear. The method of estimating the contour with a learning model, as shown in FIG. 7, can extract the contour more accurately when the contour of the target region is ambiguous. Using the two methods in a complementary manner therefore makes it possible to distinguish and separate the target region from the background region even when parts of the boundary between them are ambiguous.

In addition, the user can perform image processing interactively while viewing the display device 20, and can cut out the target region more intuitively and more easily.
For example, the user can check the contour of the target region in the image displayed on the display device 20 while performing image processing. Further image processing can also be applied to the target region or the background region after it has been cut out. This image processing is not particularly limited; it is, for example, processing that adjusts the luminance or color of the target region or the background region. Here too, the user can check the adjusted image on the display device 20 while performing the image processing.

In the embodiments described above, the target region is cut out by the region growing method, but the specific method of region growing is not particularly limited. For example, graph cut may be used.
Furthermore, in the embodiments described above, as shown in FIG. 7(b), the deep learning model estimates ambiguous contours more strongly and clear contours more weakly, but clear contours may also be estimated. This can be achieved through the way the training data set is annotated.
Furthermore, the embodiments described above deal with cutting a single target region out of the original image, but there may be a plurality of target regions. There may also be a plurality of background regions.

FIG. 16 is a diagram showing a case with two target regions.
As in FIG. 5(b), the original image in FIG. 16 shows carrots Ni, which are vegetables, placed in a transparent plastic bag Fu. Unlike FIG. 5(b), an image of a label reading "Yasai" has been added at the upper left of the figure. FIG. 16 shows the case where the user selects, as target regions, the carrots in the transparent plastic bag Fu and the label reading "Yasai". In this case, for example, seed 1 is input on the carrots in the transparent plastic bag Fu and seed 2 is input on the label reading "Yasai". Seed 3 is then input on the remaining part, which is the background region. After that, the two target regions can be cut out by using the region growing method and the contour estimation method described above together.
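The three-seed case can be sketched in Python as follows. This is an illustration under stated assumptions, not the method of the embodiment: the region growing is replaced here by a Dijkstra-style shortest-path labeling, and the function `label_from_seeds`, the parameter `boundary_weight`, the toy pixel values, and the `boundary` map (a stand-in for the estimated contour) are all hypothetical. Each pixel receives the label of the seed that reaches it at the lowest accumulated cost, where a step costs the pixel value difference plus the weighted boundary feature of the pixel being entered.

```python
import heapq


def label_from_seeds(image, boundary, seeds, boundary_weight=100.0):
    """Give every pixel the label of the seed that reaches it most cheaply.

    `seeds` maps (row, col) -> label, with labels being non-zero integers.
    Label 0 is used internally to mean "not yet assigned".
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    heap = [(0.0, r, c, label) for (r, c), label in seeds.items()]
    heapq.heapify(heap)
    while heap:
        cost, r, c, label = heapq.heappop(heap)
        if labels[r][c] != 0:
            continue  # already claimed by a cheaper path
        labels[r][c] = label
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                step = (abs(image[nr][nc] - image[r][c])
                        + boundary_weight * boundary[nr][nc])
                heapq.heappush(heap, (cost + step, nr, nc, label))
    return labels


# Toy 1x9 strip: label (250), background (100), carrots in the bag (104).
image = [[250, 250, 100, 100, 100, 104, 104, 104, 104]]
# Assumed contour estimate at the weak edge between background and carrots.
boundary = [[0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]]
seeds = {
    (0, 7): 1,  # seed 1: carrots in the plastic bag
    (0, 0): 2,  # seed 2: the label
    (0, 3): 3,  # seed 3: background
}
labels = label_from_seeds(image, boundary, seeds)
```

Pixels labeled 1 and 2 form the two target regions and pixels labeled 3 form the background. The clear edge of the label is separated by the pixel value difference alone, while the weak edge around the carrots is separated with the help of the assumed contour map.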

<Description of the Program>
The processing performed by the image processing device 10 in the present embodiment described above is provided, for example, as a program such as application software.

Accordingly, the processing performed by the image processing device 10 in the present embodiment can also be regarded as a program that causes a computer to realize: a function of acquiring a representative position of a target region, the representative position being a position input by a user and the target region being a region to be separated in an image; a function of estimating a contour of the target region; and a function of obtaining the target region using the representative position and the estimated contour.
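As a rough Python sketch of how such a program might be organized, the three functions can be written as three separate routines. All names here (`acquire_representative_position`, `estimate_contour`, `obtain_target_region`, `flood`, `tolerance`, `contour_model`) are hypothetical, and the contour estimator is passed in as a callable that merely stands in for the trained learning model. The third function uses one of the combinations named in the claims: a temporary target region is obtained from the representative position and is then corrected by the estimated contour.

```python
from collections import deque


def flood(start, passable, rows, cols):
    """Return the 4-connected pixels reachable from `start` via passable pixels."""
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and passable(nr, nc)):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


def acquire_representative_position(user_input, image):
    """Function 1: accept the position input by the user as the seed."""
    row, col = int(user_input[0]), int(user_input[1])
    if not (0 <= row < len(image) and 0 <= col < len(image[0])):
        raise ValueError("representative position lies outside the image")
    return (row, col)


def estimate_contour(image, contour_model):
    """Function 2: delegate to a contour estimator (assumed callable)."""
    return contour_model(image)


def obtain_target_region(image, seed, contour, tolerance):
    """Function 3: temporary region from the seed, corrected by the contour."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    # Temporary target region: pixels similar to the seed pixel.
    temporary = flood(
        seed, lambda r, c: abs(image[r][c] - seed_value) <= tolerance, rows, cols)
    # Pixels on the seed's side of the estimated contour.
    inside = flood(seed, lambda r, c: not contour[r][c], rows, cols)
    # Keep only the part of the temporary region that is inside the contour.
    return temporary & inside


image = [[50, 50, 52, 53, 53],
         [50, 50, 52, 53, 53]]
# Stand-in for the learning model: marks column 2 as contour.
fake_model = lambda img: [[c == 2 for c in range(len(img[0]))] for _ in img]

seed = acquire_representative_position((0, 0), image)
contour = estimate_contour(image, fake_model)
region = obtain_target_region(image, seed, contour, tolerance=5)
```

Here the temporary region covers the whole image because all pixel values are close to the seed value, and the correction by the contour reduces it to the two columns on the seed's side.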

The program that realizes the present embodiment can be provided via communication means, and can also be provided stored on a recording medium such as a CD-ROM.

Although the present embodiment has been described above, the technical scope of the present invention is not limited to the scope described in the embodiment. It is clear from the claims that embodiments to which various modifications or improvements have been made are also included in the technical scope of the present invention.

REFERENCE SIGNS LIST
1... image processing system, 10... image processing device, 11... image information acquisition unit, 12... user instruction reception unit, 13... seed setting unit, 14... contour estimation unit, 15... image separation unit, 16... image information output unit, 20... display device, 30... input device, G... image, S1... target region, S2... background region

Claims

1. An image processing apparatus comprising a processor, wherein the processor:
acquires a representative position of a target area, the representative position being a position input by a user and the target area being an area to be separated in an image;
estimates a contour of the target area; and
obtains the target area using the representative position and the estimated contour.

2. The image processing apparatus according to claim 1, wherein the contour of the target area is estimated based on a learning model that has learned the contour.

3. The image processing apparatus according to claim 2, wherein the learning model is obtained by learning ambiguous contours.

4. The image processing apparatus according to claim 2, wherein the processor accepts a correction of the contour made by a user who has checked it, and corrects the estimated contour of the target area.

5. The image processing apparatus according to claim 4, wherein the correction is performed by the user supplementing portions where contours have not been estimated.

6. The image processing apparatus according to claim 1, wherein a difference calculation of pixel values is performed when the target area is obtained using the representative position and the estimated contour.

7. The image processing apparatus according to claim 6, wherein the difference calculation is a calculation of a distance function using both a pixel value difference obtained based on the pixel value of the representative position and a boundary feature difference representing a contour.

8. The image processing apparatus according to claim 1, wherein, when the target area is obtained using the representative position and the estimated contour, a temporary target area is obtained based on the representative position and the temporary target area is corrected using the estimated contour.

9. The image processing apparatus according to claim 8, wherein the correction includes in the target area the part of the temporary target area that lies inside the estimated contour, and excludes from the target area the part that lies outside the estimated contour.

10. The image processing apparatus according to claim 1, wherein, when the target area is obtained using the representative position and the estimated contour, the element-wise product of the pixel values of the image and a boundary feature representing the contour is used.

11. The image processing apparatus according to claim 10, wherein when obtaining the target area, a ratio of reflecting the boundary feature is adjusted.

12. The image processing apparatus as described above, wherein, when the target area is obtained using the representative position, a label indicating whether each surrounding pixel is included in the target area is determined based on the strength of the pixels included in the representative position and a weight applied to the surroundings of those pixels, and the determination is repeated with each pixel whose label has been determined serving as a new starting pixel, thereby predicting the labels of the surrounding pixels and obtaining the target area.

13. An image processing system comprising:
a display device that displays an image; and
an image processing apparatus having a processor that performs image processing after a user confirms the contour of a target area, which is an area to be separated in the image displayed on the display device,
wherein the processor:
acquires a representative position of the target area, the representative position being a position input by the user;
estimates a contour of the target area; and
obtains the target area using the representative position and the estimated contour.

14. A program causing a computer to realize:
a function of acquiring a representative position of a target area, the representative position being a position input by a user and the target area being an area to be separated in an image;
a function of estimating a contour of the target area; and
a function of obtaining the target area using the representative position and the estimated contour.