JP7604804B2 - Image processing device and image processing method

Info

Publication number: JP7604804B2
Application number: JP2020135645A
Authority: JP
Inventors: 貴裕堀
Original assignee: Omron Corp
Current assignee: Omron Corp
Priority date: 2020-08-11
Filing date: 2020-08-11
Publication date: 2024-12-24
Anticipated expiration: 2040-08-11
Also published as: JP2022032133A; WO2022034779A1

Description

The present invention relates to an image processing device and an image processing method.

In recent years, more companies have introduced remote work, and opportunities for online meetings and video calls have increased. During an online meeting or video call, it is desirable to enable video so that the participants' facial expressions can be seen. When video is enabled, however, a participant must spend time on grooming suitable for the call, such as applying makeup for women or shaving for men. To address this, technologies have been proposed that correct video for video calls to a state suitable for the call. For example, Patent Document 1 discloses a technology that, in data communication involving image data, determines the user's degree of fatigue from the number of blinks or the degree of eye redness and, if the degree of fatigue is at or above a certain level, composites the image with a previously captured image. Patent Document 2 discloses a technology that selects or processes the video to be used for video communication from "video that may be shown to the other party," which the user has determined in advance, and outputs it.

Patent Document 1: JP 2001-016564 A
Patent Document 2: JP 2012-142925 A

If video for a video call is corrected merely on the basis of the caller's degree of fatigue, unnecessary correction may be applied even though the caller is well-groomed. In addition, when the video for communication is always selected or processed from video that may be shown to the other party, the video may be corrected regardless of whether the caller is well-groomed. Furthermore, deciding in advance which video may be shown to the other party can be time-consuming for the user, so usability is not optimal.

In one aspect, the present invention aims to provide a technology for correcting a captured image of a person on a video call when that person is not well-groomed.

To achieve the above objective, the present invention adopts the following configuration.

A first aspect of the present disclosure provides an image processing device including: an extraction unit that extracts feature amounts of a captured image of a user and of a reference image of the user; a determination unit that determines whether to enable or disable a correction process for the captured image based on a result of comparing the feature amounts of the captured image with the feature amounts of the reference image; a correction unit that generates a corrected image of the captured image when it is determined that the correction process is to be enabled; and an output unit that outputs the corrected image when it is determined that the correction process is to be enabled, and outputs the uncorrected captured image when it is determined that the correction process is to be disabled.

The "reference image" is, for example, an image of the user in a well-groomed state. The image processing device can determine whether or not to correct the captured image based on the result of comparing the captured image of the user with the reference image. Because the image processing device automatically corrects the captured image according to that determination, the user can join a video call without worrying about the state of his or her grooming and without performing any special operation to request image correction.

The feature amount may be a feature amount of a part of the user's face where changes are easily detected. By extracting, from the captured image and the reference image, feature amounts of parts of the face where changes are easily detected, the image processing device can accurately determine, when comparing images of the same user, whether the user in the captured image is well-groomed, that is, whether to disable the correction process.

The feature amount may include at least one of Haar-like features, color histograms, and color moments. The feature amount may also be calculated by an algorithm that uses a learning model trained on images of well-groomed states and images of poorly groomed states. The image processing device can use any of these feature amounts, or a combination of them, to determine whether or not to enable the correction process.

The determination unit may calculate the degree of match between the feature amounts of the user's captured image and the feature amounts of the user's reference image, and determine to enable the correction process if the degree of match is less than a predetermined threshold and to disable the correction process if the degree of match is equal to or greater than the predetermined threshold. The image processing device can thus determine whether to correct the captured image based on the degree of match between the captured image and the reference image, and can correct the captured image automatically. This allows the user to join a video call without worrying about the state of his or her grooming and without performing any special operation to request image correction.

The correction unit may change the amount of correction applied to the user's captured image according to the degree of match. By changing the amount of correction according to the degree of match between the captured image and the registered image, unnecessary correction can be suppressed when the user is already reasonably well-groomed.
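As an illustration of varying the correction amount with the degree of match, the following minimal Python sketch maps a match degree to a blending weight. The linear mapping, the names `correction_strength` and `blend`, and the default threshold of 0.8 are assumptions made for illustration; the description does not specify how the correction amount is derived.

```python
def correction_strength(match: float, threshold: float = 0.8) -> float:
    """Map a match degree in [0, 1] to a correction strength in [0, 1].

    Assumption: the strength grows linearly as the match degree falls
    below the threshold, and is zero at or above the threshold.
    """
    match = min(max(match, 0.0), 1.0)  # clamp out-of-range inputs
    if match >= threshold:
        return 0.0  # well-groomed enough: no correction
    return (threshold - match) / threshold


def blend(captured: float, corrected: float, strength: float) -> float:
    """Mix one captured pixel value with its fully corrected value."""
    return (1.0 - strength) * captured + strength * corrected


if __name__ == "__main__":
    for m in (0.9, 0.6, 0.2):
        s = correction_strength(m)
        print(m, round(s, 3), round(blend(100.0, 200.0, s), 1))
```

With this mapping, a frame that almost reaches the threshold receives only a light correction, while a frame with a very low match degree is replaced almost entirely by the corrected pixel values.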

The correction unit may generate the corrected image based on the user's captured image and the user's reference image. For example, the correction unit may generate the corrected image using a GAN (Generative Adversarial Network) trained on the user's captured images and the user's reference images. The correction unit may also generate the corrected image by cutting out a part of the face, or the entire face, from the user's reference image and replacing the corresponding part of the user's captured image with the cut-out image. By generating the corrected image based on both the captured image and the reference image, the image processing device can generate a corrected image that is closer to the reference image.

The correction unit may generate the corrected image by applying noise-removal filtering or saturation adjustment to the user's captured image based on facial feature information. Because the image processing device can generate a corrected image from facial feature information without using a reference image, the user is spared the trouble of preparing a reference image.

The determination unit may determine whether to enable or disable the correction process every predetermined number of frames of the user's captured image. Because the image processing device can then correct the captured image even if makeup such as lipstick comes off during a video call, the user can continue the call without worrying about his or her grooming deteriorating.

The image processing device may further include an imaging unit that captures the image of the user. Integrating the imaging unit into the image processing device allows a simple configuration.

A second aspect of the present invention provides a human body detection method including: an extraction step of extracting feature amounts of a captured image of a user and of a reference image of the user; a determination step of determining whether to enable or disable a correction process for the captured image based on a result of comparing the feature amounts of the captured image with the feature amounts of the reference image; a correction step of generating a corrected image of the captured image when it is determined that the correction process is to be enabled; and an output step of outputting the corrected image when it is determined that the correction process is to be enabled, and outputting the uncorrected captured image when it is determined that the correction process is to be disabled.

According to the present invention, when a person on a video call is not well-groomed, the captured image of that person can be corrected.

FIG. 1 is a diagram illustrating an application example of an image processing device according to an embodiment.
FIG. 2 is a diagram illustrating an example of the functional configuration of the image processing device.
FIG. 3 is a flowchart illustrating an example of the image correction process.
FIG. 4 is a diagram showing a first example of extracting facial feature amounts.
FIG. 5 is a diagram showing a second example of extracting facial feature amounts.
FIG. 6 is a diagram illustrating an example of the correction process for a captured image.

An embodiment according to one aspect of the present invention will be described below with reference to the drawings.

<Application Example>
FIG. 1 is a diagram illustrating an application example of an image processing device according to an embodiment. The image processing device acquires a camera image (captured image) input from a camera and a registered image (reference image) registered in advance in a DB (database), and extracts features from each image. The registered image is, for example, an image of the user in a well-groomed state, and serves as the reference for determining whether or not to correct the captured image.

The image processing device compares the feature amounts of the camera image and the registered image and evaluates the degree of match. If the degree of match is equal to or greater than a predetermined threshold, the image processing device determines that the user is well-groomed and disables the correction process for the camera image. If the degree of match is less than the predetermined threshold, the image processing device determines that the user is not well-groomed and enables the correction process. In this way, the image processing device determines whether to enable or disable the correction process for the camera image based on the result of comparing the feature amounts of the camera image with those of the registered image.

When the correction process is enabled, the image processing device corrects the user's camera image to generate a display image (corrected image), and transmits it to another computer used by the call partner, where it is displayed. The corrected image may also be displayed on the display of the image processing device. The display image can be generated based on the registered image of the user in a well-groomed state. As a result, when the user is not well-groomed, a corrected image is displayed without any special operation, so the user can join a video call without worrying about the state of his or her grooming.

<Embodiment>
(Device configuration)
An example of the functional configuration of the image processing device 1 will be described with reference to FIG. 2. FIG. 2 is a diagram illustrating an example of the functional configuration of the image processing device 1. The image processing device 1 includes an imaging unit 10, a registered image database 11, a feature extraction unit 12, a correction determination unit 13, a correction processing unit 14, and an output unit 15.

The imaging unit 10 captures an image of the user who is the caller. The registered image database 11 stores a registered image (reference image) that serves as the reference for determining whether or not to correct the user's captured image. The registered image database 11 may store a plurality of registered images for each user. The registered image may be, for example, the image captured when the user first makes a call using the image processing device 1. The registered image may also be an image that the user selects while checking images displayed on the display device. Note that the registered image database 11 is not limited to holding registered images of the user, and may instead hold feature amount information for the well-groomed state.

The feature extraction unit 12 (extraction unit) extracts feature amounts from the user's captured image and the registered image. The feature amounts are, for example, Haar-like features, color histograms, and color moments. The feature extraction unit 12 may also use a combination of these as the feature amount for determining whether to enable or disable the correction process.

The correction determination unit 13 (determination unit) determines whether or not to enable the correction process for the captured image based on the feature amounts extracted by the feature extraction unit 12. Specifically, the correction determination unit 13 compares the feature amounts of the user's captured image with the feature amounts of the user's registered image, and calculates the degree of match.

For example, the correction determination unit 13 can calculate the Haar-like feature amount around the boundary between the eyebrow and the skin on the forehead side for each of the captured image and the registered image, and calculate the degree of match as "(feature amount of the registered image - difference between the two feature amounts) / feature amount of the registered image." The correction determination unit 13 may also calculate Haar-like feature amounts at multiple sites in addition to the boundary between the eyebrow and the forehead-side skin, and use the average of the resulting values as the degree of match.

The correction determination unit 13 can determine that the correction process for the user's captured image is to be disabled if the degree of match is equal to or greater than a predetermined threshold (e.g., 80%), and that the correction process is to be enabled if the degree of match is less than the predetermined threshold.

The correction processing unit 14 (correction unit) corrects the user's captured image when the correction determination unit 13 determines that the correction process is to be enabled. The correction processing unit 14 can generate the corrected image, for example, using a GAN (Generative Adversarial Network) trained on images in which the user is well-groomed and images in which the user is not. The correction processing unit 14 can also generate the corrected image by cutting out a part of the face, or the entire face, from the registered image and compositing it into the user's captured image in place of the corresponding part. Furthermore, the correction processing unit 14 can generate the corrected image by applying noise-removal filtering or saturation adjustment to the user's captured image.
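The cut-out-and-replace correction can be sketched as follows. This is a minimal illustration under strong assumptions, not the method of the embodiment: images are plain nested lists of grayscale values, the captured and registered images are assumed to have the same size and to be already aligned, and the function name `replace_region` and the `(top, left, height, width)` box format are hypothetical. A practical system would also need face alignment and seam blending, which are omitted here.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of grayscale values (illustrative stand-in)


def replace_region(captured: Image, registered: Image,
                   box: Tuple[int, int, int, int]) -> Image:
    """Return a copy of `captured` in which the pixels inside `box`
    (top, left, height, width) are taken from `registered`."""
    top, left, height, width = box
    if len(captured) != len(registered) or len(captured[0]) != len(registered[0]):
        raise ValueError("images must have the same size in this sketch")
    corrected = [row[:] for row in captured]  # leave the input untouched
    for y in range(top, top + height):
        corrected[y][left:left + width] = registered[y][left:left + width]
    return corrected


if __name__ == "__main__":
    captured = [[0] * 4 for _ in range(4)]
    registered = [[9] * 4 for _ in range(4)]
    for row in replace_region(captured, registered, (1, 1, 2, 2)):
        print(row)
```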

The output unit 15 outputs video of the user who is the caller. The video output by the output unit 15 is transmitted to another computer. The video output by the output unit 15 may also be displayed on the display device of the image processing device 1. If the correction determination unit 13 determines that the correction process for the user's captured image is to be enabled, the output unit 15 outputs the corrected captured image (corrected image); if it determines that the correction process is to be disabled, the output unit 15 outputs the user's captured image without correction. When the correction process is enabled, the output unit 15 transmits the corrected captured image to the electronic device, such as a tablet terminal, that the call partner uses for the call, where it is displayed.
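The interplay of the determination, correction, and output units for one frame can be sketched as below. The function name `process_frame`, the generic frame type, and the `correct` callback are hypothetical; only the rule itself (correct when the degree of match is below the threshold, otherwise pass the captured frame through, with 80% as an example threshold) comes from the description.

```python
from typing import Callable, Tuple, TypeVar

Frame = TypeVar("Frame")


def process_frame(captured: Frame, match: float,
                  correct: Callable[[Frame], Frame],
                  threshold: float = 0.80) -> Tuple[Frame, bool]:
    """Return (frame to output, whether correction was enabled)."""
    correction_enabled = match < threshold  # below threshold: not well-groomed
    if correction_enabled:
        return correct(captured), True
    return captured, False  # output the captured frame unchanged


if __name__ == "__main__":
    # str.upper stands in for a real correction routine.
    print(process_frame("raw frame", 0.92, str.upper))
    print(process_frame("raw frame", 0.55, str.upper))
```

Passing the correction routine as a callback keeps the decision logic independent of whether correction is done by a GAN, by region replacement, or by filtering.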

The image processing device 1 of this embodiment may be a general-purpose computer such as a personal computer, a server computer, a tablet terminal, or a smartphone, or may be an embedded computer such as an on-board computer. The image processing device 1 has a CPU (processor), a RAM (memory), non-volatile storage (HDD, SSD, etc.), an input device (touch panel, etc.), and a communication device (wired or wireless LAN module, etc.). The image processing device 1 also has hardware resources such as an imaging device, which includes a lens and an imaging element (an image sensor such as a CCD or CMOS sensor), and a display device (such as a liquid crystal monitor).

The processor loads a program stored in the storage into the RAM and executes it, thereby realizing the functions of the functional units shown in FIG. 2. Note that the way of realizing the image processing device 1 is not limited to this. For example, the image processing device 1 may be realized by distributed computing using multiple computer devices, and some of the functional units may be realized by a cloud server. In addition, some of the functional units of the image processing device 1 may be realized by a dedicated hardware device such as an FPGA or ASIC.

(Image correction processing)
The overall flow of the image correction process will be described with reference to FIG. 3. FIG. 3 is a flowchart illustrating the image correction process. The image correction process is started, for example, when the user launches the application used for calls on the image processing device 1. Note that the image correction process shown in FIG. 3 is executed for each frame of the camera image (captured image).

In S101, the feature extraction unit 12 acquires the camera image of the user captured by the imaging unit 10. The feature extraction unit 12 proceeds to S102 for each frame of the camera image. In the following description of each step, the camera image is assumed to be a one-frame image of the data received from the imaging unit 10.

In S102, the feature extraction unit 12 determines whether the camera image is an image at a timing for evaluating the feature amounts. The timing for evaluating the feature amounts may be, for example, the first frame in which the feature extraction unit 12 recognizes a human face. Note that the timing is not limited to the first recognition of a human face, and may instead recur at a predetermined interval, such as every predetermined number of frames (e.g., 30 frames) or every predetermined time (e.g., 5 minutes). If the camera image is an image at a timing for evaluating the feature amounts (S102: Yes), the process proceeds to S103. If it is not (S102: No), the process proceeds to S108.
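The S102 timing check can be sketched as a small stateful helper. This sketch combines the two options mentioned above (the first frame with a recognized face, then every fixed number of frames); the class name `EvaluationTimer`, its method, and the choice to combine both options are assumptions for illustration, and face detection itself is represented only by a boolean input.

```python
from typing import Optional


class EvaluationTimer:
    """Decide per frame whether feature amounts should be evaluated (S102)."""

    def __init__(self, interval_frames: int = 30) -> None:
        self.interval_frames = interval_frames
        # None until a face has been recognized for the first time.
        self.frames_since_eval: Optional[int] = None

    def should_evaluate(self, face_detected: bool) -> bool:
        if self.frames_since_eval is None:
            if not face_detected:
                return False  # still waiting for the first face
            self.frames_since_eval = 0
            return True  # first frame in which a face is recognized
        self.frames_since_eval += 1
        if self.frames_since_eval >= self.interval_frames:
            self.frames_since_eval = 0
            return True  # periodic re-evaluation
        return False


if __name__ == "__main__":
    timer = EvaluationTimer(interval_frames=3)
    faces = [False, True, True, True, True, True, True, True]
    print([timer.should_evaluate(f) for f in faces])
```

A time-based interval (for example, every 5 minutes) could be handled the same way by counting elapsed seconds instead of frames.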

In S103, the feature extraction unit 12 extracts the feature amounts of the camera image. It also acquires the user's registered image from the registered image database 11 and extracts the feature amounts of the registered image. The extraction of feature amounts is explained below with reference to FIG. 4 and FIG. 5.

・First example of feature extraction
FIG. 4 is a diagram showing a first example of extracting facial feature amounts. In the example of FIG. 4, feature points that are unique to the user and unlikely to change over time are selected, and Haar-like feature amounts are extracted around those feature points. The circles shown in the camera image of FIG. 4(A) and the registered image of FIG. 4(B) indicate feature points that are unique to the user and unlikely to change over time.

FIG. 4(A) shows the camera image together with the result of calculating Haar-like feature amounts in the area around the user's left eye. FIG. 4(A) shows an example in which the Haar-like feature amounts were calculated using, among others, a rectangular pattern for identifying edges (a filter divided into two halves, either top and bottom or left and right, with one half white and the other black). Each calculated Haar-like feature amount is represented by a rectangle whose shade corresponds to the calculated value. The Haar-like feature amount at the outer corner 401a of the user's left eye is shown in gray 401b.
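A two-rectangle edge feature of the kind described for FIG. 4(A) can be computed efficiently with an integral image (summed-area table). The sketch below is an illustration only: it uses nested lists of grayscale values, defines the feature as the absolute difference between the mean intensities of the upper and lower halves of a window, and uses hypothetical function names. The description does not state the exact normalization used in the embodiment.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> List[List[int]]:
    """Summed-area table with an extra leading row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], top: int, left: int,
             height: int, width: int) -> int:
    """Sum of pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_edge_upper_lower(ii: List[List[int]], top: int, left: int,
                          height: int, width: int) -> float:
    """Two-rectangle edge feature: |mean(upper half) - mean(lower half)|.
    Assumes `height` is even so both halves have the same area."""
    half = height // 2
    upper = rect_sum(ii, top, left, half, width)
    lower = rect_sum(ii, top + half, left, half, width)
    return abs(upper - lower) / (half * width)


if __name__ == "__main__":
    # Bright skin above a darker line, e.g. near an eyeliner edge.
    img = [[200] * 4, [200] * 4, [60] * 4, [60] * 4]
    print(haar_edge_upper_lower(integral_image(img), 0, 0, 4, 4))
```

A stronger contrast between the two halves, such as that produced by eyeliner against skin, yields a larger value, which corresponds to the darker rectangles in the figure.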

FIG. 4(B) shows the registered image together with the result of calculating Haar-like feature amounts in the area around the user's left eye. The Haar-like feature amounts are calculated in the same manner as in FIG. 4(A), and each is represented by a rectangle whose shade corresponds to the calculated value. The Haar-like feature amount at the outer corner 402a of the user's left eye is shown in gray 402b, which is darker than 401b in FIG. 4(A). In the registered image, the contrast produced by makeup such as eyeliner makes the Haar-like feature amount larger than in the camera image, which is why 402b is darker than 401b.

In the first example shown in FIG. 4, the degree of match between the camera image and the registered image can be calculated, for example, based on the degree of match of the Haar-like feature amounts at each of the feature points indicated by circles. The degree of match between the camera image and the registered image may be the sum or the average of the degrees of match at the individual feature points. The degree of match at each feature point can be calculated, for example, as (X - |X - Y|) / X, where X is the Haar-like feature amount at the feature point of the registered image and Y is the Haar-like feature amount at the corresponding feature point of the camera image.
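The per-point formula (X - |X - Y|) / X and its average over feature points can be written directly. In the sketch below the function names are hypothetical, and two details are assumptions not stated in the description: a negative result (which occurs when Y exceeds 2X) is clamped to 0, and a zero registered feature amount is rejected as an error.

```python
from typing import Sequence


def point_match(x: float, y: float) -> float:
    """Degree of match at one feature point: (X - |X - Y|) / X.

    x: feature amount in the registered image (X)
    y: feature amount in the camera image (Y)
    """
    if x == 0:
        raise ValueError("registered feature amount must be non-zero")
    # Clamping at 0 is an assumption; the formula itself can go negative.
    return max(0.0, (x - abs(x - y)) / x)


def image_match(registered: Sequence[float], camera: Sequence[float]) -> float:
    """Average of the per-point degrees of match."""
    if not registered or len(registered) != len(camera):
        raise ValueError("need the same non-zero number of feature points")
    scores = [point_match(x, y) for x, y in zip(registered, camera)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    registered = [100.0, 50.0, 80.0]
    camera = [80.0, 50.0, 40.0]
    # Per-point values: 0.8, 1.0, 0.5
    print(round(image_match(registered, camera), 3))
```

In this illustration the average is about 0.767, which is below the 80% example threshold, so the correction process would be enabled.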

Note that matching feature amounts at feature points that are unique to the user and unlikely to change over time can also be realized using a face recognition algorithm. In the determination processes of S104 and S105, the correction determination unit 13 can calculate, with a face recognition algorithm, a score indicating whether the person is the registered user, and use that score as the degree of match between the camera image and the registered image.

・Second example of feature extraction
FIG. 5 is a diagram showing a second example of extracting facial feature amounts. In the example of FIG. 5, feature points at which changes in the same user are easy to detect are selected, and Haar-like feature amounts are extracted around those feature points. The circles shown in the camera image of FIG. 5(A) and the registered image of FIG. 5(B) indicate such feature points.

FIG. 5(A) shows the camera image together with the result of calculating Haar-like feature amounts in the area around the user's left cheek. FIG. 5(A) shows an example in which the Haar-like feature amounts were calculated using, among others, a rectangular pattern for identifying edges (a filter divided into two halves, either top and bottom or left and right, with one half white and the other black). Each calculated Haar-like feature amount is represented by a rectangle whose shade corresponds to the calculated value. The Haar-like feature amount at the center 501a of the user's left cheek is shown in white 501b.

FIG. 5(B) shows the registered image together with the result of calculating Haar-like feature amounts in the area around the user's left cheek. The Haar-like feature amounts are calculated in the same manner as in FIG. 5(A), and each is represented by a rectangle whose shade corresponds to the calculated value. The Haar-like feature amount at the center 502a of the user's left cheek is shown in gray 502b. In the registered image, the contrast produced by makeup such as blush makes the Haar-like feature amount larger than in the camera image, which is why 502b is darker than 501b.

In the second example shown in FIG. 5, the degree of match between the camera image and the registered image can be calculated in the same manner as in the first example of FIG. 4, for example, based on the degree of match of the Haar-like feature amounts at each of the feature points indicated by circles. Note that the feature amounts extracted from areas whose color changes before and after grooming, such as a woman's cheeks or a man's beard area, are not limited to Haar-like feature amounts, and may be feature amounts such as color histograms or color moments, or a combination of these.
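Color histograms and color moments for one color channel of a facial region can be sketched as follows. The function names, the use of histogram intersection as a similarity measure, and the particular definition of the third moment (signed cube root of the third central moment) are assumptions chosen for illustration; the description names the feature types but not how they are computed or compared.

```python
from math import copysign, sqrt
from typing import List, Sequence, Tuple


def histogram(values: Sequence[int], bins: int = 8,
              max_value: int = 256) -> List[float]:
    """Normalized histogram of one channel (values in 0..max_value-1)."""
    if not values:
        raise ValueError("region must contain at least one pixel")
    counts = [0] * bins
    for v in values:
        counts[v * bins // max_value] += 1
    return [c / len(values) for c in counts]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def color_moments(values: Sequence[int]) -> Tuple[float, float, float]:
    """Mean, standard deviation, and signed cube root of the third
    central moment of one channel."""
    if not values:
        raise ValueError("region must contain at least one pixel")
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    skew = copysign(abs(third) ** (1.0 / 3.0), third)
    return mean, sqrt(var), skew


if __name__ == "__main__":
    cheek_registered = [180, 185, 190, 200, 210]  # red channel, with blush
    cheek_camera = [150, 152, 155, 160, 158]      # red channel, without
    print(histogram_intersection(histogram(cheek_registered),
                                 histogram(cheek_camera)))
    print(color_moments(cheek_camera))
```

Such a similarity value could serve as the degree of match for color-sensitive regions, either alone or combined with the Haar-like feature comparison.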

第２の例では、同一ユーザ間で変化を捉えやすい特徴点での特徴量を照合するため、補正判定部１３は、身だしなみが整っているか否かを精度良く判定することができる。同一ユーザ間で変化を捉えやすい特徴点は、女性の場合は、眉、目尻、頬、口等の部位、男性の場合は、髭が生える口周り等の部位から選択すればよい。男女の性別によって照合する特徴点の部位を変えることで、身だしなみが整っているか否かは、より精度良く判定することが可能となる。 In the second example, feature amounts are compared at feature points where changes in the same user's appearance are easy to capture, so the correction determination unit 13 can accurately determine whether or not the user is well-groomed. Such feature points can be selected from areas such as the eyebrows, the outer corners of the eyes, the cheeks, and the mouth for women, and from areas such as around the mouth, where a beard grows, for men. Changing the areas of the feature points to be compared according to gender makes it possible to determine more accurately whether or not the user is well-groomed.

・特徴量抽出の第３の例 Third Example of Feature Extraction
顔の特徴量を抽出する第３の例として、ＣＮＮ（Ｃｏｎｖｏｌｕｔｉｏｎａｌ Ｎｅｕｒａｌ Ｎｅｔｗｏｒｋ）等のディープラーニングにより生成された学習モデルを使用する例について説明する。第３の例で使用する学習モデルは、例えば、ＣＮＮに、身だしなみが整っている画像および身だしなみが整っていない画像を学習させて生成したモデルである。ＣＮＮに学習させる画像は、ユーザ本人以外の画像であってもよく、ユーザ本人の画像を含んでもよい。特徴抽出部１２は、生成された学習モデルを使用して、ＣＮＮのアルゴリズムにより登録画像（身だしなみが整っている画像）およびカメラ画像のスコアを特徴量として抽出する。補正判定部１３は、Ｓ１０４およびＳ１０５の判定処理で、登録画像のスコアとカメラ画像のスコアとの一致度に基づいて、カメラ画像に対する補正処理を有効にするか無効にするかを判定することができる。 As a third example of extracting facial features, an example that uses a learning model generated by deep learning, such as a CNN (Convolutional Neural Network), will be described. The learning model used in the third example is, for example, a model generated by training a CNN on images of well-groomed people and images of poorly groomed people. The training images may be images of people other than the user, and may include images of the user. The feature extraction unit 12 uses the generated learning model to extract, with the CNN algorithm, the scores of the registered image (the well-groomed image) and the camera image as features. In the determination processes of S104 and S105, the correction determination unit 13 can determine whether to enable or disable the correction process for the camera image based on the degree of match between the score of the registered image and the score of the camera image.

図３に戻り、Ｓ１０４では、補正判定部１３は、Ｓ１０３で抽出されたカメラ画像の特徴量と登録画像の特徴量とを照合する。具体的には、補正判定部１３は、カメラ画像の特徴量と登録画像の特徴量とに基づいて、カメラ画像と登録画像との一致度を算出する。カメラ画像と登録画像との一致度は、上記の第１の例から第３の例で説明したように、各画像から抽出する特徴量の種類に応じた方法で算出される。 Returning to FIG. 3, in S104, the correction determination unit 13 compares the features of the camera image extracted in S103 with the features of the registered image. Specifically, the correction determination unit 13 calculates the degree of match between the camera image and the registered image based on the features of the camera image and the features of the registered image. The degree of match between the camera image and the registered image is calculated by a method according to the type of features extracted from each image, as described in the first to third examples above.

Ｓ１０５では、補正判定部１３は、Ｓ１０４で算出したカメラ画像と登録画像との一致度が、所定の閾値以上であるか否かを判定する。一致度が所定の閾値以上である場合（Ｓ１０５：Ｙｅｓ）、処理はＳ１０７に進む。一致度が所定の閾値未満である場合（Ｓ１０５：Ｎｏ）、処理はＳ１０６に進む。 In S105, the correction determination unit 13 determines whether the degree of match between the camera image and the registered image calculated in S104 is equal to or greater than a predetermined threshold. If the degree of match is equal to or greater than the predetermined threshold (S105: Yes), the process proceeds to S107. If the degree of match is less than the predetermined threshold (S105: No), the process proceeds to S106.

Ｓ１０６では、カメラ画像と登録画像との一致度が所定の閾値未満であるため、補正判定部１３は、補正フラグをオン（ＯＮ）に設定し、カメラ画像の補正処理を有効にする。Ｓ１０７では、カメラ画像と登録画像との一致度が所定の閾値以上であるため、補正判定部１３は、補正フラグをオフ（ＯＦＦ）に設定し、カメラ画像の補正処理を無効にする。 In S106, since the degree of match between the camera image and the registered image is less than a predetermined threshold, the correction determination unit 13 sets the correction flag to on (ON) and enables the correction process of the camera image. In S107, since the degree of match between the camera image and the registered image is equal to or greater than a predetermined threshold, the correction determination unit 13 sets the correction flag to off (OFF) and disables the correction process of the camera image.

補正フラグは、補正処理部１４が、ユーザの撮像画像に対する補正処理を実行するか否かを決定するためのフラグである。Ｓ１０６で補正フラグがオンに設定されると、後のフレームに対してＳ１０７で補正フラグがオフに設定されるまで、補正処理は有効となる。特徴量を評価するタイミングが通話の最初だけである場合、最初に補正フラグがオンに設定されると、補正処理部１４は、通話が終了するまでカメラ画像の補正処理を継続する。 The correction flag is a flag for the correction processing unit 14 to determine whether or not to execute correction processing on the captured image of the user. When the correction flag is set to on in S106, the correction processing is enabled until the correction flag is set to off for a subsequent frame in S107. If the feature amount is evaluated only at the beginning of a call, when the correction flag is initially set to on, the correction processing unit 14 continues the correction processing of the camera image until the call ends.

これに対し、Ｓ１０７で補正フラグがオフに設定されると、後のフレームに対してＳ１０６で補正フラグがオンに設定されるまで、補正処理は無効となる。特徴量を評価するタイミングが通話の最初だけである場合、最初に補正フラグがオフに設定されると、カメラ画像は補正されずに表示装置に表示される。 In contrast, if the correction flag is set to OFF in S107, the correction process is disabled until the correction flag is set to ON for a subsequent frame in S106. If the feature is evaluated only at the beginning of a call, when the correction flag is initially set to OFF, the camera image is displayed on the display device without correction.

また、特徴量を評価するタイミングが、所定のフレーム数ごとである場合、Ｓ１０６で補正フラグがオンに設定されると、次に特徴量を評価するタイミングになるまで、カメラ画像の補正処理は有効となる。反対に、Ｓ１０７で補正フラグがオフに設定されると、次に特徴量を評価するタイミングになるまで、カメラ画像の補正処理は無効となり、カメラ画像は補正されずに表示装置に表示される。 In addition, if the timing for evaluating the feature amount is every predetermined number of frames, when the correction flag is set to ON in S106, the correction process of the camera image will be valid until the next time the feature amount is evaluated. Conversely, when the correction flag is set to OFF in S107, the correction process of the camera image will be invalid until the next time the feature amount is evaluated, and the camera image will be displayed on the display device without correction.
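The flag handling of S105 to S107, combined with evaluation every predetermined number of frames, can be sketched as a small state holder. The class name, the threshold value of 0.8, and the interval of 30 frames are assumptions for illustration; the description only says "predetermined".

```python
class CorrectionFlag:
    """Holds the correction flag between evaluations (S105 to S107)."""

    def __init__(self, threshold=0.8, eval_interval=30):
        self.threshold = threshold          # assumed value for the predetermined threshold
        self.eval_interval = eval_interval  # assumed: evaluate once every 30 frames
        self.enabled = False

    def update(self, frame_index, compute_match):
        """Re-evaluate only on scheduled frames; otherwise keep the previous flag."""
        if frame_index % self.eval_interval == 0:
            match_degree = compute_match()
            # S106: below the threshold -> ON; S107: at or above the threshold -> OFF.
            self.enabled = match_degree < self.threshold
        return self.enabled
```

Passing `compute_match` as a callable means the feature extraction and comparison of S103 and S104 run only on the frames where the flag is actually re-evaluated. Evaluating only at the start of a call corresponds to calling `update` once with frame index 0 and reusing `enabled` afterwards.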

Ｓ１０８では、補正処理部１４は、補正フラグがオン（ＯＮ）であるか否かを判定する。補正フラグがオンである場合（Ｓ１０８：Ｙｅｓ）、処理はＳ１０９に進む。補正フラグがオフである場合（Ｓ１０８：Ｎｏ）、補正処理部１４はカメラ画像を補正せずに出力部１５に出力し、処理はＳ１１０に進む。 In S108, the correction processing unit 14 determines whether the correction flag is on (ON). If the correction flag is on (S108: Yes), the process proceeds to S109. If the correction flag is off (S108: No), the correction processing unit 14 outputs the camera image to the output unit 15 without correction, and the process proceeds to S110.

Ｓ１０９では、補正処理部１４は、ユーザのカメラ画像を補正して補正画像を生成する。ここで、補正処理部１４が、補正画像を生成する３つの方法について説明する。１つ目および２つ目の方法は、カメラ画像および登録画像に基づいて補正画像を生成する方法である。３つ目の方法は、予め用意された顔の特徴情報に基づいてカメラ画像を補正することにより、補正画像を生成する方法である。 In S109, the correction processing unit 14 corrects the camera image of the user to generate a corrected image. Here, three methods by which the correction processing unit 14 generates a corrected image will be described. The first and second methods are methods in which the corrected image is generated based on the camera image and the registered image. The third method is a method in which the corrected image is generated by correcting the camera image based on facial feature information prepared in advance.

１つ目の方法は、ユーザの身だしなみが整った画像と整っていない画像とを学習させたＧＡＮによって補正画像を生成する方法である。ＧＡＮは、例えば、補正フラグがオフの場合のカメラ画像を、身だしなみが整った画像のデータとして学習させることができる。また、ＧＡＮは、補正フラグがオンの場合のカメラ画像を、身だしなみが整っていない画像のデータとして学習させることができる。補正処理部１４は、学習済みのＧＡＮにより、身だしなみが整った補正画像を生成することができる。 The first method is to generate a corrected image by using a GAN that has been trained with images of the user in which they are well-groomed and images of the user in which they are not well-groomed. For example, the GAN can learn camera images when the correction flag is off as data of an image in which the user is well-groomed. The GAN can also learn camera images when the correction flag is on as data of an image in which the user is not well-groomed. The correction processing unit 14 can generate a corrected image in which the user is well-groomed by using the trained GAN.

２つ目の方法は、ユーザの登録画像の一部または顔全体を切り出し、ユーザの撮像画像の対応する部位を、登録画像から切り出した画像に置き換えて、補正画像を生成する方法である。図１を用いて、２つ目の方法を具体的に説明する。図１の例では、補正処理部１４は、登録画像の眉、目、口の画像を切り出す。補正処理部１４は、カメラ画像での眉、目、口を、登録画像から切り出した眉、目、口の画像に置き換えて、補正画像を生成することができる。 The second method is to cut out a part of the user's registered image or the entire face, and replace the corresponding part of the user's captured image with the image cut out from the registered image to generate a corrected image. The second method will be explained in detail using Figure 1. In the example of Figure 1, the correction processing unit 14 cuts out images of the eyebrows, eyes, and mouth of the registered image. The correction processing unit 14 can replace the eyebrows, eyes, and mouth in the camera image with the images of the eyebrows, eyes, and mouth cut out from the registered image to generate a corrected image.

なお、カメラ画像の一部を登録画像から切り出した画像に置き換えてこれらの画像を合成する際、カメラ画像と登録画像との一致度に応じて、補正量（ここでは、合成する割合）を変化させてもよい。例えば、補正処理部１４は、一致度が高くなるにつれて補正量を減らし、一致度が低くなるにつれて補正量を増やせばよい。 When replacing a part of the camera image with an image cut out from the registered image and combining these images, the amount of correction (here, the rate of combining) may be changed according to the degree of match between the camera image and the registered image. For example, the correction processing unit 14 may decrease the amount of correction as the degree of match increases and increase the amount of correction as the degree of match decreases.
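As a rough sketch of varying the combining ratio with the degree of match, the following blends a patch cut from the registered image into the camera image. The linear mapping `1 - match_degree`, the function names, and the grayscale list-of-lists image representation are assumptions; the description only requires that the correction amount decrease as the match increases.

```python
def correction_amount(match_degree):
    """Blend weight for the registered patch: larger when the match is lower."""
    return max(0.0, min(1.0, 1.0 - match_degree))


def blend_region(camera, registered_patch, top, left, match_degree):
    """Return a copy of `camera` with the patch alpha-blended in at (top, left)."""
    alpha = correction_amount(match_degree)
    out = [row[:] for row in camera]
    for dy, patch_row in enumerate(registered_patch):
        for dx, patch_value in enumerate(patch_row):
            y, x = top + dy, left + dx
            out[y][x] = round((1.0 - alpha) * camera[y][x] + alpha * patch_value)
    return out
```

With a match degree of 1.0 the camera pixels are left unchanged, and with a match degree of 0.0 the region is fully replaced by the registered patch.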

また、ユーザは、登録画像の顔全体を切り出して置換するのか、一部を切り出して置換するのかを設定できるようにしてもよい。また、登録画像の一部を切り出す場合、ユーザは、顔のどの部位を切り出して置換するのかを設定できるようにしてもよい。 The user may also be able to set whether to cut out and replace the entire face from the registered image, or to cut out and replace only a portion of it. If a portion of the registered image is cut out, the user may also be able to set which part of the face is to be cut out and replaced.

３つ目の方法は、登録画像は使用せずに、予め登録画像データベース１１等に格納された顔の特徴情報に基づいてカメラ画像を補正し、補正画像を生成する方法である。顔の特徴情報は、例えば、メイクアップを施した場合の眉、口、頬、肌の色または明るさ等の情報である。顔の特徴情報は、例えば、仕事用、プライベート用などビデオ通話のシーンに応じて複数のパターンが用意されてもよい。ユーザは、ビデオ通話のシーンに応じて登録画像を用意する手間を省くことができる。 The third method is to generate a corrected image by correcting the camera image based on facial feature information stored in advance in the registered image database 11 or the like, without using a registered image. The facial feature information is, for example, information such as the color or brightness of the eyebrows, mouth, cheeks, and skin when makeup is applied. Multiple patterns of facial feature information may be prepared according to the video call scene, for example, for work or for private use. This saves the user the trouble of preparing a registered image for each video call scene.

図６を用いて、３つ目の方法を具体的に説明する。図６の例では、補正処理部１４は、カメラ画像での頬のシミを、ノイズを除去するフィルタ処理により除去している。また、補正処理部１４は、顔の特徴情報に基づいて、眉、口、頬、肌の彩度調整をすることにより、補正画像（表示画像）を生成することができる。なお、フィルタ処理または彩度調整をする場合に、カメラ画像と登録画像との一致度に応じて補正量を変化させてもよい。 The third method will be described in detail with reference to FIG. 6. In the example of FIG. 6, the correction processing unit 14 removes blemishes on the cheek in the camera image by filtering to remove noise. Furthermore, the correction processing unit 14 can generate a corrected image (display image) by adjusting the saturation of the eyebrows, mouth, cheeks, and skin based on facial feature information. Note that when performing filtering or saturation adjustment, the amount of correction may be changed depending on the degree of match between the camera image and the registered image.
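The saturation adjustment part of the third method can be sketched per pixel with the standard-library `colorsys` module. This is only an illustration: the function names, the idea of a per-region `target_gain` taken from the facial feature information, and the linear dependence on the match degree are assumptions, and the noise-removal filter is not covered here.

```python
import colorsys


def adjust_saturation(rgb, gain):
    """Scale the saturation of one 8-bit RGB pixel by `gain`, clamped to [0, 1]."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = max(0.0, min(1.0, s * gain))
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in (r2, g2, b2))


def saturation_gain(target_gain, match_degree):
    """Interpolate between no change (1.0) and the target gain as the match drops."""
    amount = max(0.0, min(1.0, 1.0 - match_degree))
    return 1.0 + (target_gain - 1.0) * amount
```

A caller would apply `adjust_saturation` to the pixels of a region such as the cheeks or lips, using `saturation_gain` to weaken the adjustment when the camera image is already close to the reference.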

図３に戻り、Ｓ１１０では、出力部１５は、補正処理部１４から出力された映像を出力する。すなわち、出力部１５は、補正フラグがオンに設定されている場合、補正処理部１４が生成した補正画像を出力する。また、出力部１５は、補正フラグがオフに設定されている場合、補正されていないユーザのカメラ画像を出力する。出力部１５が出力した映像は、他のコンピュータに送信され表示される。また、出力部１５が出力した映像は、表示装置に表示される。 Returning to FIG. 3, in S110, the output unit 15 outputs the image output from the correction processing unit 14. That is, when the correction flag is set to on, the output unit 15 outputs the corrected image generated by the correction processing unit 14. Also, when the correction flag is set to off, the output unit 15 outputs the uncorrected camera image of the user. The image output by the output unit 15 is transmitted to and displayed on another computer. Also, the image output by the output unit 15 is displayed on a display device.

画像処理装置１は、ユーザが通話を終了するまでの間、フレームごとに上記の処理を繰り返す。ユーザが通話を終了すると、Ｓ１０１でカメラ画像は取得されなくなり、図３に示す画像補正処理は終了する。 The image processing device 1 repeats the above process for each frame until the user ends the call. When the user ends the call, the camera image is no longer acquired in S101, and the image correction process shown in FIG. 3 ends.

（作用効果） (Action and Effect)
上記の実施形態において、画像処理装置１は、カメラ画像（撮像画像）と、身だしなみが整っている登録画像を取得し、各画像から特徴量を抽出して一致度を評価する。画像処理装置１は、一致度が所定の閾値以上であれば、身だしなみが整っていると判定し補正処理を無効にする。また、画像処理装置１は、一致度が所定の閾値未満であれば、身だしなみが整っていないと判定し、補正処理を有効にする。これにより、通話者（ユーザ）は、身だしなみの状態を気にしたり、画像の補正を指示するための特別な操作をしたりすることなく、ビデオ通話に臨むことができる。 In the above embodiment, the image processing device 1 acquires a camera image (captured image) and a registered image of a well-groomed person, extracts features from each image, and evaluates the degree of match. If the degree of match is equal to or greater than a predetermined threshold, the image processing device 1 determines that the person is well-groomed and disables the correction process. If the degree of match is less than the predetermined threshold, the image processing device 1 determines that the person is not well-groomed and enables the correction process. This allows the caller (user) to participate in a video call without worrying about the state of their appearance or performing a special operation to instruct image correction.

＜その他＞ <Other>
上記実施形態は、本発明の構成例を例示的に説明するものに過ぎない。本発明は上記の具体的な形態には限定されることはなく、その技術的思想の範囲内で種々の変形が可能である。 The above-described embodiment merely describes an exemplary configuration of the present invention. The present invention is not limited to the above-described specific embodiment, and various modifications are possible within the scope of the technical concept thereof.

例えば、上記の実施形態では、補正判定部１３は、カメラ画像と登録画像との特徴量を照合し、一致度を評価するがこれに限られない。補正判定部１３は、カメラ画像の特徴量と登録画像の特徴量との差分を評価して、差分が所定の閾値以上の場合に補正処理を有効にし、差分が所定の閾値未満の場合に補正処理を無効にしてもよい。 For example, in the above embodiment, the correction determination unit 13 compares the feature amounts of the camera image and the registered image and evaluates the degree of match, but the present invention is not limited to this. The correction determination unit 13 may instead evaluate the difference between the feature amount of the camera image and the feature amount of the registered image, enabling the correction process when the difference is equal to or greater than a predetermined threshold and disabling it when the difference is less than the predetermined threshold.

また、例えば、上記の実施形態では、補正判定部１３は、カメラ画像と身だしなみが整った状態の画像との一致度が所定の閾値以上の場合に、補正処理を無効にするがこれに限られない。登録画像データベース１１に身だしなみが整っていない状態の画像をユーザの基準画像として格納してもよい。この場合、補正判定部１３は、カメラ画像と身だしなみが整っていない状態の画像との一致度が所定の閾値未満の場合に補正処理を無効にし、一致度が所定の閾値以上の場合に補正処理を有効にするようにしてもよい。 Also, for example, in the above embodiment, the correction determination unit 13 disables the correction process when the degree of match between the camera image and the image of the well-groomed state is equal to or greater than a predetermined threshold, but the present invention is not limited to this. An image of the ungroomed state may be stored in the registered image database 11 as the reference image of the user. In this case, the correction determination unit 13 may disable the correction process when the degree of match between the camera image and the image of the ungroomed state is less than a predetermined threshold, and enable the correction process when the degree of match is equal to or greater than the predetermined threshold.
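The embodiment and the two variations above differ only in what the score measures and in the direction of the comparison. A minimal sketch, with hypothetical mode names introduced here for illustration:

```python
def enable_correction(score, threshold, mode="match_groomed"):
    """Return True when the correction process should be enabled.

    mode "match_groomed":   score is a match with a well-groomed reference image.
    mode "diff_groomed":    score is a difference from a well-groomed reference image.
    mode "match_ungroomed": score is a match with an ungroomed reference image.
    """
    if mode == "match_groomed":
        return score < threshold
    if mode in ("diff_groomed", "match_ungroomed"):
        return score >= threshold
    raise ValueError(f"unknown mode: {mode}")
```

In each mode, a score exactly equal to the threshold falls on the "equal to or greater than" side, as in the description.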

＜付記１＞
（１）ユーザの撮像画像および前記ユーザの基準画像の特徴量を抽出する抽出部（１２）と、
前記ユーザの撮像画像の特徴量と前記ユーザの基準画像の特徴量とを照合した結果に基づいて、前記ユーザの撮像画像に対する補正処理を有効にするか無効にするかを判定する判定部（１３）と、
前記補正処理を有効にすると判定した場合に、前記ユーザの撮像画像の補正画像を生成する補正部（１４）と、
前記補正処理を有効にすると判定された場合は、前記補正画像を出力し、前記補正処理を無効にすると判定された場合は、補正されていない前記ユーザの撮像画像を出力する出力部（１５）と、
を備えることを特徴とする画像処理装置（１）。
<Appendix 1>
(1) An image processing device (1) comprising:
an extraction unit (12) that extracts feature amounts of a captured image of a user and a reference image of the user;
a determination unit (13) that determines whether to enable or disable a correction process for the captured image of the user based on a result of comparing the feature amount of the captured image of the user with the feature amount of the reference image of the user;
a correction unit (14) that generates a corrected image of the captured image of the user when it is determined that the correction process is to be enabled; and
an output unit (15) that outputs the corrected image when it is determined that the correction process is to be enabled, and outputs the uncorrected captured image of the user when it is determined that the correction process is to be disabled.

（２）ユーザの撮像画像および前記ユーザの基準画像の特徴量を抽出する抽出ステップと（Ｓ１０３）、
前記ユーザの撮像画像の特徴量と前記ユーザの基準画像の特徴量とを照合した結果に基づいて、前記ユーザの撮像画像に対する補正処理を有効にするか無効にするかを判定する判定ステップと（Ｓ１０４～Ｓ１０７）、
前記補正処理を有効にすると判定した場合に、前記ユーザの撮像画像の補正画像を生成する補正ステップと（Ｓ１０８、Ｓ１０９）、
前記補正処理を有効にすると判定された場合は、前記補正画像を出力し、前記補正処理を無効にすると判定された場合は、補正されていない前記ユーザの撮像画像を出力する出力ステップと（Ｓ１１０）、
を含むことを特徴とする画像処理方法。
(2) An image processing method comprising:
an extraction step (S103) of extracting feature amounts of a captured image of a user and a reference image of the user;
a determination step (S104 to S107) of determining whether to enable or disable a correction process for the captured image of the user based on a result of comparing the feature amount of the captured image of the user with the feature amount of the reference image of the user;
a correction step (S108, S109) of generating a corrected image of the captured image of the user when it is determined that the correction process is to be enabled; and
an output step (S110) of outputting the corrected image when it is determined that the correction process is to be enabled, and outputting the uncorrected captured image of the user when it is determined that the correction process is to be disabled.

１：画像処理装置、１０：撮像部、１１：登録画像データベース、１２：特徴抽出部、１３：補正判定部、１４：補正処理部、１５：出力部 1: Image processing device, 10: Imaging unit, 11: Registered image database, 12: Feature extraction unit, 13: Correction determination unit, 14: Correction processing unit, 15: Output unit

Claims

1. An image processing device comprising:
an extraction unit that extracts feature amounts of a captured image of a user's face and a reference image of the user's face;
a determination unit that determines whether to enable or disable a correction process for the captured image of the user's face based on a result of comparing the feature amount of the captured image of the user's face with the feature amount of the reference image of the user's face;
a correction unit that generates a corrected image of the captured image of the user's face when it is determined that the correction process is to be enabled; and
an output unit that outputs the corrected image when it is determined that the correction process is to be enabled, and outputs the uncorrected captured image of the user's face when it is determined that the correction process is to be disabled,
wherein the reference image of the user's face is an image of the user in an ungroomed state.

2. The image processing device according to claim 1, wherein the feature amount is a feature amount of at least one of the eyebrows and the cheeks of the user.

3. The image processing device according to claim 1, wherein the feature amount includes at least one of a Haar-like feature amount, a color histogram, and a color moment.

4. The image processing device according to claim 1, wherein the feature amount is calculated by an algorithm using a learning model that has been trained on images of the user in a well-groomed state and images of the user in an ungroomed state.

5. The image processing device according to any one of claims 1 to 4, wherein the determination unit calculates a degree of match between the feature amount of the captured image of the user's face and the feature amount of the reference image of the user's face, and determines to disable the correction process if the degree of match is less than a predetermined threshold and to enable the correction process if the degree of match is equal to or greater than the predetermined threshold.

6. The image processing device according to claim 5, wherein the correction unit changes an amount of correction applied to the captured image of the user's face depending on the degree of match.

7. The image processing device according to claim 1, wherein the correction unit generates the corrected image based on the captured image of the user's face and the reference image of the user's face.

8. The image processing device according to claim 7, wherein the correction unit generates the corrected image by a Generative Adversarial Network (GAN) that has been trained on the captured image of the user's face and the reference image of the user's face.

9. The image processing device according to claim 7, wherein the correction unit generates the corrected image by cutting out the eyebrows, eyes, mouth, or entire face from the reference image of the user's face and replacing the corresponding part of the captured image of the user's face with the cut-out image.

10. The image processing device according to claim 1, wherein the correction unit generates the corrected image by performing, on the captured image of the user's face, a filter process for removing noise or a saturation adjustment based on facial feature information.

11. The image processing device according to claim 1, wherein the determination unit determines whether to enable or disable the correction process once every predetermined number of frames of the captured image of the user's face.

12. The image processing device according to claim 1, further comprising an imaging unit that captures an image of the face of the user.

13. An image processing method comprising:
an extraction step of extracting feature amounts of a captured image of a user's face and a reference image of the user's face;
a determination step of determining whether to enable or disable a correction process for the captured image of the user's face based on a result of comparing the feature amount of the captured image of the user's face with the feature amount of the reference image of the user's face;
a correction step of generating a corrected image of the captured image of the user's face when it is determined that the correction process is to be enabled; and
an output step of outputting the corrected image when it is determined that the correction process is to be enabled, and outputting the uncorrected captured image of the user's face when it is determined that the correction process is to be disabled,
wherein the reference image of the user's face is an image of the user in an ungroomed state.

14. A program for causing a computer to execute each step of the method according to claim 13.