WO2025191769A1 - Image retrieval device, image retrieval method, image retrieval program, and recording medium

Info

Publication number: WO2025191769A1
Application number: PCT/JP2024/009927
Authority: WO
Inventors: 弘貴櫻庭
Original assignee: Rights Tech Inc
Current assignee: Rights Tech Inc
Priority date: 2024-03-14
Filing date: 2024-03-14
Publication date: 2025-09-18
Anticipated expiration: 2026-09-14

Abstract

This image retrieval device comprises: an input/output and image comparison unit for storing face image data representing a face image of a user registered by the user and a keyword registered by the user as the cause of leakage of the face image; and an image extraction unit for extracting a plurality of pieces of still-image data from moving-image data acquired at a website that is a sexually explicit site on the Internet and contains the keyword registered by the user, and acquiring, from the extracted still-image data, image data in which face information is included. The input/output and image comparison unit is configured to compare the image data that is acquired by the image extraction unit and contains face information with the face image data that is registered by the user and stored in the input/output and image comparison unit, and to extract the URL of face image data having a high degree of similarity. The image extraction unit is configured to extract still-image data from the moving-image data using a plurality of mutually different processing procedures, and to integrate the plurality of pieces of still-image data extracted using those procedures.

Description

Image search device, image search method, image search program, and recording medium

The present invention relates to an image search device, an image search method, an image search program, and a recording medium on which the image search program is recorded, for searching for images of a user that have been leaked onto the Internet.

In recent years, as cameras have become more capable and smartphones more widespread, cases of voyeuristic filming and revenge porn have tended to increase. Furthermore, because photos and videos can easily be posted on adult sites and social networking services (SNS), a user's own photos and videos are sometimes uploaded without the user's permission and leaked to sexually explicit websites on the Internet.

Patent Document 1 discloses a technique for searching for a client's own personal images that have been published on the Internet against the client's will. This technique extracts feature data of the person regions from all public images on the Internet, extracts feature data from the client's personal image, and matches the two sets of feature data against each other to determine whether the client's personal image is the same as any of the public images, thereby finding the client's personal image.

Patent Document 1: Japanese Patent No. 5150572

The image search technology described in Patent Document 1 searches the entire Internet to check whether an image of the client exists. However, the number of images posted on the Internet is enormous. Extracting the person regions from all of these public images, extracting feature data from the extracted person regions, and then matching them requires not only a very large amount of computing resources but also a very long time, so it has been extremely difficult to implement image search with a simple configuration.

Furthermore, because the image search technology described in Patent Document 1 is a search technology intended to prevent invasion of the client's privacy, it cannot quickly detect the leakage of the client's facial image, particularly at an early stage, as is needed to prevent harm from voyeuristic filming or revenge porn.

Accordingly, an object of the present invention is to provide an image search device, an image search method, an image search program, and a recording medium that can reliably detect, with a simple configuration, that a user's facial image has been leaked onto the Internet.

Another object of the present invention is to provide an image search device, an image search method, an image search program, and a recording medium that can quickly detect, at an early stage, that a user's facial image has been leaked onto the Internet.

According to the present invention, there is provided an image search device comprising: an input/output and image comparison unit that stores facial image data representing a user's own facial image registered by the user, and keywords registered by the user as possible causes of the leak of the facial image; and an image extraction unit that extracts a plurality of pieces of still image data from video data acquired from websites that are sexually explicit sites on the Internet and contain the keywords registered by the user, and acquires image data containing facial information from the extracted still image data. The input/output and image comparison unit is configured to compare the image data containing facial information acquired by the image extraction unit with the user's registered facial image data stored in the input/output and image comparison unit, and to extract the URL (web page address) of facial image data having a high degree of similarity. The image extraction unit is configured to extract still image data from the video data using a plurality of mutually different processing procedures, and to integrate the still image data extracted by those procedures.

In the present invention, video data is acquired from websites containing the keywords registered as possible causes of the facial image leak. Rather than searching the entire Internet, only the websites relevant to preventing harm from voyeuristic filming and revenge porn are searched. As a result, a user's facial image leaked onto the Internet can be detected quickly at an early stage, and the configuration of the image search device is simplified. Furthermore, when extracting still image data from the acquired video data and acquiring image data containing facial information from it, the present invention extracts still image data using a plurality of mutually different processing procedures and integrates the results. Because different procedures yield different final images, facial image detection is far more reliable than when faces are extracted by a single procedure alone. A leak of the user's facial image onto the Internet can therefore be detected reliably with a simple configuration.

Preferably, the plurality of mutually different processing procedures performed by the image extraction unit include a process of playing back the video data at high speed and extracting still image data that a machine learning model has determined to contain a face, and a process of playing back the video data at high speed and extracting still image data at an arbitrary fixed frame interval.
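The two procedures and their integration can be sketched as follows. This is only an illustrative sketch: the function names, the per-frame confidence list, and the threshold and interval values are assumptions, and the face detector itself is replaced by precomputed scores.

```python
from typing import Iterable, List, Sequence


def frames_with_faces(confidences: Sequence[float], threshold: float) -> List[int]:
    """Procedure A (assumed form): indices of frames whose face confidence exceeds the threshold."""
    return [i for i, c in enumerate(confidences) if c > threshold]


def frames_at_interval(frame_count: int, interval: int) -> List[int]:
    """Procedure B (assumed form): indices of frames sampled at a fixed frame interval."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(0, frame_count, interval))


def integrate(*procedures: Iterable[int]) -> List[int]:
    """Merge the outputs of several procedures, dropping duplicate frames."""
    merged = set()
    for indices in procedures:
        merged.update(indices)
    return sorted(merged)


# Hypothetical per-frame confidences standing in for a face detector's output.
confidences = [0.10, 0.92, 0.30, 0.05, 0.88, 0.20, 0.15, 0.95]
by_model = frames_with_faces(confidences, threshold=0.8)        # [1, 4, 7]
by_interval = frames_at_interval(len(confidences), interval=3)  # [0, 3, 6]
selected = integrate(by_model, by_interval)                     # [0, 1, 3, 4, 6, 7]
```

Taking the union keeps frames that only one of the procedures would have found, which is the stated reason for combining them.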

It is also preferable that the plurality of mutually different processing procedures performed by the image extraction unit include: a process of playing back the video data at high speed and extracting a still image whenever the machine learning model determines that a face is shown; a process of playing back the video data at high speed and extracting a still image of the face at the timing at which the machine learning model determines that the face is shown most clearly; a process of playing back the video data at high speed, extracting the facial features at the timing at which the machine learning model determines that the face is shown most clearly, and extracting a still image of the face; and a process of playing back the video data at high speed, extracting still images from the entire video at a fixed frame interval, evaluating each extracted still image with the machine learning model, and extracting the still image with the highest probability of showing a face.

It is also preferable that the image extraction unit is configured to compare the confidence value of the image data containing facial information with a threshold value and to extract the image data in which the face is shown most clearly.

In this case, it is more preferable that the image extraction unit is configured to divide the image data containing facial information, starting from the beginning, into groups of a predetermined number, select the best image data within each group, and compare the confidence value of the selected image data with the threshold value.
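A minimal sketch of this grouping step is given below. It assumes that "best" means the highest confidence value within the group, which the text does not state explicitly; the function name, group size, and threshold are likewise hypothetical.

```python
from typing import List, Sequence, Tuple


def best_per_group(confidences: Sequence[float], group_size: int,
                   threshold: float) -> List[Tuple[int, float]]:
    """Split scores into consecutive groups, keep each group's best score if it passes the threshold.

    Returns (original_index, confidence) pairs. "Best" is assumed to be the
    highest confidence in the group.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    kept = []
    for start in range(0, len(confidences), group_size):
        group = confidences[start:start + group_size]
        # Position of the highest-confidence item inside this group.
        offset = max(range(len(group)), key=group.__getitem__)
        if group[offset] > threshold:
            kept.append((start + offset, group[offset]))
    return kept


scores = [0.2, 0.9, 0.4, 0.1, 0.3, 0.5, 0.7, 0.85]
picked = best_per_group(scores, group_size=3, threshold=0.6)
# picked == [(1, 0.9), (7, 0.85)]; the middle group is dropped because its best score is 0.5
```

Selecting one candidate per group before thresholding bounds the number of images passed on to the comparison stage to at most one per group.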

It is also preferable that the device further comprises a URL crawler unit that collects the URLs of websites that are sexually explicit sites on the Internet and contain the keywords registered by the user, and that the image extraction unit is configured to acquire video data from the websites corresponding to the URLs collected by the URL crawler unit.

It is also preferable that the input/output and image comparison unit is configured to store the facial image data and keywords transmitted from a user terminal, and to transmit the extracted URL to the user terminal.

According to the present invention, there is further provided an image search method that stores facial image data representing a user's own facial image registered by the user and keywords registered by the user as possible causes of the leak of the facial image, extracts a plurality of pieces of still image data from video data acquired from websites that are sexually explicit sites on the Internet and contain the keywords registered by the user, and acquires image data containing facial information from the extracted still image data. The method compares the acquired image data containing facial information with the facial image data registered by the user and extracts the URL of facial image data having a high degree of similarity. The method extracts still image data from the video data using a plurality of mutually different processing procedures and integrates the still image data extracted by those procedures.

Preferably, the plurality of mutually different processing procedures include a process of playing back the video data at high speed and extracting still image data that a machine learning model has determined to contain a face, and a process of playing back the video data at high speed and extracting still image data at an arbitrary fixed frame interval.

It is also preferable that the plurality of mutually different processing procedures include: a process of playing back the video data at high speed and extracting a still image whenever the machine learning model determines that a face is shown; a process of playing back the video data at high speed and extracting a still image of the face at the timing at which the machine learning model determines that the face is shown most clearly; a process of playing back the video data at high speed, extracting the facial features at the timing at which the machine learning model determines that the face is shown most clearly, and extracting a still image of the face; and a process of playing back the video data at high speed, extracting still images from the entire video at a fixed frame interval, evaluating each extracted still image with the machine learning model, and extracting the still image with the highest probability of showing a face.

It is also preferable to compare the confidence value of the image data containing facial information with a threshold value and to extract the image data in which the face is shown most clearly.

In this case, it is more preferable to divide the image data containing facial information, starting from the beginning, into groups of a predetermined number, select the most accurate image data within each group, and compare the confidence value of the selected image data with the threshold value.

It is also preferable to collect the URLs of websites that are sexually explicit sites on the Internet and contain the keywords registered by the user, and to acquire video data from the sites corresponding to the collected URLs.

It is also preferable to store the facial image data and keywords transmitted from a user terminal, and to transmit the extracted URL to the user terminal.

According to the present invention, there is still further provided a program for causing a computer to function as: input/output and image comparison means for storing facial image data representing a user's own facial image registered by the user and keywords registered by the user as possible causes of the leak of the facial image; and image extraction means for extracting a plurality of pieces of still image data from video data acquired from websites that are sexually explicit sites on the Internet and contain the keywords registered by the user, and acquiring image data containing facial information from the extracted still image data. The input/output and image comparison means compares the acquired image data containing facial information with the facial image data registered by the user and extracts the URL of facial image data having a high degree of similarity. The image extraction means extracts still image data from the video data using a plurality of mutually different processing procedures and integrates the still image data extracted by those procedures.

According to the present invention, there is further provided a computer-readable recording medium on which is recorded a program for causing a computer to function as: input/output and image comparison means for storing facial image data representing a user's own facial image registered by the user and keywords registered by the user as possible causes of the leak of the facial image; and image extraction means for extracting a plurality of pieces of still image data from video data acquired from websites that are sexually explicit sites on the Internet and contain the keywords registered by the user, and acquiring image data containing facial information from the extracted still image data. The input/output and image comparison means compares the acquired image data containing facial information with the stored facial image data registered by the user and extracts the URL of facial image data having a high degree of similarity. The image extraction means extracts still image data from the video data using a plurality of mutually different processing procedures and integrates the still image data extracted by those procedures.

In the present invention, only the websites relevant to preventing harm from voyeuristic filming and revenge porn are searched, so a user's facial image leaked onto the Internet can be detected quickly at an early stage and the configuration of the image search device is simplified. Furthermore, because different processing procedures yield different final images, facial image detection becomes far more reliable, so a leak of the user's facial image onto the Internet can be detected reliably with a simple configuration.

FIG. 1 is a block diagram schematically illustrating the overall configuration of an image search device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram illustrating the functions of the image search device of FIG. 1.
FIG. 3 is a flowchart showing the processing operation of the input/output and image comparison unit in the image search device of FIG. 1.
FIG. 4 is a flowchart showing the processing operation of the input/output and image comparison unit in the image search device of FIG. 1.
FIG. 5 is a flowchart showing the processing operation of the URL crawler unit in the image search device of FIG. 1.
FIG. 6 is a flowchart showing the processing operation of the still image crawler unit in the image search device of FIG. 1.
FIG. 7 is a flowchart showing the processing operation of the video crawler unit in the image search device of FIG. 1.
FIG. 8 is a flowchart showing a modification of part of the processing operation of the video crawler unit in the image search device of FIG. 1.
FIG. 9 is a diagram showing the registration screen for the user's own facial image on the user terminal.
FIG. 10 is a diagram showing the keyword registration screen on the user terminal.
FIG. 11 is a diagram showing the search result display screen on the user terminal.

FIG. 1 schematically shows the overall configuration of an image search device according to one embodiment of the present invention, and FIG. 2 illustrates the functions of the image search device of this embodiment.

In FIG. 1, reference numeral 10 denotes a user terminal, such as a smartphone, tablet terminal, or computer terminal operated by the user, on which the image search app of this embodiment is installed; 11 denotes an image search server in the cloud that can communicate with the user terminal 10; and 12 denotes the Internet, which the image search server 11 can access.

As shown in FIG. 1, the image search server 11 includes at least an input/output and image comparison unit 13, a cloud server 14, a URL crawler unit 15, a still image crawler unit 16, and a video crawler unit 17.

As shown in FIG. 2, the input/output and image comparison unit 13 includes a database (AWS/RDS) 13a and storage (AWS/S3) 13b provided by a cloud computing service (AWS). The input/output and image comparison unit 13 can be communicatively connected to the user terminal 10 and is configured to store the user's facial image data transmitted from the user terminal 10 together with the keywords that the user registered in association with that facial image data. The input/output and image comparison unit 13 is further configured to store, in association with the user, the URLs of videos containing facial images highly similar to the user's facial image, as extracted by a cloud server (Azure) 14 provided by a cloud computing service.

As described above, the user's facial image data transmitted from the user terminal 10 is stored in the storage (AWS/S3) 13b, and the keywords registered by the user are stored in the database (AWS/RDS) 13a. The cloud server (Azure) 14 is configured to learn the images collected by the still image crawler unit 16 or the video crawler unit 17 and assign each a Face ID, and to compare the collected images with the images registered by the user to determine their degree of similarity.

The URL crawler unit 15 uses the Selenium library, an automated browsing program, as one of the web crawlers running on the AWS cloud server (AWS/EC2) 15a. The URL crawler unit 15 is configured to collect the URLs of videos containing the registered keywords from a plurality of predetermined websites on the Internet 12 that handle sexually explicit videos and where facial images may be leaked. Examples of the predetermined websites include Pornhub, FC2, Tokyo Motion, Twitter Video Tools, XVIDEOS, and Twitter. The list of collected URLs is stored in the URL list unit 15b within the URL crawler unit 15.

The URL list unit 15b consists of an AWS database (AWS/RDS) and a storage unit for local files (a configuration file within the local program files), and the collected URLs are stored in both the RDS and the local file storage unit. The URLs are written to the RDS because various systems, such as the video crawler and the face similarity function, need to access the URLs and the information linked to them, such as video data.
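The dual storage of collected URLs can be illustrated with the sketch below. It is an assumption-laden stand-in: an in-memory SQLite database takes the place of the AWS/RDS database, a JSON file takes the place of the local configuration file, and the table name `url_list`, the function `store_urls`, and the example URLs are all hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from typing import Iterable, List


def store_urls(conn: sqlite3.Connection, local_file: Path, urls: Iterable[str]) -> List[str]:
    """Write collected URLs to a shared database and mirror the full list to a local file."""
    conn.execute("CREATE TABLE IF NOT EXISTS url_list (url TEXT PRIMARY KEY)")
    with conn:  # commits on success, rolls back on error
        # The primary key plus OR IGNORE drops URLs that were already collected.
        conn.executemany("INSERT OR IGNORE INTO url_list (url) VALUES (?)",
                         [(u,) for u in urls])
    stored = [row[0] for row in conn.execute("SELECT url FROM url_list ORDER BY url")]
    local_file.write_text(json.dumps(stored, indent=2), encoding="utf-8")
    return stored


conn = sqlite3.connect(":memory:")  # stand-in for the shared database
local_file = Path(tempfile.mkdtemp()) / "url_list.json"  # stand-in for the local file
stored = store_urls(conn, local_file, [
    "https://example.com/v/2",
    "https://example.com/v/1",
    "https://example.com/v/2",  # duplicate, stored only once
])
```

Keeping the database as the copy other components read from matches the stated reason for writing to the RDS; the local file is only a mirror in this sketch.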

In this embodiment, the still image crawler unit 16 is mainly composed of AWS storage (AWS/S3) 16a, an image search unit 16b, and a machine learning unit 16c.

The image search unit 16b uses the search engines listed in a configuration file within the local program files to run image searches of websites with the specified keywords stored in the input/output storage unit, and stores the images displayed in the results in the storage (AWS/S3) 16a. The search engines used are mainly Google, Yandex, Yahoo, Baidu, and the like.

The machine learning unit 16c uses a deep learning algorithm and is built on the FaceBoxes model, a library specialized for human face detection. This machine learning model learns human facial features from a vast number of face images, which allows it to detect facial information contained in previously unseen input image data. Each piece of detected facial information is accompanied by a numerical value called confidence, which expresses how certain it is that the detected region contains a face. In other words, the higher the value, the more likely the detected region is a face. The FaceBoxes model is a publicly available machine learning model library that was trained on Western faces. For this reason, as described later, a model fine-tuned on Asian face images is used, which raises the face detection accuracy compared with the original model.

The machine learning unit 16c is further configured to store all image data whose confidence value exceeds a predetermined threshold in the storage (AWS/S3) 16a.

In this embodiment, the video crawler unit 17 is mainly composed of a video crawler running on the cloud server (AWS/EC2) 15a, AWS storage (AWS/S3) 17a, an image extraction unit 17b, and the machine learning unit 16c described above.

The video crawler uses the search engine (Google Chrome) specified in a configuration file within the local program files to play back the many videos at the URLs stored in the URL list unit 15b, and acquires a very large number of still images from each video. From the resulting set of images, it then uses the FaceBoxes model, a library specialized for human face detection, to keep only the images containing facial information, and selects from those the single image with the highest confidence value (the one in which the face is shown most clearly).
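The selection of one representative frame per video might look like the sketch below. The `Detection` record, the `clearest_face` function, and the `min_confidence` cut-off are hypothetical, and the scores stand in for whatever the face detector actually returns.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Detection:
    frame_index: int
    confidence: float  # assumed to be 0.0 when no face was detected in the frame


def clearest_face(detections: Iterable[Detection],
                  min_confidence: float) -> Optional[Detection]:
    """Keep frames that contain a face, then return the one with the highest confidence."""
    with_faces = [d for d in detections if d.confidence > min_confidence]
    if not with_faces:
        return None  # no frame of this video contains a usable face
    return max(with_faces, key=lambda d: d.confidence)


frames = [Detection(0, 0.0), Detection(30, 0.62), Detection(60, 0.97), Detection(90, 0.41)]
best = clearest_face(frames, min_confidence=0.5)
# best is the detection for frame 60
```

Returning `None` for a video without any qualifying frame lets the caller skip that video instead of passing a low-quality image to the comparison stage.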

Still images can be obtained from the many videos at the URLs stored in the URL list unit 15b either by downloading the videos and playing them back, or by crawling the website and playing them back there. Either method may be used. The image extraction unit 17b plays back the many acquired videos and acquires a large number of still images.

The machine learning unit 16c uses a deep learning algorithm and is built on the FaceBoxes model, a library specialized for human face detection. This machine learning model learns human facial features from a vast number of face images, which allows it to detect facial information contained in previously unseen input image data. Each piece of detected facial information is accompanied by a numerical value called confidence, which expresses how certain it is that the detected region contains a face. In other words, the higher the value, the more likely the detected region is a face. The FaceBoxes model is a publicly available machine learning model library that was trained on Western faces, so its detection accuracy for Asian faces is not very high. In this embodiment, therefore, independently collected Asian face images are fed back into the FaceBoxes model as training data, and fine-tuning is performed to adjust the model weights. This raises the face detection accuracy compared with the original model.

The machine learning unit 16c is further configured to store all image data whose confidence value exceeds a predetermined threshold in the storage (AWS/S3) 17a.

FIGS. 3 and 4 show the processing operations of the input/output and image comparison unit 13, which are described below with reference to these figures.

First, the user registers his or her own facial images via the user terminal 10. That is, as shown in FIG. 9, the user sends facial image data to the image search server 11, preferably multiple images that include not only a neutral expression but also a smile and a profile. Furthermore, as shown in FIG. 10, the user sends keywords indicating "by whom," "when," "where," "doing what," and so on to the image search server 11 via the user terminal 10. Examples include: for the person who filmed (if known), keywords such as "ex-boyfriend," "classmate," "listener," and "personal filming"; for the user's status at the time of filming (if known), keywords such as "JD," "female college student," "office worker," "delivery health," and "sugar dating"; for the filming location (if known), keywords such as "hotel," "Shibuya," "Shinjuku," "shop name," "live chat," "amateur video," and "voyeur"; keywords such as "matching app," "SNS," "Twitter," and "cosplay"; and keywords such as the user's stage name or nickname.

As shown in FIG. 3, the input/output and image comparison unit 13 receives the multiple pieces of face image data sent from the user terminal 10 and the keywords sent together with them (step S1).

Next, the input/output and image comparison unit 13 stores the received face image data of the user in the storage (AWS/S3) 13b and stores the keywords registered by the user, in association with that face image data, in the database (AWS/RDS) 13a (step S2).

Furthermore, similar keywords, that is, keywords that resemble those registered by the user and that have been judged, automatically or manually, to be more suitable for the crawl processing, are stored in the database (AWS/RDS) 13a (step S3).

Meanwhile, as shown in FIG. 4, the input/output and image comparison unit 13 inputs the user's face image stored in the storage (AWS/S3) 13b into the Azure similarity function (Find Similar) on the cloud server 14, which outputs the IDs of face images having a high degree of similarity. Based on these IDs, the database (AWS/RDS) 13a is referenced, and the URL of the video in which each such face image appears and the probability that the person is the user are extracted (step S11). That is, the URL and the probability are output from Azure, and the posting date and time is read from the database (AWS/RDS) and output.

Next, the input/output and image comparison unit 13 sends the information obtained in this way to the user terminal 10: the URL of the video containing the highly similar face image, the probability that the person is the user, the posting date and time, and the discovery date and time (step S12). The user terminal 10 then displays search results such as those shown in FIG. 11 (discovery date, whether the entry has been checked, posting date, URL, and probability of being the user).
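The result items listed for step S12 and FIG. 11 can be pictured as a simple record. The following is only a minimal sketch: the class name `SearchResult`, the field names, the helper `to_display_rows`, and the newest-first ordering are all assumptions made for illustration and are not specified in this description.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class SearchResult:
    """One search result sent to the user terminal (hypothetical field names)."""
    url: str                 # URL of the video containing the similar face image
    probability: float       # probability that the person is the user (0.0 to 1.0)
    posted_at: datetime      # posting date and time
    discovered_at: datetime  # discovery date and time
    checked: bool = False    # whether the user has already checked this entry


def to_display_rows(results: List[SearchResult]) -> List[Tuple[str, bool, str, str, str]]:
    """Format results in the column order of FIG. 11.

    Sorting by discovery date, newest first, is an assumption.
    """
    ordered = sorted(results, key=lambda r: r.discovered_at, reverse=True)
    return [
        (
            r.discovered_at.date().isoformat(),
            r.checked,
            r.posted_at.date().isoformat(),
            r.url,
            f"{r.probability:.0%}",
        )
        for r in ordered
    ]
```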

FIG. 5 shows the processing operation of the URL crawler unit 15, which is described below with reference to that figure.

The URL crawler unit 15 first determines in advance the multiple websites to be accessed (step S21).

Next, it performs URL crawling on these websites based on the keywords registered by the user and the similar keywords stored in the database (AWS/RDS) 13a, and acquires the URLs of videos containing these keywords (step S22).

The acquired URLs are then stored as a URL list in the URL list unit 15b (step S23).

FIG. 6 shows the processing operation of the still image crawler unit 16, which is described below with reference to that figure.

In the image search unit 16b, the still image crawler unit 16 uses a predetermined search engine to run an image search for still images with the designated keywords stored in the database (AWS/RDS) 13a (step S31). Because the websites from which images are to be collected are extremely varied, the search is run with a designated keyword; for example, for images obtained by voyeur filming, the keyword "voyeur" is used.

Next, the many still images found are temporarily stored in a local cache memory (step S32).

Next, a deep-learning-based machine learning algorithm for face detection (for example, the FaceBoxes model) is applied to the still images stored in the cache memory, extracting the still images that contain a face together with a confidence value representing the probability that a face is present (step S33).

Next, the still images (image data) whose extracted confidence value exceeds the threshold are stored in the storage (AWS/S3) 16a, and list registration processing is performed (step S34).
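Steps S33 and S34 amount to scoring each cached image and keeping only those above the threshold. A minimal sketch follows, under stated assumptions: `detect_face` is a hypothetical stand-in for the face detector (it is not the FaceBoxes API), and the threshold value 0.9 is purely illustrative.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Image = TypeVar("Image")


def filter_by_confidence(
    images: Iterable[Image],
    detect_face: Callable[[Image], float],
    threshold: float = 0.9,
) -> List[Tuple[Image, float]]:
    """Return (image, confidence) pairs whose confidence exceeds the threshold.

    detect_face is a hypothetical callable returning the probability,
    between 0.0 and 1.0, that the image contains a face.
    """
    kept = []
    for image in images:
        confidence = detect_face(image)
        if confidence > threshold:  # strictly "exceeds", as in step S34
            kept.append((image, confidence))
    return kept
```

Only the pairs returned here would go on to storage and list registration; everything else stays in the temporary cache and is discarded.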

Meanwhile, asynchronously with steps S31 to S34, when the user registers his or her own face image data and keywords, Azure's Find Similar is called, and the images are supplied to the Azure Find Similar model for learning. In addition, the source URL of each image, the posting date and time, the keywords, the ID of the image learned by Azure, and the like are recorded in the database (AWS/RDS) 13a, and the image data is uploaded to and stored in the storage (AWS/S3) 13b (step S35).

FIG. 7 shows the processing operation of the video crawler unit 17, which is described below with reference to that figure.

The video crawler unit 17 first acquires a large number of videos with the designated keywords at the URLs stored in the URL list unit 15b, using a predetermined image search engine (Google Chrome) (step S41).

Next, the acquired videos are played back at high speed, and still images in which a machine learning model has determined that a face appears are acquired (step S42).

Meanwhile, the acquired videos are played back at high speed, and still images are acquired at an arbitrary fixed frame interval (step S43).

Next, the still images extracted by the procedure of step S42 and those extracted by the different procedure of step S43 are integrated and temporarily stored in a local cache memory (step S44). Thus, in this embodiment, still images are extracted from the videos by two mutually different processing procedures, and the results of the two procedures are integrated. Because different procedures yield different final images, face images are detected far more reliably than when only one procedure is used.
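The integration in step S44 can be illustrated with frame indices. This is a minimal sketch under stated assumptions: frames are identified by integer index, `has_face` is a hypothetical stand-in for the machine learning model, and merging by set union (so that a frame found by both procedures is kept once) is one possible reading of "integrate", not something this description specifies.

```python
from typing import Callable, Iterable, List


def frames_with_faces(total_frames: int, has_face: Callable[[int], bool]) -> List[int]:
    """Procedure of step S42: keep frames the (hypothetical) model flags as showing a face."""
    return [i for i in range(total_frames) if has_face(i)]


def frames_at_fixed_interval(total_frames: int, interval: int) -> List[int]:
    """Procedure of step S43: keep every interval-th frame, starting at frame 0."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(0, total_frames, interval))


def integrate(*procedure_outputs: Iterable[int]) -> List[int]:
    """Step S44: merge the outputs of all procedures, dropping duplicates."""
    merged = set()
    for frames in procedure_outputs:
        merged.update(frames)
    return sorted(merged)
```

Because `integrate` accepts any number of inputs, the same sketch would also cover a variant that combines more than two procedures.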

Because the image data is too voluminous to record directly in storage, steps S42 and S43 select, from a group of images that do not necessarily show a face, only those that do, and ultimately keep the one with the highest confidence value. Step S44 works in much the same way: from the group of images that show a face, the image in which the face appears best (the one with the highest confidence value) is ultimately kept. This image is eventually uploaded to storage; more precisely, once the image with the highest confidence value has been obtained for every video listed by the URL crawler, the images are learned by Azure through the list registration processing and stored in the storage (AWS/S3) 17a.

When obtaining the image with the highest confidence value, it is desirable not to evaluate all images but to divide the images obtained from a video, starting from the beginning, into groups of a predetermined number (for example, 100 images each) and to select the most accurate image within each group. Then, from the face images obtained from the respective groups, the one with the highest confidence is finally selected. The reason for dividing the images into groups and evaluating each group, instead of selecting the most accurate image from all images, is that processing of a group can be cut short once the threshold is exceeded, which makes the processing more efficient overall. For example, selecting the most accurate image from 1,000 images in the straightforward way requires evaluating all 1,000 images. If they are divided into 10 groups of 100 and the threshold is exceeded at, say, the 10th image of each group, only 10 × 10 = 100 images need to be evaluated.
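The grouped selection with early cut-off can be sketched as follows. This is one possible reading of the passage, not a definitive implementation: the function names, the `evaluate` callable (a hypothetical stand-in for the face detector's confidence output), and the default values are assumptions.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

Image = TypeVar("Image")
Scored = Tuple[Image, float]


def best_in_group(
    group: Sequence[Image],
    evaluate: Callable[[Image], float],
    threshold: float,
) -> Optional[Scored]:
    """Scan one group in order and stop as soon as a score exceeds the threshold."""
    best: Optional[Scored] = None
    for image in group:
        score = evaluate(image)
        if best is None or score > best[1]:
            best = (image, score)
        if score > threshold:
            break  # early cut-off: the rest of this group is not evaluated
    return best


def select_best_image(
    images: Sequence[Image],
    evaluate: Callable[[Image], float],
    group_size: int = 100,
    threshold: float = 0.9,
) -> Optional[Scored]:
    """Pick a winner per group, then return the winner with the highest score."""
    winners: List[Scored] = []
    for start in range(0, len(images), group_size):
        winner = best_in_group(images[start:start + group_size], evaluate, threshold)
        if winner is not None:
            winners.append(winner)
    return max(winners, key=lambda w: w[1]) if winners else None
```

With 1,000 images, a group size of 100, and the threshold first exceeded at the 10th image of every group, `evaluate` is called 10 times per group, which matches the 10 × 10 = 100 evaluations in the example above. The trade-off is that an image later in a group with an even higher score is never seen once the cut-off fires.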

After step S44, a deep-learning-based machine learning algorithm for face detection (for example, the FaceBoxes model from a library) is applied to the still images stored in the cache memory, extracting the still images that contain a face together with a confidence value representing the probability that a face is present (step S45).

Next, the still images (image data) whose confidence value exceeds the threshold are stored in the storage (AWS/S3) 17a (step S46).

Meanwhile, asynchronously with steps S41 to S46, when the user registers his or her own face image data and keywords, Azure's Find Similar is called, and the images are supplied to the Azure Find Similar model for learning. In addition, the source URL of each image, the posting date and time, the keywords, the ID of the image learned by Azure, and the like are recorded in the database (AWS/RDS) 13a, and the image data is uploaded to and stored in the storage (AWS/S3) 13b (step S47).

FIG. 8 shows the processing operation of a modification in which part of the processing operation of the video crawler unit 17 is changed.

In this modification, the following three processes are performed in parallel in place of step S42 of FIG. 7. As shown in step S42a of FIG. 8, the acquired videos are played back at high speed and a machine learning model is used to extract a still image whenever a face is determined to appear. As shown in step S42b of FIG. 8, the acquired videos are played back at high speed and a machine learning model is used to extract a still image of the face at the timing at which a face is determined most likely to be shown. As shown in step S42c of FIG. 8, the acquired videos are played back at high speed and a machine learning model is used to extract facial feature quantities at the timing at which a face is determined most likely to be shown, and a still image of the face is extracted.

In addition, in place of step S43 of FIG. 7, as shown in step S43a of FIG. 8, the acquired videos are played back at high speed, still images are extracted at a fixed frame interval across the entire video, and a machine learning model is applied to each extracted still image to extract the still image with the highest probability of showing a face.

Thereafter, the still images extracted by the procedures of steps S42a to S42c and those extracted by the procedure of step S43a, each procedure differing from the others, are integrated and temporarily stored in a local cache memory. Thus, in this modification, still images are extracted from the videos by four mutually different processing procedures, and the results of the four procedures are integrated. Because different procedures yield different final images, reliability is far higher than when face images are extracted by only one procedure.

As described above, in this embodiment the video crawler of the video crawler unit 17 acquires video data from the websites corresponding to the URLs stored in the URL list unit 15b, that is, websites containing the keywords registered as causes of the face image leak. In other words, the entire Internet is not searched. As a result, a user's face image leaked onto the Internet can be detected quickly at an early stage, and the configuration of the image search device is simplified. Furthermore, in this embodiment, when multiple pieces of still image data are extracted from the acquired video data and image data containing face information is obtained from them, the still image data is extracted from the video data by multiple mutually different processing procedures and the results are integrated. Because different procedures yield different final images, face images are detected far more reliably than when only a single procedure is used. A leak of the user's face image onto the Internet can therefore be detected reliably with a simple configuration.

In the embodiment and modification described above, the image search device of the present invention is built using servers, storage, and databases provided by a cloud computing service. However, the present invention may instead be realized with a dedicated image search device built from local servers, storage, and databases, without using a cloud computing service.

The embodiments described above all illustrate the present invention by way of example and do not limit it; the present invention can be implemented in various other modified and altered forms. The scope of the present invention is therefore defined only by the claims and their equivalents.

The present invention can be used to search for images of a user that have been leaked onto the Internet and to prevent the user from suffering harm from voyeur filming or revenge porn.

10 User terminal
11 Image search server
12 Internet
13 Input/output and image comparison unit
13a Database (AWS/RDS)
13b, 16a, 17a Storage (AWS/S3)
14 Cloud server
15 URL crawler unit
15a Cloud server (AWS/EC2)
15b URL list unit
16 Still image crawler unit
16b Image search unit
16c Machine learning unit
17 Video crawler unit
17b Image extraction unit

Claims

1. An image search device comprising: an input/output and image comparison unit that stores face image data representing a face image of a user registered by the user and a keyword registered by the user as a cause of leakage of the face image; and an image extraction unit that extracts a plurality of pieces of still image data from video data acquired from a website on the Internet that is a sexually explicit site and contains the keyword registered by the user, and acquires image data containing face information from the extracted still image data,
wherein the input/output and image comparison unit is configured to compare the image data containing the face information acquired by the image extraction unit with the face image data registered by the user and stored in the input/output and image comparison unit, and to extract a URL of face image data having a high degree of similarity, and
wherein the image extraction unit is configured to extract still image data from the video data using a plurality of mutually different processing procedures, and to integrate the still image data extracted by the plurality of mutually different processing procedures.

2. The image search device according to claim 1, wherein the plurality of mutually different processing procedures performed by the image extraction unit include a process of playing back the video data at high speed and extracting still image data in which a machine learning model has determined that a face appears, and a process of playing back the video data at high speed and extracting still image data at an arbitrary fixed frame interval.

3. The image search device according to claim 1, wherein the plurality of mutually different processing procedures performed by the image extraction unit include: a process of playing back the video data at high speed and extracting a still image when a machine learning model determines that a face is shown; a process of playing back the video data at high speed and extracting a still image of a face at the timing at which the machine learning model determines that a face is most likely to be shown; a process of playing back the video data at high speed, extracting facial feature quantities at the timing at which the machine learning model determines that a face is most likely to be shown, and extracting a still image of the face; and a process of playing back the video data at high speed, extracting still images from the entire video at a fixed frame interval, and extracting, from among the extracted still images, the still image that the machine learning model gives the highest probability of showing a face.

4. The image search device according to claim 1, wherein the image extraction unit is configured to compare a confidence value of the image data containing the face information with a threshold value and thereby extract the image data in which the face image appears best.

5. The image search device according to claim 4, wherein the image extraction unit is configured to divide the image data containing the face information, starting from the beginning, into groups of a predetermined number, select the most accurate image data within each group, and compare the confidence value of the selected image data with the threshold value.

6. The image search device according to claim 1, further comprising a URL crawler unit that collects URLs of websites on the Internet that are sexually explicit sites and contain the keyword registered by the user, wherein the image extraction unit is configured to acquire the video data from the websites corresponding to the URLs collected by the URL crawler unit.

7. The image search device according to claim 1, wherein the input/output and image comparison unit is configured to store the face image data and the keyword sent from a user terminal, and to send the extracted URL to the user terminal.

8. An image search method comprising: storing face image data representing a face image of a user registered by the user and a keyword registered by the user as a cause of leakage of the face image; extracting a plurality of pieces of still image data from video data acquired from a website on the Internet that is a sexually explicit site and contains the keyword registered by the user; acquiring image data containing face information from the extracted still image data; and
comparing the acquired image data containing the face information with the face image data registered by the user, and extracting a URL of face image data having a high degree of similarity,
wherein the still image data is extracted from the video data using a plurality of mutually different processing procedures, and the still image data extracted by the plurality of mutually different processing procedures is integrated.

9. The image search method according to claim 8, wherein the plurality of mutually different processing procedures include a process of playing back the video data at high speed and extracting still image data in which a machine learning model has determined that a face appears, and a process of playing back the video data at high speed and extracting still image data at an arbitrary fixed frame interval.

10. The image search method according to claim 8, wherein the plurality of mutually different processing procedures include: a process of playing back the video data at high speed and extracting a still image when a machine learning model determines that a face is shown; a process of playing back the video data at high speed and extracting a still image of a face at the timing at which the machine learning model determines that a face is most likely to be shown; a process of playing back the video data at high speed, extracting facial feature quantities at the timing at which the machine learning model determines that a face is most likely to be shown, and extracting a still image of the face; and a process of playing back the video data at high speed, extracting still images from the entire video at a fixed frame interval, and extracting, from among the extracted still images, the still image that the machine learning model gives the highest probability of showing a face.

11. The image search method according to claim 8, wherein a confidence value of the image data containing the face information is compared with a threshold value to extract the image data in which the face image appears best.

12. The image search method according to claim 11, wherein the image data containing the face information is divided, starting from the beginning, into groups of a predetermined number, the best image data is selected within each group, and the confidence value of the selected image data is compared with the threshold value.

13. The image search method according to claim 8, wherein URLs of websites on the Internet that are sexually explicit sites and contain the keyword registered by the user are collected, and the video data is acquired from the websites corresponding to the collected URLs.

14. The image search method according to claim 8, wherein the face image data and the keyword sent from a user terminal are stored, and the extracted URL is sent to the user terminal.

15. A program for causing a computer to function as: input/output and image comparison means for storing face image data representing a face image of a user registered by the user and a keyword registered by the user as a cause of leakage of the face image; and image extraction means for extracting a plurality of pieces of still image data from video data acquired from a website on the Internet that is a sexually explicit site and contains the keyword registered by the user, and for acquiring image data containing face information from the extracted still image data,
wherein the input/output and image comparison means compares the acquired image data containing the face information with the face image data registered by the user and extracts a URL of face image data having a high degree of similarity, and
wherein the image extraction means extracts still image data from the video data using a plurality of mutually different processing procedures and integrates the still image data extracted by the plurality of mutually different processing procedures.

16. A computer-readable recording medium having recorded thereon a program for causing a computer to function as: input/output and image comparison means for storing face image data representing a face image of a user registered by the user and a keyword registered by the user as a cause of leakage of the face image; and image extraction means for extracting a plurality of pieces of still image data from video data acquired from a website on the Internet that is a sexually explicit site and contains the keyword registered by the user, and for acquiring image data containing face information from the extracted still image data,
wherein the input/output and image comparison means compares the acquired image data containing the face information with the face image data registered by the user and extracts a URL of face image data having a high degree of similarity, and
wherein the image extraction means extracts still image data from the video data using a plurality of mutually different processing procedures and integrates the still image data extracted by the plurality of mutually different processing procedures.