
US20240211952A1 - Information processing program, information processing method, and information processing device - Google Patents


Info

Publication number
US20240211952A1
Authority
US
United States
Prior art keywords
product
image data
region
registration machine
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US18/532,225
Inventor
Yuya Obinata
Takuma Yamamoto
Daisuke Uchida
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Assigned to FUJITSU LIMITED reassignment FUJITSU LIMITED ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: UCHIDA, DAISUKE, Obinata, Yuya, YAMAMOTO, TAKUMA
Publication of US20240211952A1 publication Critical patent/US20240211952A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07G REGISTERING THE RECEIPT OF CASH, VALUABLES, OR TOKENS
    • G07G 3/00 Alarm indicators, e.g. bells
    • G07G 3/003 Anti-theft control
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 20/00 Payment architectures, schemes or protocols
    • G06Q 20/38 Payment protocols; Details thereof
    • G06Q 20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q 20/401 Transaction verification
    • G06Q 20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 20/00 Payment architectures, schemes or protocols
    • G06Q 20/08 Payment architectures
    • G06Q 20/18 Payment architectures involving self-service terminals [SST], vending machines, kiosks or multimedia terminals
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 20/00 Payment architectures, schemes or protocols
    • G06Q 20/08 Payment architectures
    • G06Q 20/20 Point-of-sale [POS] network systems
    • G06Q 20/202 Interconnection or interaction of plural electronic cash registers [ECR] or to host computer, e.g. network details, transfer of information from host to ECR or from ECR to ECR
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 20/00 Payment architectures, schemes or protocols
    • G06Q 20/08 Payment architectures
    • G06Q 20/20 Point-of-sale [POS] network systems
    • G06Q 20/208 Input by product or record sensing, e.g. weighing or scanner processing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/107 Static hand or arm
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07G REGISTERING THE RECEIPT OF CASH, VALUABLES, OR TOKENS
    • G07G 1/00 Cash registers
    • G07G 1/0009 Details of the software in the checkout register, electronic cash register [ECR] or point of sale terminal [POS]
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07G REGISTERING THE RECEIPT OF CASH, VALUABLES, OR TOKENS
    • G07G 1/00 Cash registers
    • G07G 1/0036 Checkout procedures
    • G07G 1/0045 Checkout procedures with a code reader for reading of an identifying code of the article to be registered, e.g. barcode reader or radio-frequency identity [RFID] reader
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07G REGISTERING THE RECEIPT OF CASH, VALUABLES, OR TOKENS
    • G07G 1/00 Cash registers
    • G07G 1/0036 Checkout procedures
    • G07G 1/0045 Checkout procedures with a code reader for reading of an identifying code of the article to be registered, e.g. barcode reader or radio-frequency identity [RFID] reader
    • G07G 1/0054 Checkout procedures with a code reader for reading of an identifying code of the article to be registered, e.g. barcode reader or radio-frequency identity [RFID] reader with control of supplementary check-parameters, e.g. weight or number of articles
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07G REGISTERING THE RECEIPT OF CASH, VALUABLES, OR TOKENS
    • G07G 1/00 Cash registers
    • G07G 1/12 Cash registers electronically operated
    • G07G 1/14 Systems including one or more distant stations co-operating with a central processing unit
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07G REGISTERING THE RECEIPT OF CASH, VALUABLES, OR TOKENS
    • G07G 3/00 Alarm indicators, e.g. bells
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B 13/00 Burglar, theft or intruder alarms
    • G08B 13/18 Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
    • G08B 13/189 Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems
    • G08B 13/194 Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
    • G08B 13/196 Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/57 Mechanical or electrical details of cameras or camera modules specially adapted for being embedded in other devices

Definitions

  • the embodiments discussed herein are related to an information processing program, an information processing method, and an information processing device.
  • Image recognition technology for recognizing a specific object from an image has been widely used.
  • a region of the specific object in the image is specified as a bounding box (Bbox).
  • such image recognition technology is expected to be applied, for example, to monitoring of a customer's purchasing behavior in a store or to work management of workers in a factory.
  • the self-checkout machine is a point of sale (POS) cash register system by which a user who purchases a product himself/herself performs operations from reading of a barcode of the product to payment.
  • the force majeure error includes, for example, a scan omission in which a user forgets to scan a product and moves the product from a basket to a plastic bag, or a reading error in which, for a beer box including a set of six cans with barcodes attached both to the box and to each can, a barcode on a single can is erroneously read.
  • the intentional fraud includes, for example, barcode concealment, in which the user pretends to scan a product while hiding only the barcode with a finger.
  • an object is to provide an information processing program, an information processing method, and an information processing device capable of identifying a product registered in an accounting machine.
  • a non-transitory computer-readable storage medium storing an information processing program that causes at least one computer to execute a process, the process includes acquiring video data each image data of which includes a registration machine used to register a product by a user; extracting, from the acquired video data, image data that include products by specifying a first region that includes a hand of the user, a second region that includes a product, and a relationship between the first region and the second region, for the image data of the acquired video data; specifying a timing when first information regarding a first product registered to the registration machine by the user; specifying certain image data of the image data that includes a second product held in the hand of the user within a certain time period from the timing and placed in a place in an angle of view of the video data that is not a place where a product that has been registered to the registration machine is placed for most of the certain time period, based on the first region for the image data, the second region for the image data, and the relationship for the image data; specifying
  • FIG. 1 is a diagram illustrating an overall configuration example of a self-checkout machine system according to a first embodiment
  • FIG. 2 is a diagram for explaining an example of detection of an abnormal behavior according to the first embodiment
  • FIG. 3 is a functional block diagram illustrating a functional configuration of an information processing device according to the first embodiment
  • FIG. 4 is a diagram for explaining training data
  • FIG. 5 is a diagram for explaining machine learning of a first machine learning model
  • FIG. 6 is a diagram for explaining machine learning of a second machine learning model
  • FIG. 7 is a diagram for explaining extraction of a product region
  • FIG. 8 is a diagram for explaining specification of a coordinate position of the product region
  • FIG. 9 is a diagram for explaining specification of information to be a determination target of fraud.
  • FIG. 10 is a diagram for explaining specification of a product region used to determine the fraud
  • FIG. 11 is a diagram for explaining specification of a product region used to determine the fraud using HOID
  • FIG. 12 is a diagram for explaining specification of a product region used to determine the fraud using a distribution
  • FIG. 13 is a diagram for explaining specification of a product item
  • FIG. 14 is a diagram for explaining detection of a fraudulent behavior
  • FIG. 15 is a diagram illustrating an alert display example on a self-checkout machine
  • FIG. 16 is a diagram illustrating an alert display example to a clerk
  • FIG. 17 is a flowchart illustrating a flow of processing of the information processing device
  • FIG. 18 is a flowchart illustrating a flow of processing of the self-checkout machine
  • FIG. 19 is a diagram for explaining a hardware configuration example.
  • FIG. 20 is a diagram for explaining a hardware configuration example of the self-checkout machine.
  • embodiments may be appropriately combined with each other in a range without contradiction.
  • FIG. 1 is a diagram illustrating an overall configuration example of a self-checkout machine system 5 according to a first embodiment.
  • the self-checkout machine system 5 includes a camera 30 , a self-checkout machine 50 , an administrator's terminal 60 , and an information processing device 100 .
  • the information processing device 100 is an example of a computer coupled to the camera 30 and the self-checkout machine 50 .
  • the information processing device 100 is coupled to the administrator's terminal 60 , via a network 3 for which various wired and wireless communication networks can be adopted.
  • the camera 30 and the self-checkout machine 50 may be coupled to the information processing device 100 , via the network 3 .
  • the camera 30 is an example of a camera that captures a video of a region including the self-checkout machine 50 .
  • the camera 30 transmits data of a video to the information processing device 100 .
  • the data of the video is referred to as “video data” or is simply referred to as a “video”.
  • the video data includes a plurality of time-series image frames. To each image frame, a frame number is assigned in a time-series ascending order.
  • One image frame is image data of a still image captured by the camera 30 at a certain timing. In the following description, there is a case where the image data is simply referred to as an “image”.
  • the self-checkout machine 50 is an example of a POS cash register system or an accounting machine with which a user 2 who purchases a product performs operations from reading a barcode of the product to payment. For example, when the user 2 moves a product to be purchased to a scan region of the self-checkout machine 50 , the self-checkout machine 50 scans a barcode of the product and registers the product as a product to be purchased.
  • the self-checkout machine 50 is an example of a self-checkout machine that registers (register operation) a product to be purchased by a customer and makes a payment, and is referred to as, for example, Self checkout, automated checkout, self-checkout machine, self-check-out register, or the like.
  • the barcode is one type of an identifier representing a numerical value or a character depending on thicknesses of striped lines, and the self-checkout machine 50 can specify the price, the type (for example, food), or the like of the product by scanning (reading) the barcode.
  • the barcode is an example of a code, and two dimensional codes such as a quick response (QR) code having the same function can be used, in addition to the barcode.
  • the user 2 repeatedly performs the operation of the product registration described above, and when the scan of the product is completed, the user 2 operates a touch panel or the like of the self-checkout machine 50 , and makes a settlement request.
  • upon receiving the settlement request, the self-checkout machine 50 presents the number of products to be purchased, the purchase price, or the like, and executes settlement processing.
  • the self-checkout machine 50 stores information regarding the products that have been scanned from when the user 2 starts scanning to when the settlement request is issued, in a storage unit and transmits the information to the information processing device 100 as self-checkout machine data (product information).
  • the administrator's terminal 60 is an example of a terminal device used by an administrator of a store.
  • the administrator's terminal 60 receives an alert notification indicating that fraud has been performed regarding purchase of a product or the like, from the information processing device 100 .
  • the information processing device 100 acquires video data of a predetermined area including the self-checkout machine 50 with which a person registers a product and inputs the acquired video data into a first machine learning model, so as to detect a product region from the video data.
  • the information processing device 100 stores time-series coordinate positions of the detected product region in the storage unit.
  • the information processing device 100 specifies a timing based on an operation of the person for registering the product in the self-checkout machine 50 , and specifies a product region related to the product registered in the self-checkout machine 50 , based on the specified timing based on the operation and the time-series coordinate positions stored in the storage unit.
  • FIG. 2 is a diagram for explaining an example of detection of an abnormal behavior according to the first embodiment.
  • the information processing device 100 acquires image data from the video data captured by the camera 30 that images the self-checkout machine 50 and acquires a Human-Object Interaction Detection (HOID) result from the acquired image data, using the HOID or the like. That is, the information processing device 100 acquires a region of a person, a region of an object, and a relationship between the person and the object, from the video data.
  • the information processing device 100 generates hand-held product image data (hereinafter, may be referred to as hand-held product image) obtained by extracting a region portion of the object (product) related to the person, from the image data of the HOID result. Then, the information processing device 100 analyzes the hand-held product image and identifies an image of a product (for example, wine) imaged in the hand-held product image.
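  • as an illustration only (not part of the disclosed embodiments), the extraction of the hand-held product image from the object region of the HOID result can be sketched as follows; the function name and the (x, y, w, h) Bbox convention are assumptions:

```python
def crop_hand_held_product(image, product_bbox):
    """Extract the hand-held product region from one image frame.

    image: a 2-D grid (list of rows) of pixel values.
    product_bbox: (x, y, w, h) of the object region reported by HOID,
    with (x, y) the top-left corner of the Bbox (an assumed convention).
    """
    x, y, w, h = product_bbox
    # Slice the rows covered by the Bbox, then the columns within each row.
    return [row[x:x + w] for row in image[y:y + h]]
```

The cropped region would then be passed to the product identification step.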
  • the information processing device 100 acquires a scan result (for example, egg) that is information regarding the product scanned by the self-checkout machine 50 , from the self-checkout machine 50 .
  • the information processing device 100 compares the product item (for example, wine) specified from the video data with the product item (for example, egg) actually scanned by the self-checkout machine 50 , and in a case where the product items do not match, the information processing device 100 determines that an abnormal behavior (fraud) is performed and notifies of an alert.
  • the information processing device 100 analyzes the image data captured at the scanned timing and determines whether or not a product to be scanned and an actually scanned product match. As a result, since the information processing device 100 can detect fraud (for example, banana trick) in which, after a product with no barcode on the product itself is held, another inexpensive product is registered on a registration screen of the self-checkout machine 50 , the information processing device 100 can identify the product registered in the self-checkout machine 50 .
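  • the match determination described above can be sketched as follows (a minimal illustration; the function name is hypothetical and not part of the disclosure):

```python
def detect_fraud(identified_item, scanned_item):
    """Return True when the item recognized in the hand-held product
    image and the item actually registered on the self-checkout
    machine do not match (e.g. a suspected "banana trick")."""
    if identified_item is None:
        # Nothing was recognized in the video, so no judgment is made.
        return False
    return identified_item != scanned_item
```

For example, if the video shows wine while the scan result is egg, the mismatch is flagged and an alert would be issued.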
  • FIG. 3 is a functional block diagram illustrating a functional configuration of the information processing device 100 according to the first embodiment.
  • the information processing device 100 includes a communication unit 101 , a storage unit 102 , and a control unit 110 .
  • the communication unit 101 is a processing unit that controls communication with another device and, for example, is implemented by a communication interface or the like.
  • the communication unit 101 receives video data from the camera 30 and transmits a processing result by the control unit 110 to the administrator's terminal 60 .
  • the storage unit 102 is a processing unit that stores various types of data, programs executed by the control unit 110 , or the like, and is implemented by a memory, a hard disk, or the like.
  • the storage unit 102 stores a training data database (DB) 103 , a first machine learning model 104 , a second machine learning model 105 , a video data DB 106 , and a coordinate position DB 107 .
  • the training data DB 103 is a database that stores training data used to train the first machine learning model 104 and training data used to train the second machine learning model 105 .
  • FIG. 4 is a diagram for explaining training data. As illustrated in FIG. 4 , each piece of the training data includes image data to be input data and correct answer information (label) set to the image data.
  • classes of a person and an object to be detected, a class indicating an interaction between the person and the object, and a bounding box (Bbox: object region information) indicating a region of each class are set.
  • region information of a Something class indicating an object, such as a product, other than a plastic bag, region information of a class of a person indicating a user who purchases the product, and a relationship (holding class) indicating an interaction between the Something class and the class of the person are set. That is, information regarding the object held by the person is set as the correct answer information.
  • the class of the person is an example of a first class, the Something class is an example of a second class, the region information of the class of the person is an example of a first region, the region information of the Something class is an example of a second region, and the interaction between the person and the object is an example of an interaction.
  • region information of a class of a plastic bag indicating the plastic bag, region information of a class of a person indicating a user who uses the plastic bag, and a relationship (holding class) indicating an interaction between the class of the plastic bag and the class of the person are set. That is, information regarding the plastic bag held by the person is set as the correct answer information.
  • in a case where the Something class is created by normal object identification (object recognition), all objects that have no relation with a task, such as all backgrounds, clothes, or accessories, are also detected. Since all of these are Somethings, only a large number of Bboxes are identified in the image data, and nothing meaningful is found. By detecting only an object that has an interaction with a person, the information can be used for a task (for example, a fraud detection task of the self-checkout machine) as meaningful information.
  • the plastic bag or the like is identified as a unique class of Bag (plastic bag). The plastic bag is valuable information in a fraud detection task of the self-checkout machine, but is not important information in other tasks. Therefore, it is worth using the information based on unique knowledge of the fraud detection task of the self-checkout machine, indicating that the product is taken out from a basket (shopping basket) and is put into a bag, and a useful effect is obtained.
  • the first machine learning model 104 is an example of a machine learning model that is trained to identify a person and an object imaged in training data (for example, person and storage (plastic bag or the like)).
  • the first machine learning model 104 is a machine learning model that identifies the person, the product, and the relationship between the person and the product from the input image data, and outputs an identification result.
  • the first machine learning model 104 can adopt the HOID and can also adopt a machine learning model using various neural networks or the like. In a case of the HOID, “the class and the region information of the person, the class and the region information of the product (object), and the interaction between the person and the product” are output.
  • the second machine learning model 105 is an example of a machine learning model trained to specify an item of a product imaged in training data.
  • the second machine learning model 105 may be implemented by a zero-shot image classifier.
  • the second machine learning model 105 uses a list of texts and an image as inputs and outputs a text having the highest similarity to the image, in the list of the texts, as a label of the image.
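  • the zero-shot classification rule (select, from a list of candidate texts, the one most similar to the image in the embedding space) can be sketched as follows, assuming precomputed embedding vectors as a simplification of what the encoders would produce; all names are illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def zero_shot_label(image_vec, text_vecs):
    """Return the text whose embedding has the highest similarity
    to the image embedding, as the label of the image.

    text_vecs: mapping from candidate text label to embedding vector.
    """
    return max(text_vecs, key=lambda label: cosine(image_vec, text_vecs[label]))
```

In the actual model, image_vec and the vectors in text_vecs would come from the trained image and text encoders.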
  • contrastive language-image pre-training (CLIP) is exemplified.
  • the CLIP implements embedding of a plurality of types of data, so-called multimodal data of images and texts, into a common feature space. That is, with the CLIP, by training an image encoder and a text encoder, an embedding in which a vector distance between a pair of an image and a text having close meanings is shortened is implemented.
  • the image encoder may be implemented by a vision transformer (ViT) or may be implemented by a convolutional neural network, for example, a ResNet or the like.
  • the text encoder may be implemented by a generative pre-trained transformer (GPT) based Transformer or may be implemented by a recurrent neural network, for example, a long short-term memory (LSTM).
  • the video data DB 106 is a database that stores the video data captured by the camera 30 provided in the self-checkout machine 50 .
  • the video data DB 106 stores the video data for each self-checkout machine 50 or each camera 30 .
  • the coordinate position DB 107 is a database that stores coordinate positions that are position information of a product acquired from the video data, in time series.
  • the coordinate position DB 107 stores coordinate positions of a product in time series, for each tracked product.
  • an origin to be the reference of the coordinate position can be arbitrarily set, for example, to the center of the image data, a corner of the image data (for example, the lower left corner), or the like.
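  • a minimal sketch of such a per-product time-series coordinate store follows (illustrative only; the class and method names are assumptions, not the disclosed implementation):

```python
from collections import defaultdict

class CoordinatePositionDB:
    """Stores, for each tracked product, the time series of its
    product-region coordinate positions as (frame number, Bbox)."""

    def __init__(self):
        self._tracks = defaultdict(list)

    def add(self, product_id, frame_no, bbox):
        # Frames are appended in time-series ascending order of frame number.
        self._tracks[product_id].append((frame_no, bbox))

    def track(self, product_id):
        # Full time series of coordinate positions for one tracked product.
        return list(self._tracks[product_id])
```

The product region specification unit could then read back a product's trajectory by product identifier.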
  • the control unit 110 is a processing unit that performs overall control of the information processing device 100 and, for example, is implemented by a processor or the like.
  • the control unit 110 includes a machine learning unit 111 , a video acquisition unit 112 , a region extraction unit 113 , a coordinate position specification unit 114 , a product region specification unit 115 , a fraud detection unit 116 , and a warning control unit 117 .
  • the machine learning unit 111 , the video acquisition unit 112 , the region extraction unit 113 , the coordinate position specification unit 114 , the product region specification unit 115 , the fraud detection unit 116 , and the warning control unit 117 are implemented by an electronic circuit included in a processor, a process executed by the processor, or the like.
  • the machine learning unit 111 is a processing unit that performs machine learning of the first machine learning model 104 and the second machine learning model 105 , using each piece of the training data stored in the training data DB 103 .
  • the first machine learning model 104 and the second machine learning model 105 may be machine learned in advance, and the machine learning unit 111 can execute the following processing as fine tuning in a case where accuracy of the machine-learned first machine learning model 104 and second machine learning model 105 is insufficient.
  • FIG. 5 is a diagram for explaining machine learning of the first machine learning model 104 .
  • the machine learning unit 111 inputs input data of the training data into the HOID and acquires an output result of the HOID.
  • the output result includes a class of a person, a class of an object, an interaction between the person and the object, or the like detected by the HOID.
  • the machine learning unit 111 calculates error information between the correct answer information of the training data and the output result of the HOID and performs machine learning (training) for updating a parameter of the HOID through backpropagation, so as to reduce an error.
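  • the update described above (reduce the error between the model output and the correct answer information) reduces, in its simplest scalar form, to a gradient-descent step such as the following; this is a stand-in for backpropagation over the full HOID network, not the actual implementation:

```python
def sgd_step(pred, target, lr=0.1):
    """One parameter update that reduces the squared error
    (pred - target) ** 2; the gradient is 2 * (pred - target)."""
    grad = 2.0 * (pred - target)
    return pred - lr * grad
```

Repeating such steps drives the output toward the correct answer, just as repeated backpropagation updates drive the HOID output toward the correct answer information.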
  • FIG. 6 is a diagram for explaining machine learning of the second machine learning model 105 .
  • in FIG. 6 , a CLIP model 10 is illustrated.
  • in the training of the CLIP model 10 , a pair of image data (hereinafter, may be referred to as an image) and a text is used as the training data. For example, a dataset obtained by extracting pairs of an image and a text described as a caption of the image from Web pages on the Internet, so-called WebImageText (WIT), can be used. As merely an example, a pair of an image, such as a photograph of a dog or a picture in which an illustration of a dog is drawn, and a text "dog picture" described as a caption of the image is used as the training data.
  • the image is input into an image encoder 10 I, and the text is input into a text encoder 10 T.
  • the image encoder 10 I to which the image is input in this way outputs a vector in which the image is embedded into a feature space.
  • the text encoder 10 T to which the text is input outputs a vector in which the text is embedded into a feature space.
  • a mini batch having a batch size N, including training data of a pair of an image 1 and a text 1, a pair of an image 2 and a text 2, . . . , that is, N pairs of N images and N texts, is illustrated.
  • a similarity matrix M 1 of N × N embedding vectors can be obtained.
  • the “similarity” used herein may be an inner product or cosine similarity between the embedding vectors, as merely an example.
  • as a training objective, a contrastive objective is used.
  • an i-th text corresponds to a correct pair. Therefore, the i-th text is a positive example, and all texts other than the i-th text are negative examples.
  • N positive examples and N² − N negative examples are generated in the entire mini batch.
  • elements of the N diagonal components displayed with black and white inversion are positive examples
  • elements of the N² − N components displayed with a white background are negative examples.
  • parameters of the image encoder 10 I and the text encoder 10 T are trained so as to maximize the similarity between the N pairs corresponding to the positive examples and minimize the similarity between the N² − N pairs corresponding to the negative examples.
  • focusing on the first image, the first text is a positive example
  • the second and subsequent texts are negative examples
  • a loss, for example, a cross entropy error, is calculated in a row direction of the similarity matrix M 1.
  • a loss related to an image is obtained.
  • focusing on the second text, the second image is a positive example
  • all images other than the second image are negative examples
  • the loss is calculated in a column direction of the similarity matrix M 1.
  • the image encoder 10 I and the text encoder 10 T update the parameters so as to minimize a statistic value, for example, an average, of the losses related to the images and the losses related to the texts.
  • the trained CLIP model 10 (for example, second machine learning model 105 ) is generated.
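As a minimal sketch of the row- and column-direction loss calculation described above, the following plain-Python illustration computes the symmetric contrastive loss over an N × N similarity matrix; the encoders, batching, and parameter updates are omitted:

```python
import math

def cross_entropy(row, positive_index):
    # Numerically stable softmax cross entropy for one row (or column)
    # of the similarity matrix; the positive pair sits at positive_index.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    return -math.log(exps[positive_index] / sum(exps))

def contrastive_loss(sim):
    """Symmetric contrastive loss over an N x N similarity matrix.

    sim[i][j] is the similarity between image i and text j, so the
    diagonal elements correspond to the N positive pairs.
    """
    n = len(sim)
    # Loss related to images: row direction, text i is the positive example.
    image_loss = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    # Loss related to texts: column direction, image j is the positive example.
    text_loss = sum(
        cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    # The statistic value to minimize: the average of both losses.
    return (image_loss + text_loss) / 2.0
```

Training would drive this value down, which raises the diagonal (positive) similarities relative to the off-diagonal (negative) ones.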
  • the video acquisition unit 112 is a processing unit that acquires video data from the camera 30 .
  • the video acquisition unit 112 acquires video data from the camera 30 provided in the self-checkout machine 50 as needed and stores the video data in the video data DB 106 .
  • the region extraction unit 113 is a processing unit that extracts a product region from the video data, by inputting the video data acquired by the video acquisition unit 112 into the first machine learning model 104 .
  • the region extraction unit 113 specifies a first region including a hand of a person, a second region including a product, and a relationship between the first region and the second region, from the video data, by inputting the video data into the first machine learning model 104 that is the HOID.
  • the region extraction unit 113 extracts a region of a product that is a target of a behavior of a person in the video data. For example, the region extraction unit 113 extracts a region of a product taken out from a shopping basket, a product held by the person, and a product put into a plastic bag.
  • FIG. 7 is a diagram for explaining extraction of the product region.
  • the image data to be input into the HOID and the output result of the HOID are illustrated.
  • a Bbox of a person is indicated by a frame of a solid line
  • a Bbox of an object is indicated by a frame of a broken line.
  • the output result of the HOID includes the Bbox of the person, the Bbox of the object, a probability value of the interaction between the person and the object, a class name, or the like.
  • the region extraction unit 113 extracts the region of the product held by the person, by extracting the Bbox of the object, that is, a partial image corresponding to the frame of the broken line in FIG. 7 , from the image data.
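This cropping step can be sketched with array slicing; as an illustrative assumption (the text does not specify the Bbox format of the HOID output), the Bbox is taken to be (x1, y1, x2, y2) pixel coordinates:

```python
import numpy as np

def crop_bbox(image, bbox):
    """Extract the partial image corresponding to an object Bbox.

    `image` is an H x W x C array; `bbox` is assumed to be given as
    (x1, y1, x2, y2) pixel coordinates — a hypothetical format, since
    the actual HOID output format is not specified here.
    """
    x1, y1, x2, y2 = bbox
    return image[y1:y2, x1:x2]
```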
  • the region extraction unit 113 tracks the product, in a case where the product held with the hand of the person is detected. That is, the region extraction unit 113 tracks a movement of the same product and a region of the same product, over consecutive frames in and subsequent to the frame from which the product region is extracted, in the video data. For example, for each product detected by the HOID, the region extraction unit 113 tracks the product from when the product is detected by the HOID to when the product put into the plastic bag is detected by the HOID. Then, the region extraction unit 113 stores a tracking result in the storage unit 102.
  • the coordinate position specification unit 114 is a processing unit that specifies time-series coordinate positions of the product region extracted by the region extraction unit 113 and stores the coordinate positions in the storage unit. Specifically, the coordinate position specification unit 114 acquires coordinates of a product region of the tracked product in time series, from the start to the end of the tracking by the region extraction unit 113 . For example, the coordinate position specification unit 114 acquires a center coordinate of the tracked product or each of coordinates of four corners used to specify the product region of the tracked product in time series.
  • FIG. 8 is a diagram for explaining specification of a coordinate position of a product region.
  • image data 1 to 7 that is input data into the HOID and detection content of the HOID when the image data 1 to 7 is sequentially input are illustrated.
  • the description written on each piece of the image data indicates the information imaged in the image data, which is unknown as the input into the HOID and is the information to be detected by the HOID.
  • the region extraction unit 113 acquires the image data 1 in which neither a person nor an object is imaged, inputs the image data 1 into the HOID, and acquires the output result. In this case, the region extraction unit 113 determines that there is no detection result of a person or an object. Subsequently, the region extraction unit 113 acquires the image data 2 in which a person holding a shopping basket is imaged, inputs the image data 2 into the HOID, and detects the user 2 (person) and the shopping basket held by the user 2, according to an output result.
  • the region extraction unit 113 acquires the image data 3 in which a person who takes out a product from a shopping basket is imaged, inputs the image data 3 into the HOID, and detects a behavior of the user 2 for moving the held product over the shopping basket, according to an output result. Then, the region extraction unit 113 starts tracking because the product is detected.
  • the coordinate position specification unit 114 acquires a coordinate position A 1 of the product taken out from the shopping basket or a coordinate position A 1 of a product region of the product taken out from the shopping basket.
  • the region extraction unit 113 can start tracking at a timing of the image data 2 in which only the shopping basket is detected. In this case, the region extraction unit 113 extracts a region by regarding the shopping basket as the product, and the coordinate position specification unit 114 acquires a coordinate position.
  • the region extraction unit 113 acquires the image data 4 in which a person who scans a product is imaged, inputs the image data 4 into the HOID, and detects a behavior of the user 2 for moving the held product to a scan position, according to an output result.
  • the coordinate position specification unit 114 acquires a coordinate position A 2 of the held product or a coordinate position A 2 of a product region of the held product.
  • the region extraction unit 113 acquires the image data 5 in which a person who puts a product in a plastic bag is imaged, inputs the image data 5 into the HOID, and detects a behavior of the user 2 for putting the held product into the held plastic bag, according to an output result.
  • the coordinate position specification unit 114 acquires a coordinate position A 3 of the product put into the plastic bag or a coordinate position A 3 of a product region of the product put into the plastic bag.
  • when the region extraction unit 113 detects that the product has been put into the plastic bag by analyzing the image data 5, the region extraction unit 113 ends the tracking of the product. Then, the coordinate position specification unit 114 stores the coordinate position A 1, the coordinate position A 2, and the coordinate position A 3, which are the coordinate positions of the tracked product in time series, in the coordinate position DB 107.
  • the coordinate position specification unit 114 specifies the coordinate position of the product, generates time-series data of the coordinate positions, and stores the data in the coordinate position DB 107 .
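A minimal illustration of holding such time-series coordinate positions follows; the `CoordinateSeries` container, its field names, and the use of Bbox centers are hypothetical stand-ins for a record of the coordinate position DB 107:

```python
def bbox_center(bbox):
    # Center point of an (x1, y1, x2, y2) box.
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

class CoordinateSeries:
    """Time-series coordinate positions of one tracked product.

    A hypothetical container; the actual DB schema is not specified.
    """

    def __init__(self, product_id):
        self.product_id = product_id
        self.positions = []  # list of (timestamp, (x, y)), in time order

    def record(self, timestamp, bbox):
        # Store the center of the product region at this timestamp.
        self.positions.append((timestamp, bbox_center(bbox)))
```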
  • the product region specification unit 115 is a processing unit that specifies a timing when the person performs an operation for registering the product in the self-checkout machine 50 and specifies a product region related to the product registered in the self-checkout machine 50 based on the specified operation timing and the time-series coordinate positions stored in the coordinate position DB 107 .
  • the product region specification unit 115 specifies the product region, based on a coordinate position immediately before the timing when the person performs the operation for registering the product in the self-checkout machine 50 , from among the time-series coordinate positions stored in the coordinate position DB 107 .
  • the product region specification unit 115 specifies the product region, based on a coordinate position immediately after the timing when the person performs the operation for registering the product in the self-checkout machine 50 , from among the time-series coordinate positions stored in the coordinate position DB 107 .
  • the product region specification unit 115 specifies the product region of the product placed around the self-checkout machine 50 by the person who has held the product as a fraud determination target.
  • fraud is considered such that the person causes the self-checkout machine 50 to scan a barcode attached to a single product included in a set product, not the barcode attached to the set product, and purchases the set product at the low price of the single product.
  • for example, the set product is collectively packaged using a packaging material, in a state where the cans are arranged in two rows of three, so that six alcoholic beverage cans can be carried together.
  • a barcode is attached to each of the packaging material used to package the set of the plurality of alcoholic beverage cans and each can of the alcoholic beverage packaged using the packaging material.
  • Fraud is considered such that a person causes the self-checkout machine 50 to scan the barcode of the alcoholic beverage packaged in the packaging material, not the barcode of the packaging material.
  • in this case, although the product held by the user is the set product, the single product included in the set product is registered in the self-checkout machine 50. Therefore, the product region specification unit 115 specifies the product region of the product placed around the self-checkout machine 50 by the person who has held the product as a fraud determination target.
  • the operation for registering the product in the self-checkout machine 50 will be described.
  • As the operation for registering the product, there is an operation for registering an item of a product in the self-checkout machine 50 via an operation on a selection screen in which a list of products with no barcode is displayed. Furthermore, there is an operation for registering an item of a product in the self-checkout machine 50 by causing the self-checkout machine 50 to scan the barcode of a product with a barcode.
  • the self-checkout machine 50 registers a product with no barcode in the cash register through manual input of a person.
  • the self-checkout machine 50 receives the registration of the item of the product in the cash register, via a selection screen in which the items of the products with no barcode are displayed.
  • the self-checkout machine 50 registers an item of a product selected by a user from the list of the items of the products with no barcode in a recording medium of the self-checkout machine 50, based on a user's touch operation on the selection screen.
  • the product region specification unit 115 of the information processing device 100 specifies a product region of a product, with respect to a timing when the item of the product with no barcode is registered in the self-checkout machine 50 .
  • the self-checkout machine 50 transmits a notification of scan information indicating that the operation for registering the product has been performed, to the information processing device 100 , via the network.
  • the product region specification unit 115 identifies the registration timing, based on the notification of the scan information from the self-checkout machine 50 via the network. Specifically, when the item of the product with no barcode is registered in the self-checkout machine 50 , the product region specification unit 115 specifies the product region of the product from among the time-series coordinate positions that have been stored, with respect to the timing when the item of the product with no barcode is registered in the self-checkout machine 50 . Note that the product region specification unit 115 may specify the product region of the product, with reference to a timing when the touch operation is performed on a display of the self-checkout machine 50 .
  • the self-checkout machine 50 registers the product with the barcode in the cash register by scanning the barcode.
  • the self-checkout machine 50 identifies an item of the product by scanning the barcode.
  • the self-checkout machine 50 registers the identified item of the product in the recording medium of the self-checkout machine 50.
  • the product region specification unit 115 of the information processing device 100 specifies the product region of the product, with reference to the timing when the item of the product is registered in the self-checkout machine 50 , through scanning of the barcode.
  • the self-checkout machine 50 transmits a notification of scan information indicating that the operation for registering the product has been performed, to the information processing device 100 , via the network.
  • the product region specification unit 115 identifies the registration timing, based on the notification of the scan information from the self-checkout machine 50 via the network. Specifically, when the item of the product with the barcode is registered in the self-checkout machine 50 , the product region specification unit 115 specifies the product region of the product from among the time-series coordinate positions that have been stored, with reference to the timing when the item of the product with the barcode is registered in the self-checkout machine 50 .
  • FIG. 9 is a diagram for explaining specification of information to be a determination target of fraud.
  • in FIG. 9, as in FIG. 8, each of the pieces of image data subsequent to the image data n that is the input data into the HOID and the detection content of the HOID when each of the pieces of the image data is sequentially input are illustrated.
  • the region extraction unit 113 acquires the image data n in which a person who takes out a product from a shopping basket is imaged, inputs the image data n into the HOID, and detects a behavior of the user 2 for moving the held product over the shopping basket, according to an output result. Then, the region extraction unit 113 starts tracking because the product is detected.
  • the coordinate position specification unit 114 acquires a coordinate position M of a product region of the tracked product.
  • the region extraction unit 113 acquires image data n 1 in which a person holding a product is imaged, inputs the image data n 1 into the HOID, and detects a behavior of the user 2 for taking out the product from the shopping basket and holding the product, according to an output result.
  • the coordinate position specification unit 114 acquires a coordinate position M 1 of the product region of the tracked and held product.
  • the region extraction unit 113 acquires image data n 2 in which a product held by a person around the self-checkout machine 50 is imaged, inputs the image data n 2 into the HOID, and detects a behavior of the user 2 for placing the product around the self-checkout machine 50 , according to an output result.
  • the coordinate position specification unit 114 acquires a coordinate position M 2 of the product region of the tracked and placed product.
  • the region extraction unit 113 acquires image data n 3 in which a product placed around the self-checkout machine 50 by a person is imaged, inputs the image data n 3 into the HOID, and detects the product kept placed around the self-checkout machine 50 , according to an output result.
  • the coordinate position specification unit 114 acquires a coordinate position M 3 of the product region of the tracked product that is kept placed.
  • the region extraction unit 113 acquires image data n 4 in which a person holding a product is imaged, inputs the image data n 4 into the HOID, and detects a behavior of the user 2 for holding the product placed around the self-checkout machine 50, according to an output result.
  • the coordinate position specification unit 114 acquires a coordinate position M 4 of the product region of the tracked and held product.
  • the region extraction unit 113 acquires image data n 5 in which a person who puts a product in a plastic bag is imaged, inputs the image data n 5 into the HOID, and detects a behavior of the user 2 for putting the held product into the held plastic bag, according to an output result. Then, the coordinate position specification unit 114 acquires the coordinate position M 4 of the product region of the tracked product that is in the plastic bag, and the tracking performed by the region extraction unit 113 ends.
  • the product region specification unit 115 receives a scan result from the self-checkout machine 50 . Then, the product region specification unit 115 specifies the coordinate position M 3 immediately before a scan time included in the scan result and the coordinate position M 4 immediately after the scan time. As a result, the product region specification unit 115 specifies the coordinate position of the product corresponding to the timing when the person has performed the operation for registering the product in the self-checkout machine 50 , as the coordinate position M 3 or the coordinate position M 4 .
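The lookup of the coordinate positions immediately before and after the scan time can be sketched as a binary search over the sorted time series; the (timestamp, coordinate) representation is an assumption for illustration:

```python
import bisect

def positions_around_scan(positions, scan_time):
    """Find the coordinate positions immediately before and after scan_time.

    `positions` is a list of (timestamp, (x, y)) sorted by timestamp;
    either element of the returned pair is None when no position exists
    on that side of the scan time.
    """
    times = [t for t, _ in positions]
    i = bisect.bisect_right(times, scan_time)
    before = positions[i - 1] if i > 0 else None
    after = positions[i] if i < len(positions) else None
    return before, after
```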
  • the product region specification unit 115 specifies image data of a region corresponding to the specified coordinate position that is a product region to be the determination target of the fraud.
  • a specification example of the product region to be the determination target of the fraud is described as an example using the coordinate position M 3 .
  • the coordinate position M 4 may be used.
  • the product region specification unit 115 specifies a region of a product including a coordinate position, from image data that is a coordinate position specification source, as the determination target of the fraud.
  • FIG. 10 is a diagram for explaining specification of a product region used to determine fraud. As illustrated in FIG. 10 , the product region specification unit 115 specifies a region of a product C 2 including the coordinate position M 3 , in the image data n 3 that is the specification source image data. Then, the product region specification unit 115 extracts image data including the region of the product C 2 from the image data n 3 , as the image data of the product region to be the determination target of the fraud.
  • the product region specification unit 115 can specify the region of the product including the specified coordinate position, from among a plurality of product regions extracted by the HOID, as the determination target of the fraud.
  • FIG. 11 is a diagram for explaining specification of a product region used to determine fraud using the HOID.
  • the product region specification unit 115 specifies the region of the product C 2 including the coordinate position M 3 , from among a person region, a region of a product C 1 , and the region of the product C 2 extracted from the image data n 3 by the HOID. Then, the product region specification unit 115 extracts image data including the region of the product C 2 from the image data n 3 , as the image data of the product region to be the determination target of the fraud.
  • the product region specification unit 115 can specify a product region to be the determination target of the fraud, based on a distribution of the time-series coordinate positions.
  • FIG. 12 is a diagram for explaining specification of a product region used to determine fraud using a distribution. As illustrated in FIG. 12 , the product region specification unit 115 plots each coordinate position (coordinate position M, coordinate position M 1 , . . . ) of the product to be tracked on the x axis and the y axis. Then, the product region specification unit 115 performs clustering and specifies a cluster including the largest number of coordinate positions.
  • the product region specification unit 115 calculates a coordinate position S, based on the center in the cluster, an average value of all coordinate positions in the cluster, or the like. Then, the product region specification unit 115 extracts image data including the coordinate position S from the image data n 3 , as the image data of the product region to be the determination target of the fraud. Note that a size of the image data to be extracted (size of region) can be preset.
  • the product region specification unit 115 can use a distribution of the coordinate positions before the timing when the person has performed the operation for registering the product in the self-checkout machine 50, among all the coordinate positions, without being limited to the distribution of all the coordinate positions of the tracked product.
  • the product region specification unit 115 can use a distribution of the coordinate positions including the coordinate position M, the coordinate position M 1 , the coordinate position M 2 , and the coordinate position M 3 .
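The clustering-based specification above can be illustrated with a deliberately simple stand-in for the clustering step — grid binning rather than a full clustering algorithm — that picks the most populated cell and averages its members to obtain the coordinate position S; the cell size is an arbitrary assumption:

```python
from collections import Counter

def densest_cluster_center(positions, cell=50):
    """Estimate a representative coordinate from a position distribution.

    Positions are binned into `cell`-sized grid cells, the most populated
    cell is taken as the largest cluster, and the mean of its members is
    returned as the coordinate position S.
    """
    key = lambda p: (int(p[0] // cell), int(p[1] // cell))
    counts = Counter(key(p) for p in positions)
    best_cell, _ = counts.most_common(1)[0]
    members = [p for p in positions if key(p) == best_cell]
    return (
        sum(x for x, _ in members) / len(members),
        sum(y for _, y in members) / len(members),
    )
```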
  • the fraud detection unit 116 is a processing unit that specifies an item of a product by inputting the product region related to the product registered in the self-checkout machine 50 into the second machine learning model 105 , and detects a fraudulent behavior when the item of the product registered in the self-checkout machine 50 by the person and the item of the product specified using the second machine learning model 105 do not match. That is, in a case where a scanned product is different from a product specified from a video, the fraud detection unit 116 determines that a fraudulent behavior occurs.
  • FIG. 13 is a diagram for explaining specification of a product item.
  • image data 20 of a product region specified as the determination target of the fraud by the product region specification unit 115 is input into the image encoder 10 I of the CLIP model 10 .
  • the image encoder 10 I outputs an embedding vector I 1 of the image data 20 of the product region.
  • texts such as “melon”, “rice”, “wine”, and “beer” that have been prepared in advance are input, as a list of class captions, into the text encoder 10 T of the CLIP model 10 .
  • the texts “melon”, “rice”, “wine”, and “beer” can be input into the text encoder 10 T.
  • “Prompt Engineering” can be performed to convert a class caption format at the time of inference into a class caption format at the time of training. For example, it is possible to insert a text corresponding to an attribute of a product, for example, “drink”, into a portion of {object} in “photograph of {object}” and make an input as “photograph of drink”.
  • the text encoder 10 T outputs an embedding vector T 1 of the text “melon”, an embedding vector T 2 of the text “rice”, an embedding vector T 3 of the text “wine”, . . . and an embedding vector T N of the text “beer”.
  • the CLIP model 10 outputs “wine” as a prediction result of the class of the image data 20 of the product region.
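The zero-shot prediction step — comparing the image embedding vector with each text embedding vector and taking the most similar caption — can be sketched as follows; cosine similarity is used here, as merely one of the similarity options mentioned earlier:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def predict_class(image_vec, text_vecs, captions):
    """Return the caption whose text embedding is most similar to the image embedding."""
    sims = [cosine_similarity(image_vec, t) for t in text_vecs]
    return captions[sims.index(max(sims))]
```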
  • the fraud detection unit 116 compares the product item “wine” specified using the second machine learning model 105 in this way and the product item registered in the self-checkout machine 50 and determines whether or not a fraudulent behavior has occurred.
  • FIG. 14 is a diagram for explaining detection of a fraudulent behavior.
  • the fraud detection unit 116 specifies the product item “wine” from the video data by the method illustrated in FIG. 13 .
  • the fraud detection unit 116 acquires a product item “banana” registered in the self-checkout machine 50 , from the self-checkout machine 50 .
  • the fraud detection unit 116 determines that a fraudulent behavior has occurred, and notifies the warning control unit 117 of an alarm notification instruction including an identifier of the self-checkout machine 50 or the like.
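The comparison and the alarm notification instruction can be sketched as a small function; the returned field names are hypothetical stand-ins, since the actual notification format is not specified:

```python
def check_registration(registered_item, predicted_item, machine_id):
    """Compare the registered item with the item predicted from the video.

    Returns an alarm notification instruction (a plain dict with
    hypothetical field names) when the items do not match, else None.
    """
    if registered_item == predicted_item:
        return None
    return {
        "machine_id": machine_id,
        "registered": registered_item,
        "predicted": predicted_item,
        "message": "possible fraudulent behavior",
    }
```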
  • the warning control unit 117 is a processing unit that generates an alert and performs alert notification control in a case where the fraud detection unit 116 detects the fraudulent behavior (fraudulent operation). For example, the warning control unit 117 generates an alert indicating that the product registered in the self-checkout machine 50 by the person is abnormal and outputs the alert to the self-checkout machine 50 and the administrator's terminal 60 .
  • FIG. 15 is a diagram illustrating an alert display example on the self-checkout machine 50 .
  • an alert displayed on the self-checkout machine 50 when the banana trick is detected is illustrated.
  • an alert window 230 is displayed on a touch panel 51 of the self-checkout machine 50 .
  • a product item “banana” registered in the cash register through manual input and the product item “wine” specified through image analysis by the second machine learning model 105 are displayed in a comparable state.
  • the alert window 230 can include a notification that prompts the user to correct the registration and input again.
  • the warning control unit 117 can output content of the alert illustrated in FIG. 15 by voice.
  • the warning control unit 117 turns on a warning light provided in the self-checkout machine 50 , displays the identifier of the self-checkout machine 50 and a message indicating a possibility of the occurrence of the fraud on the administrator's terminal 60 , or transmits the identifier of the self-checkout machine 50 and a message indicating the occurrence of the fraud and necessity of confirmation to a terminal of a clerk in the store.
  • FIG. 16 is a diagram illustrating an alert display example to a clerk.
  • an alert displayed on a display unit of the administrator's terminal 60 at the time when the banana trick is detected is illustrated.
  • an alert window 250 is displayed on the display unit of the administrator's terminal 60 .
  • a product item “banana” and a price “350 yen” registered in the cash register through manual input, and the product item “wine” and a price “4500 yen” specified through image analysis, are displayed in a comparable state.
  • the warning control unit 117 causes the camera 30 included in the self-checkout machine 50 to image the person and stores the image data of the imaged person and the alert in the storage unit in association with each other.
  • the information can be used for various countermeasures to prevent a fraud in advance, for example, by detecting a visitor who has performed a fraudulent behavior at an entrance of the store.
  • the warning control unit 117 generates a machine learning model through supervised learning using the image data of the fraudulent person so as to detect the fraudulent person from the image data of the person who uses the self-checkout machine 50 , detect the fraudulent person at the entrance of the store, or the like. Furthermore, the warning control unit 117 can acquire information regarding a credit card of a person who has performed a fraudulent behavior from the self-checkout machine 50 and hold the information.
  • the self-checkout machine 50 receives a checkout of an item of a registered product.
  • the self-checkout machine 50 receives money used for the settlement of the product and pays change.
  • the self-checkout machine 50 may execute the settlement processing using not only cash but also various credit cards, prepaid cards, or the like. Note that, when the alert regarding the abnormality in the behavior for registering the product is issued, the self-checkout machine 50 stops the settlement processing.
  • the self-checkout machine 50 scans a user's personal information, and executes settlement processing of the product registered in the self-checkout machine 50, based on the scan result.
  • the self-checkout machine 50 receives registration of an age-restricted product such as alcoholic beverages or cigarettes, as the operation for registering the product.
  • the self-checkout machine 50 identifies the age-restricted product, by scanning a barcode of the product.
  • the self-checkout machine 50 scans a My Number Card of a user or personal information stored in a terminal having a My Number Card function, and specifies an age of the user from the date of birth. Then, when the age of the user is an age at which the age-restricted product can be sold, the self-checkout machine 50 can permit settlement of the product to be purchased by the user.
  • when the age of the user is not an age at which the age-restricted product can be sold, the self-checkout machine 50 outputs an alert indicating that the registered product cannot be sold. As a result, the self-checkout machine 50 can permit sales of alcoholic beverages, cigarettes, or the like, in consideration of the age restriction of the user.
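The age check from the date of birth can be sketched as follows; the threshold of 20 is an assumption (the usual age restriction for alcoholic beverages and cigarettes in Japan) rather than a value given in the text:

```python
from datetime import date

def is_allowed_age(birth_date, today=None, minimum_age=20):
    """Check whether a user meets the minimum age for an age-restricted product.

    minimum_age=20 is an illustrative assumption, not a value from the text.
    """
    if today is None:
        today = date.today()
    # Subtract one year if the birthday has not yet occurred this year.
    age = today.year - birth_date.year - (
        (today.month, today.day) < (birth_date.month, birth_date.day)
    )
    return age >= minimum_age
```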
  • FIG. 17 is a flowchart illustrating a flow of processing of the information processing device 100 . As illustrated in FIG. 17 , the information processing device 100 acquires video data as needed (S 101 ).
  • the information processing device 100 acquires a frame in the video data (S 103 ), and extracts a region of a product using the first machine learning model 104 (S 104 ).
  • the information processing device 100 starts tracking (S 106 ).
  • the information processing device 100 specifies a coordinate position and holds the coordinate position as time-series data (S 107 ).
  • the information processing device 100 repeats the processing in and subsequent to S 103 , and when tracking ends (S 108 : Yes), the information processing device 100 acquires scan information (scan result) including a scan time and a product item from the self-checkout machine 50 (S 109 ).
  • the information processing device 100 specifies a scan timing, based on the scan information (S 110 ) and specifies a product region to be a fraud behavior determination target based on the scan timing (S 111 ).
  • the information processing device 100 inputs image data of the product region into the second machine learning model 105 and specifies the product item (S 112 ).
  • in a case where the product items do not match (S 113: No), the information processing device 100 notifies of an alert (S 114). In a case where the product items match (S 113: Yes), the information processing device 100 ends the processing.
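The overall flow of FIG. 17 can be sketched as a control-flow skeleton. The callables are hypothetical stand-ins (extract_center for the first machine learning model, predict_item for the second, notify_alert for the alert notification), and the product-region selection of S 110 to S 111 is simplified to taking the last tracked coordinate:

```python
def process_checkout(frames, extract_center, scan_info, predict_item, notify_alert):
    """Control-flow skeleton of FIG. 17 (S101-S114), with simplifications."""
    coordinates = []
    for frame in frames:                 # S103: acquire a frame
        center = extract_center(frame)   # S104: extract the product region
        if center is not None:           # S106: tracking in progress
            coordinates.append(center)   # S107: hold time-series coordinates
    region = coordinates[-1] if coordinates else None  # S110-S111 (simplified)
    predicted = predict_item(region)     # S112: specify the product item
    if predicted != scan_info["product_item"]:          # S113
        notify_alert(predicted, scan_info["product_item"])  # S114
        return False
    return True
```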
  • FIG. 18 is a flowchart illustrating a flow of processing of the self-checkout machine 50 .
  • the self-checkout machine 50 identifies an operation for registering a product by a user. Specifically, the self-checkout machine 50 identifies the operation for registering the product, through an operation on a selection screen in which a list of products with no barcode is displayed. Furthermore, the self-checkout machine 50 identifies the operation for registering the product, by scanning a barcode of a product with the barcode (S 201 ). Subsequently, the self-checkout machine 50 specifies a product item and a scan time. Specifically, the self-checkout machine 50 specifies the product item, based on the operation for registering the product.
  • based on the operation for registering the product, the self-checkout machine 50 specifies, as the scan time, the time when the operation for registering the product is identified (S 202).
  • the self-checkout machine 50 transmits the scan information including the product item and the scan time, to the information processing device 100 (S 203 ).
  • the self-checkout machine 50 determines whether or not there is an alert notified from the information processing device 100 . In a case of determining that there is the alert, the self-checkout machine 50 proceeds to S 205 (S 204 : Yes). On the other hand, in a case of determining that there is no alert, the self-checkout machine 50 proceeds to S 206 (S 204 : No).
  • the self-checkout machine 50 stops the settlement processing of the product item (S 205).
  • the self-checkout machine 50 executes the settlement processing of the product item (S 206).
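The self-checkout machine side (S 201 to S 206) can be sketched in the same spirit; `checkout_flow`, `send_scan_info`, and `has_alert` are assumed names for illustration only, not the machine's actual interfaces.

```python
import time

def checkout_flow(product_item, send_scan_info, has_alert):
    """Register a product, report the scan information, and settle only
    when no alert has been notified (sketch of S 201 to S 206)."""
    # S 201/S 202: the registration operation identifies the product item,
    # and the time of that operation is used as the scan time
    scan_info = {"product_item": product_item, "scan_time": time.time()}
    # S 203: transmit the scan information to the information processing device
    send_scan_info(scan_info)
    # S 204: branch on whether an alert was notified
    if has_alert():
        return "settlement stopped"
    return "settlement executed"

sent = []
result = checkout_flow("egg", send_scan_info=sent.append, has_alert=lambda: True)
print(result)  # settlement stopped
```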
  • the information processing device 100 acquires video data in a predetermined area including an accounting machine in which a person registers a product and inputs the video data into the first machine learning model 104 so as to extract a product region from the video data.
  • the information processing device 100 stores time-series coordinate positions of the extracted product region, specifies a timing when the person performs the operation for registering the product in the self-checkout machine 50 , and specifies a product region related to the product registered in the self-checkout machine 50 , based on the specified timing of the operation and the time-series coordinate positions.
  • since the information processing device 100 can specify the region of the product that is a fraud target from the video data, it is possible to recognize the product before the person ends the payment or before the person leaves the store, and it is possible to detect fraud in the self-checkout machine 50.
  • the information processing device 100 specifies an item of the product, by inputting the product region related to the product registered in the self-checkout machine 50 into the second machine learning model 105 .
  • in a case where the specified product item does not match the product item registered in the self-checkout machine 50, the information processing device 100 generates an alert. Therefore, the information processing device 100 can detect fraud of scanning a barcode of an inexpensive product instead of that of an expensive product.
  • the information processing device 100 specifies the product region to be the fraud determination target, based on the coordinate position immediately before or immediately after the timing when the person performs the operation for registering the product in the self-checkout machine 50, from among the time-series coordinate positions. Therefore, since the information processing device 100 can accurately specify the held product before and after the timing when the operation for registering the product is performed, the information processing device 100 can improve fraud detection accuracy.
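Selecting the coordinate position immediately before or immediately after the registration timing can be illustrated as follows; `region_around_timing` is a hypothetical helper operating on (time, bbox) pairs, not part of the embodiment.

```python
def region_around_timing(positions, scan_time):
    """From time-series (time, bbox) pairs, return the observations
    immediately before and immediately after the scan timing."""
    before = [p for p in positions if p[0] <= scan_time]
    after = [p for p in positions if p[0] > scan_time]
    just_before = max(before, key=lambda p: p[0]) if before else None
    just_after = min(after, key=lambda p: p[0]) if after else None
    return just_before, just_after

positions = [(9.0, "bbox_a"), (9.8, "bbox_b"), (10.4, "bbox_c")]
print(region_around_timing(positions, 10.0))
# (9.8, 'bbox_b') is immediately before the scan, (10.4, 'bbox_c') after it
```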
  • the information processing device 100 specifies the product region to be the fraud determination target, from a distribution of the time-series coordinate positions. Therefore, even in a situation where it is difficult to make determination using the image data, for example, since the image data is unclear, the information processing device 100 can accurately specify the held product before and after the timing when the operation for registering the product is performed.
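One conceivable way to use a distribution of the time-series coordinate positions, for example when individual frames are unclear, is to take the observation closest to the centroid of the distribution; this particular criterion is an illustrative assumption, not necessarily the one used in the embodiment.

```python
from statistics import mean

def region_from_distribution(positions):
    """Pick the observation whose (x, y) position is closest to the
    centroid of the whole distribution of coordinate positions."""
    cx = mean(x for _, (x, _y) in positions)
    cy = mean(y for _, (_x, y) in positions)
    return min(positions,
               key=lambda p: (p[1][0] - cx) ** 2 + (p[1][1] - cy) ** 2)

# the outlier at (40, 40) pulls the centroid, but the densest observation wins
positions = [(1.0, (10, 10)), (2.0, (12, 11)), (3.0, (40, 40))]
print(region_from_distribution(positions))  # (2.0, (12, 11))
```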
  • the information processing device 100 generates an alert indicating that the product registered in the self-checkout machine 50 by the person is abnormal. Therefore, the information processing device 100 can take measures such as asking about the circumstances before the person who has performed a fraudulent behavior goes out of the store.
  • the information processing device 100 outputs voice or a screen indicating alert content from the self-checkout machine 50 to a person positioned by the self-checkout machine 50. Therefore, even in a case of a force majeure mistake or an intentional fraud, the information processing device 100 can directly call attention to the person who is scanning. As a result, it is possible to reduce mistakes and intentional fraud.
  • the information processing device 100 causes the camera of the self-checkout machine 50 to image the person and stores image data of the imaged person and the alert in the storage unit in association with each other. Therefore, since the information processing device 100 can collect and hold information regarding the fraudulent person who performs the fraudulent behavior, the information processing device 100 can use the information for various measures to prevent the fraud in advance, by detecting entrance of the fraudulent person from data captured by a camera that images customers. Furthermore, since the information processing device 100 can acquire and hold credit card information of the person who has performed the fraudulent behavior from the self-checkout machine 50 , in a case where the fraudulent behavior is confirmed, it is possible to charge a fee via a credit card company.
  • the numbers of self-checkout machines and cameras, numerical examples, training data examples, the number of pieces of training data, machine learning models, each class name, the number of classes, data formats, or the like used in the above embodiments are merely examples and can be arbitrarily changed.
  • the processing flow described in each flowchart may be appropriately changed in a range without contradiction.
  • a model generated by various algorithms such as a neural network may be adopted.
  • the shopping basket is an example of a conveyance tool, such as a basket or a product cart, used to carry a product to be purchased that is selected by a user in the store to a self-checkout machine, for example.
  • the information processing device 100 can use known techniques, such as another machine learning model for detecting a position, object detection techniques, or position detection techniques, for the scan position and the position of the shopping basket. For example, since the information processing device 100 can detect the position of the shopping basket based on a time-series change obtained from the difference between frames (image data), the information processing device 100 may perform detection using that position or generate another model using that position. Furthermore, by designating the size of the shopping basket in advance, in a case where an object having that size is detected from the image data, the information processing device 100 can identify the object as the position of the shopping basket. Note that, since the scan position is fixed to some extent, the information processing device 100 can identify a position designated by an administrator or the like as the scan position.
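The frame-difference and size-designation ideas above can be sketched as follows, using NumPy only; the change threshold and the size tolerance are assumed parameters, and `detect_basket` is a hypothetical helper.

```python
import numpy as np

def detect_basket(prev_frame, frame, basket_size, tol=5):
    """Find the region that changed between two frames and identify it as
    the shopping basket when its size matches the designated basket size."""
    changed = np.abs(frame.astype(int) - prev_frame.astype(int)) > 30
    ys, xs = np.nonzero(changed)
    if len(xs) == 0:
        return None  # nothing moved between the frames
    w = int(xs.max() - xs.min() + 1)
    h = int(ys.max() - ys.min() + 1)
    if abs(w - basket_size[0]) <= tol and abs(h - basket_size[1]) <= tol:
        return (int(xs.min()), int(ys.min()), w, h)  # (x, y, w, h)
    return None  # the changed region does not match the basket's size

prev_frame = np.zeros((100, 100), dtype=np.uint8)
frame = prev_frame.copy()
frame[20:60, 30:80] = 255  # a basket-sized object appears
print(detect_basket(prev_frame, frame, basket_size=(50, 40)))  # (30, 20, 50, 40)
```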
  • Pieces of information including a processing procedure, a control procedure, a specific name, various types of data, and parameters described above or illustrated in the drawings may be optionally changed unless otherwise specified.
  • for example, the region extraction unit 113 and the coordinate position specification unit 114 may be integrated. That is, all or some of the components may be functionally or physically distributed or integrated in optional units, depending on various kinds of loads, use situations, or the like. Moreover, all or some of the respective processing functions of the respective devices may be implemented by a central processing unit (CPU) and a program to be analyzed and executed by the CPU, or may be implemented as hardware by wired logic.
  • FIG. 19 is a diagram for explaining a hardware configuration example.
  • the information processing device 100 will be described as an example.
  • the information processing device 100 includes a communication device 100 a, a hard disk drive (HDD) 100 b, a memory 100 c, and a processor 100 d.
  • the individual units illustrated in FIG. 19 are mutually coupled by a bus or the like.
  • the communication device 100 a is a network interface card or the like and communicates with another device.
  • the HDD 100 b stores programs for operating the functions illustrated in FIG. 3 and databases (DBs).
  • the processor 100 d reads a program that executes processing similar to the processing of each processing unit illustrated in FIG. 3 from the HDD 100 b or the like, and develops the read program in the memory 100 c to operate a process that executes each function described with reference to FIG. 3 or the like. For example, this process executes a function similar to the function of each processing unit included in the information processing device 100 .
  • the processor 100 d reads a program having functions similar to those of the machine learning unit 111 , the video acquisition unit 112 , the region extraction unit 113 , the coordinate position specification unit 114 , the product region specification unit 115 , the fraud detection unit 116 , the warning control unit 117 , or the like from the HDD 100 b or the like.
  • the processor 100 d executes a process for executing processing similar to those of the machine learning unit 111 , the video acquisition unit 112 , the region extraction unit 113 , the coordinate position specification unit 114 , the product region specification unit 115 , the fraud detection unit 116 , the warning control unit 117 , or the like.
  • the information processing device 100 works as an information processing device that executes an information processing method by reading and executing the program.
  • the information processing device 100 can also implement functions similar to the functions of the above-described embodiments by reading the program described above from a recording medium with a medium reading device and executing the read program.
  • other programs mentioned in the embodiments are not limited to being executed by the information processing device 100 .
  • the embodiments described above may be similarly applied also to a case where another computer or server executes the program or a case where the computer and the server cooperatively execute the program.
  • This program may be distributed via a network such as the Internet.
  • this program may be recorded in a computer-readable recording medium such as a hard disk, a flexible disk (FD), a compact disc read only memory (CD-ROM), a magneto-optical disk (MO), or a digital versatile disc (DVD) and may be executed by being read from the recording medium by a computer.
  • FIG. 20 is a diagram for explaining a hardware configuration example of the self-checkout machine 50 .
  • the self-checkout machine 50 includes a communication interface 400 a, an HDD 400 b, a memory 400 c, a processor 400 d, an input device 400 e, and an output device 400 f.
  • the individual units illustrated in FIG. 20 are mutually coupled by a bus or the like.
  • the communication interface 400 a is a network interface card or the like, and communicates with other information processing devices.
  • the HDD 400 b stores a program for operating each function of the self-checkout machine 50 and data.
  • the processor 400 d is a hardware circuit that reads the program that executes the processing of each function of the self-checkout machine 50 from the HDD 400 b or the like and develops the read program in the memory 400 c to operate a process that executes each function of the self-checkout machine 50. That is, this process executes a function similar to that of each processing unit included in the self-checkout machine 50.
  • the self-checkout machine 50 operates as an information processing device that executes operation control processing by reading and executing the program that executes the processing of each function of the self-checkout machine 50. Furthermore, the self-checkout machine 50 can implement each function of the self-checkout machine 50 by reading a program from a recording medium with a medium reading device and executing the read program. Note that other programs mentioned in the embodiments are not limited to being executed by the self-checkout machine 50. For example, the present embodiment may be similarly applied to a case where another computer or server executes the program, or a case where the computer and the server cooperatively execute the program.
  • the program that executes the processing of each function of the self-checkout machine 50 can be distributed via a network such as the Internet. Furthermore, this program can be recorded in a computer-readable recording medium such as a hard disk, an FD, a CD-ROM, an MO, or a DVD, and can be executed by being read from the recording medium by a computer.
  • the input device 400 e detects various input operations by the user, such as an input operation for the program executed by the processor 400 d .
  • the input operation includes, for example, a touch operation or the like.
  • the self-checkout machine 50 further includes a display unit, and the input operation detected by the input device 400 e may be a touch operation on the display unit.
  • the input device 400 e may be, for example, a button, a touch panel, a proximity sensor, or the like.
  • the input device 400 e reads a barcode.
  • the input device 400 e is, for example, a barcode reader.
  • the barcode reader includes a light source and an optical sensor and scans a barcode.
  • the output device 400 f outputs data output from the program executed by the processor 400 d to an external device coupled to the self-checkout machine 50, for example, an external display device or the like. Note that, in a case where the self-checkout machine 50 includes the display unit, the self-checkout machine 50 does not need to include the output device 400 f.


Abstract

A storage medium storing an information processing program that causes a computer to execute a process that includes acquiring video data that includes a registration machine; extracting image data that include products; specifying a timing when first information regarding a first product is registered to the registration machine; specifying certain image data of the image data that includes a second product held in the hand of the user within a certain time period from the timing and placed in a place in an angle of view of the video data that is not a place where a product that has been registered to the registration machine is placed for most of the certain time period; specifying second information regarding the second product by inputting the certain image data to a machine learning model; and generating an alert when the first information and the second information do not match.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2022-207689, filed on Dec. 23, 2022, the entire contents of which are incorporated herein by reference.
  • FIELD
  • The embodiments discussed herein are related to an information processing program, an information processing method, and an information processing device.
  • BACKGROUND
  • Image recognition technology for recognizing a specific object from an image has been widely used. With this technology, for example, a region of the specific object in the image is specified as a bounding box (Bbox). Furthermore, there is technology for performing object image recognition using machine learning. Such image recognition technology is expected to be applied, for example, to monitoring of a customer's purchasing behavior in a store or to work management of workers in a factory.
  • In stores such as supermarkets and convenience stores, self-checkout machines are becoming popular. The self-checkout machine is a point of sale (POS) cash register system with which a user who purchases a product performs, by himself/herself, operations from reading of a barcode of the product to payment. For example, by introducing the self-checkout machine, it is possible to overcome a shortage of labor caused by population decline and to suppress labor costs.
  • Japanese Laid-open Patent Publication No. 2019-29021 is disclosed as related art.
  • SUMMARY Technical Problem
  • However, since a positional relationship of Bboxes extracted from a video is based on a two-dimensional space, for example, the depth between the Bboxes cannot be analyzed, and it is difficult to detect a relationship between an accounting machine such as a self-checkout machine and a product to be registered in the accounting machine. Furthermore, it is difficult for the accounting machine to detect a force majeure error and intentional fraud by a user.
  • The force majeure error includes, for example, a scan omission in which a user forgets to scan a product and moves the product from a basket to a plastic bag, or a reading error in which a barcode on a can is erroneously read when barcodes are attached both to a beer box including a set of six cans and to each of the cans. Furthermore, the intentional fraud includes barcode concealment in which the user pretends to scan a product while hiding only the barcode with a finger, or the like.
  • Note that it is conceivable to automatically count the number of products and detect fraud by introducing a weight sensor or the like in each self-checkout machine. However, the cost is excessive, and this is not realistic, particularly for large stores and stores located across the country.
  • In one aspect, an object is to provide an information processing program, an information processing method, and an information processing device capable of identifying a product registered in an accounting machine.
  • Solution to Problem
  • According to an aspect of the embodiments, a non-transitory computer-readable storage medium storing an information processing program that causes at least one computer to execute a process, the process includes acquiring video data each image data of which includes a registration machine used to register a product by a user; extracting, from the acquired video data, image data that include products by specifying a first region that includes a hand of the user, a second region that includes a product, and a relationship between the first region and the second region, for the image data of the acquired video data; specifying a timing when first information regarding a first product is registered to the registration machine by the user; specifying certain image data of the image data that includes a second product held in the hand of the user within a certain time period from the timing and placed in a place in an angle of view of the video data that is not a place where a product that has been registered to the registration machine is placed for most of the certain time period, based on the first region for the image data, the second region for the image data, and the relationship for the image data; specifying second information regarding the second product by inputting the certain image data to a machine learning model; and generating an alert when the first information and the second information do not match.
  • The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
  • It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
  • Advantageous Effects of Invention
  • According to one embodiment, it is possible to identify a product registered in an accounting machine.
  • BRIEF DESCRIPTION OF DRAWINGS
  • FIG. 1 is a diagram illustrating an overall configuration example of a self-checkout machine system according to a first embodiment;
  • FIG. 2 is a diagram for explaining an example of detection of an abnormal behavior according to the first embodiment;
  • FIG. 3 is a functional block diagram illustrating a functional configuration of an information processing device according to the first embodiment;
  • FIG. 4 is a diagram for explaining training data;
  • FIG. 5 is a diagram for explaining machine learning of a first machine learning model;
  • FIG. 6 is a diagram for explaining machine learning of a second machine learning model;
  • FIG. 7 is a diagram for explaining extraction of a product region;
  • FIG. 8 is a diagram for explaining specification of a coordinate position of the product region;
  • FIG. 9 is a diagram for explaining specification of information to be a determination target of fraud;
  • FIG. 10 is a diagram for explaining specification of a product region used to determine the fraud;
  • FIG. 11 is a diagram for explaining specification of a product region used to determine the fraud using HOID;
  • FIG. 12 is a diagram for explaining specification of a product region used to determine the fraud using a distribution;
  • FIG. 13 is a diagram for explaining specification of a product item;
  • FIG. 14 is a diagram for explaining detection of a fraudulent behavior;
  • FIG. 15 is a diagram illustrating an alert display example on a self-checkout machine;
  • FIG. 16 is a diagram illustrating an alert display example to a clerk;
  • FIG. 17 is a flowchart illustrating a flow of processing of the information processing device;
  • FIG. 18 is a flowchart illustrating a flow of processing of the self-checkout machine;
  • FIG. 19 is a diagram for explaining a hardware configuration example; and
  • FIG. 20 is a diagram for explaining a hardware configuration example of the self-checkout machine.
  • DESCRIPTION OF EMBODIMENTS
  • Hereinafter, embodiments of an information processing program, an information processing method, and an information processing device disclosed in the present application will be described in detail with reference to the drawings. Note that these embodiments do not limit the present disclosure.
  • Furthermore, the embodiments may be appropriately combined with each other in a range without contradiction.
  • First Embodiment <Description of Self-checkout Machine System>
  • FIG. 1 is a diagram illustrating an overall configuration example of a self-checkout machine system 5 according to a first embodiment. As illustrated in FIG. 1 , the self-checkout machine system 5 includes a camera 30, a self-checkout machine 50, an administrator's terminal 60, and an information processing device 100.
  • The information processing device 100 is an example of a computer coupled to the camera 30 and the self-checkout machine 50. The information processing device 100 is coupled to the administrator's terminal 60, via a network 3 for which various wired and wireless communication networks can be adopted. The camera 30 and the self-checkout machine 50 may be coupled to the information processing device 100, via the network 3.
  • The camera 30 is an example of a camera that captures a video of a region including the self-checkout machine 50. The camera 30 transmits data of a video to the information processing device 100. In the following description, there is a case where the data of the video is referred to as “video data” or is simply referred to as a “video”.
  • The video data includes a plurality of time-series image frames. To each image frame, a frame number is assigned in a time-series ascending order. One image frame is image data of a still image captured by the camera 30 at a certain timing. In the following description, there is a case where the image data is simply referred to as an “image”.
  • The self-checkout machine 50 is an example of a POS cash register system or an accounting machine with which a user 2 who purchases a product performs operations from reading a barcode of the product to payment. For example, when the user 2 moves a product to be purchased to a scan region of the self-checkout machine 50, the self-checkout machine 50 scans a barcode of the product and registers the product as a product to be purchased.
  • Note that, as described above, the self-checkout machine 50 is an example of a self-checkout machine that registers (register operation) a product to be purchased by a customer and makes a payment, and is referred to as, for example, a self checkout, an automated checkout, a self-checkout machine, a self-check-out register, or the like. The barcode is one type of identifier that represents a numerical value or a character depending on the thicknesses of its striped lines, and the self-checkout machine 50 can specify the price, the type (for example, food), or the like of the product by scanning (reading) the barcode. The barcode is an example of a code, and two-dimensional codes such as a quick response (QR) code having the same function can be used in addition to the barcode.
  • The user 2 repeatedly performs the operation of the product registration described above, and when the scan of the product is completed, the user 2 operates a touch panel or the like of the self-checkout machine 50, and makes a settlement request. Upon receiving the settlement request, the self-checkout machine 50 presents the number of products to be purchased, the purchase price, or the like, and executes settlement processing. The self-checkout machine 50 stores information regarding the products that have been scanned from when the user 2 starts scanning to when the settlement request is issued, in a storage unit and transmits the information to the information processing device 100 as self-checkout machine data (product information).
  • The administrator's terminal 60 is an example of a terminal device used by an administrator of a store. The administrator's terminal 60 receives an alert notification indicating that fraud has been performed regarding purchase of a product or the like, from the information processing device 100.
  • With such a configuration, the information processing device 100 acquires video data of a predetermined area including the self-checkout machine 50 with which a person registers a product and inputs the acquired video data into a first machine learning model, so as to detect a product region from the video data. The information processing device 100 stores time-series coordinate positions of the detected product region in the storage unit. The information processing device 100 specifies a timing based on an operation of the person for registering the product in the self-checkout machine 50, and specifies a product region related to the product registered in the self-checkout machine 50, based on the specified timing based on the operation and the time-series coordinate positions stored in the storage unit.
  • FIG. 2 is a diagram for explaining an example of detection of an abnormal behavior according to the first embodiment. As illustrated in FIG. 2 , the information processing device 100 acquires image data from the video data captured by the camera 30 that images the self-checkout machine 50 and acquires a Human-Object Interaction Detection (HOID) result from the acquired image data, using the HOID or the like. That is, the information processing device 100 acquires a region of a person, a region of an object, and a relationship between the person and the object, from the video data.
  • Subsequently, the information processing device 100 generates hand-held product image data (hereinafter, may be referred to as hand-held product image) obtained by extracting a region portion of the object (product) related to the person, from the image data of the HOID result. Then, the information processing device 100 analyzes the hand-held product image and identifies an image of a product (for example, wine) imaged in the hand-held product image.
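Generating the hand-held product image amounts to cropping the object region of the HOID result out of the frame; in this sketch the dictionary keys (`object_bbox` and so on) are assumed names for the HOID output, not the embodiment's actual data format.

```python
import numpy as np

def crop_hand_held_product(image, hoid_result):
    """Cut out the object (product) region related to the person from the
    frame, producing the hand-held product image."""
    x, y, w, h = hoid_result["object_bbox"]
    return image[y:y + h, x:x + w]

image = np.zeros((100, 100), dtype=np.uint8)
hoid_result = {
    "person_bbox": (0, 0, 60, 100),
    "object_bbox": (30, 40, 20, 10),  # product held in the hand
    "interaction": "holding",
}
patch = crop_hand_held_product(image, hoid_result)
print(patch.shape)  # (10, 20)
```

The cropped patch is what would then be passed to the product-item classifier.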
  • On the other hand, the information processing device 100 acquires a scan result (for example, egg) that is information regarding the product scanned by the self-checkout machine 50, from the self-checkout machine 50.
  • Here, the information processing device 100 compares the product item (for example, wine) specified from the video data with the product item (for example, egg) actually scanned by the self-checkout machine 50, and in a case where the product items do not match, the information processing device 100 determines that an abnormal behavior (fraud) is performed and notifies of an alert.
  • That is, the information processing device 100 analyzes the image data captured at the scan timing and determines whether or not the product to be scanned matches the actually scanned product. As a result, since the information processing device 100 can detect fraud (for example, the so-called banana trick) in which a user holds a product that has no barcode on the product itself and then registers another, inexpensive product on the registration screen of the self-checkout machine 50, the information processing device 100 can identify the product registered in the self-checkout machine 50.
  • <Functional Configuration>
  • FIG. 3 is a functional block diagram illustrating a functional configuration of the information processing device 100 according to the first embodiment. As illustrated in FIG. 3 , the information processing device 100 includes a communication unit 101, a storage unit 102, and a control unit 110.
  • The communication unit 101 is a processing unit that controls communication with another device and, for example, is implemented by a communication interface or the like. For example, the communication unit 101 receives video data from the camera 30 and transmits a processing result by the control unit 110 to the administrator's terminal 60.
  • The storage unit 102 is a processing unit that stores various types of data, programs executed by the control unit 110, or the like, and is implemented by a memory, a hard disk, or the like. The storage unit 102 stores a training data database (DB) 103, a first machine learning model 104, a second machine learning model 105, a video data DB 106, and a coordinate position DB 107.
  • The training data DB 103 is a database that stores training data used to train the first machine learning model 104 and training data used to train the second machine learning model 105. For example, an example will be described where Human-Object Interaction Detection (HOID) is adopted for the first machine learning model 104, with reference to FIG. 4 . FIG. 4 is a diagram for explaining training data. As illustrated in FIG. 4 , each piece of the training data includes image data to be input data and correct answer information (label) set to the image data.
  • To the correct answer information, classes of a person and an object to be detected, a class indicating an interaction between the person and the object, and a bounding box (Bbox: object region information) indicating a region of each class are set. For example, as the correct answer information, region information of a Something class indicating an object, which is an object such as a product, other than a plastic bag, region information of a class of a person indicating a user who purchases the product, and a relationship (holding class) indicating an interaction between the Something class and the class of the person are set. That is, information regarding the object held by the person is set, as the correct answer information. Note that, the class of the person is an example of a first class, the Something class is an example of a second class, the region information of the class of the person is an example of a first region, the region information of the Something class is an example of a second region, and the interaction between the person and the object is an example of an interaction.
  • Furthermore, as the correct answer information, region information of a class of a plastic bag indicating the plastic bag, region information of a class of a person indicating a user who uses the plastic bag, and a relationship (holding class) indicating an interaction between the class of the plastic bag and the class of the person are set. That is, information regarding the plastic bag held by the person is set, as the correct answer information.
  • Typically, when the Something class is created by normal object identification (object recognition), all objects that have no relation with a task, such as backgrounds, clothes, or accessories, are detected. In addition, since all of these are Somethings, a large number of Bboxes are merely identified in the image data, and nothing meaningful is found. In the case of the HOID, a specific relationship in which a person holds a thing (or another relationship such as sitting or operating) is found. Therefore, the information can be used as meaningful information for a task (for example, a fraud detection task of the self-checkout machine). After the object is detected as Something, the plastic bag or the like is identified as the unique class of Bag (plastic bag). Although the plastic bag is valuable information in the fraud detection task of the self-checkout machine, it is not important information in other tasks. Therefore, it is worthwhile to use this information based on knowledge unique to the fraud detection task of the self-checkout machine, namely, that the product is taken out from a basket (shopping basket) and is put into a bag, and a useful effect is obtained.
  • Returning to FIG. 3 , the first machine learning model 104 is an example of a machine learning model that is trained to identify a person and an object imaged in training data (for example, person and storage (plastic bag or the like)). Specifically, the first machine learning model 104 is a machine learning model that identifies the person, the product, and the relationship between the person and the product from the input image data, and outputs an identification result. For example, the first machine learning model 104 can adopt the HOID and can also adopt a machine learning model using various neural networks or the like. In a case of the HOID, “the class and the region information of the person, the class and the region information of the product (object), and the interaction between the person and the product” are output.
  • The second machine learning model 105 is an example of a machine learning model trained to specify an item of a product imaged in training data. For example, the second machine learning model 105 may be implemented by a zero-shot image classifier. In this case, the second machine learning model 105 uses a list of texts and an image as inputs and outputs a text having the highest similarity to the image, in the list of the texts, as a label of the image.
  • Here, contrastive language-image pre-training (CLIP) is exemplified as an example of the zero-shot image classifier described above. The CLIP implements embedding of a plurality of types of, so-called multimodal, images and texts into a feature space. That is, with the CLIP, by training an image encoder and a text encoder, embedding in which a vector distance between a pair of an image and a text having close meanings is shortened is implemented. For example, the image encoder may be implemented by a vision transformer (ViT) or may be implemented by a convolutional neural network, for example, a ResNet or the like. Furthermore, the text encoder may be implemented by a generative pre-trained transformer (GPT) based Transformer or may be implemented by a recurrent neural network, for example, a long short-term memory (LSTM).
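  • As an illustrative sketch (the function names and the two-dimensional embedding vectors below are assumptions for explanation, not part of the CLIP implementation), the zero-shot classification step, in which the text having the highest similarity to the image embedding is selected as the label, may be expressed as follows:

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_vec, text_vecs, labels):
    # Return the label whose text embedding is most similar to the
    # image embedding, as in a CLIP-style zero-shot classifier.
    sims = [cosine(image_vec, t) for t in text_vecs]
    return labels[max(range(len(sims)), key=sims.__getitem__)]
```

Here, the image encoder and the text encoder are assumed to have already produced the embedding vectors; only the selection of the most similar text is shown.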
  • The video data DB 106 is a database that stores the video data captured by the camera 30 provided in the self-checkout machine 50. For example, the video data DB 106 stores the video data for each self-checkout machine 50 or each camera 30.
  • The coordinate position DB 107 is a database that stores coordinate positions, which are position information of a product acquired from the video data, in time series. For example, the coordinate position DB 107 stores coordinate positions of a product in time series, for each tracked product. Note that an origin to be the reference of the coordinate positions can be arbitrarily set, for example, to the center of the image data, a corner of the image data (for example, the lower left corner), or the like.
  • The control unit 110 is a processing unit that performs overall control of the information processing device 100 and, for example, is implemented by a processor or the like. The control unit 110 includes a machine learning unit 111, a video acquisition unit 112, a region extraction unit 113, a coordinate position specification unit 114, a product region specification unit 115, a fraud detection unit 116, and a warning control unit 117. Note that the machine learning unit 111, the video acquisition unit 112, the region extraction unit 113, the coordinate position specification unit 114, the product region specification unit 115, the fraud detection unit 116, and the warning control unit 117 are implemented by an electronic circuit included in a processor, a process executed by the processor, or the like.
  • (Machine Learning)
  • The machine learning unit 111 is a processing unit that performs machine learning of the first machine learning model 104 and the second machine learning model 105, using each piece of the training data stored in the training data DB 103. Note that the first machine learning model 104 and the second machine learning model 105 may be machine learned in advance, and the machine learning unit 111 can execute the following processing as fine tuning in a case where accuracy of the machine-learned first machine learning model 104 and second machine learning model 105 is insufficient.
  • First, training of the first machine learning model 104 will be described. FIG. 5 is a diagram for explaining machine learning of the first machine learning model 104. In FIG. 5 , an example in which the HOID is used for the first machine learning model 104 is illustrated. As illustrated in FIG. 5 , the machine learning unit 111 inputs input data of the training data into the HOID and acquires an output result of the HOID. The output result includes a class of a person, a class of an object, an interaction between the person and the object, or the like detected by the HOID. Then, the machine learning unit 111 calculates error information between the correct answer information of the training data and the output result of the HOID and performs machine learning (training) for updating a parameter of the HOID through backpropagation, so as to reduce an error.
  • Next, training of the second machine learning model 105 will be described. FIG. 6 is a diagram for explaining machine learning of the second machine learning model 105. In FIG. 6 , as an example of the second machine learning model 105, a CLIP model 10 is illustrated. As illustrated in FIG. 6 , to train the CLIP model 10, a pair of image data (hereinafter, may be referred to as image) and a text is used as training data. For such training data, a dataset obtained by extracting a pair of an image and a text described as a caption of the image from a Web page on the Internet, so-called WebImageText (WIT) can be used. For example, a pair of an image such as a photograph of a dog or a picture in which an illustration of a dog is drawn and a text “dog picture” described as a caption of the image is used as the training data. By using the WIT as the training data in this way, a labeling work is not needed, and a large amount of training data can be acquired.
  • Among these pairs of the images and the texts, the image is input into an image encoder 10I, and the text is input into a text encoder 10T. The image encoder 10I to which the image is input in this way outputs a vector in which the image is embedded into a feature space. On the other hand, the text encoder 10T to which the text is input outputs a vector in which the text is embedded into a feature space.
  • For example, in FIG. 6, a mini batch of batch size N, including N pairs of training data, namely, a pair of an image 1 and a text 1, a pair of an image 2 and a text 2, . . . , and a pair of an image N and a text N, is illustrated. In this case, by inputting each of the N images and the N texts into the image encoder 10I and the text encoder 10T, respectively, a similarity matrix M1 of N×N embedding vectors can be obtained. Note that the "similarity" used herein may be an inner product or cosine similarity between the embedding vectors, as merely an example.
  • Here, in the training of the CLIP model 10, labels are unstable since caption formats of Web texts vary. Therefore, an objective function called the Contrastive objective is used.
  • In the Contrastive objective, in the case of the i-th image in the mini batch, the i-th text corresponds to the correct pair. Therefore, the i-th text is a positive example, and all texts other than the i-th text are negative examples.
  • That is, since a single positive example and N−1 negative examples are set for each piece of training data, N positive examples and N²−N negative examples are generated in the entire mini batch. For example, in the example of the similarity matrix M1, the elements of the N diagonal components displayed with black and white inversion are positive examples, and the N²−N elements displayed with a white background are negative examples.
  • Under such a similarity matrix M1, the parameters of the image encoder 10I and the text encoder 10T are trained so as to maximize the similarity between the N pairs corresponding to the positive examples and minimize the similarity between the N²−N pairs corresponding to the negative examples.
  • For example, for the first image 1, the first text is a positive example and the second and subsequent texts are negative examples, and a loss, for example, a cross entropy error, is calculated in the row direction of the similarity matrix M1. By calculating such a loss for each of the N images, a loss related to the images is obtained. On the other hand, for the second text 2, the second image is a positive example and all images other than the second image are negative examples, and the loss is calculated in the column direction of the similarity matrix M1. By calculating such a loss for each of the N texts, a loss related to the texts is obtained. The parameters of the image encoder 10I and the text encoder 10T are updated so as to minimize a statistic value, for example, an average, of the losses related to the images and the losses related to the texts.
  • Through such training of the image encoder 10I and the text encoder 10T for minimizing the Contrastive objective, the trained CLIP model 10 (for example, second machine learning model 105) is generated.
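  • The symmetric Contrastive objective described above may be sketched numerically as follows (a minimal sketch assuming the N×N similarity matrix has already been computed; `softmax_xent` and `clip_loss` are hypothetical helper names, not part of the embodiment):

```python
import math

def softmax_xent(logits, target):
    # Cross entropy error of a softmax over the logits, where `target`
    # is the index of the positive example.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return -math.log(exps[target] / sum(exps))

def clip_loss(sim):
    # Row i: text i is the positive example for image i (loss related
    # to images). Column j: image j is the positive example for text j
    # (loss related to texts). The result averages the two losses.
    n = len(sim)
    loss_images = sum(softmax_xent(sim[i], i) for i in range(n)) / n
    columns = [[sim[i][j] for i in range(n)] for j in range(n)]
    loss_texts = sum(softmax_xent(columns[j], j) for j in range(n)) / n
    return (loss_images + loss_texts) / 2
```

A similarity matrix whose diagonal components (the positive pairs) dominate yields a small loss, which is the state that training the two encoders aims for.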
  • (Video Acquisition)
  • The video acquisition unit 112 is a processing unit that acquires video data from the camera 30. For example, the video acquisition unit 112 acquires video data from the camera 30 provided in the self-checkout machine 50 as needed and stores the video data in the video data DB 106.
  • (Region Extraction)
  • The region extraction unit 113 is a processing unit that extracts a product region from the video data, by inputting the video data acquired by the video acquisition unit 112 into the first machine learning model 104. Specifically, the region extraction unit 113 specifies a first region including a hand of a person, a second region including a product, and a relationship between the first region and the second region, from the video data, by inputting the video data into the first machine learning model 104 that is the HOID.
  • That is, the region extraction unit 113 extracts a region of a product that is a target of a behavior of a person in the video data. For example, the region extraction unit 113 extracts a region of a product taken out from a shopping basket, a product held by the person, and a product put into a plastic bag.
  • FIG. 7 is a diagram for explaining extraction of the product region. In FIG. 7, the image data to be input into the HOID and the output result of the HOID are illustrated. Moreover, in FIG. 7, a Bbox of a person is indicated by a frame of a solid line, and a Bbox of an object is indicated by a frame of a broken line. As illustrated in FIG. 7, the output result of the HOID includes the Bbox of the person, the Bbox of the object, a probability value of the interaction between the person and the object, a class name, and the like. Among these, with reference to the Bbox of the object, the region extraction unit 113 extracts the region of the product held by the person, by extracting the partial image corresponding to the Bbox of the object, that is, the frame of the broken line in FIG. 7, from the image data.
  • Furthermore, the region extraction unit 113 tracks the product, in a case where the product held with the hand of the person is detected. That is, the region extraction unit 113 tracks a movement related to the same product and a region of the same product, with consecutive frames in and subsequent to a certain frame from which the product region is extracted, in the video data. For example, for each product detected by the HOID, the region extraction unit 113 tracks the product from when the product is detected by the HOID to when the product put into the plastic bag is detected by the HOID. Then, the region extraction unit 113 stores a tracking result to the storage unit 102.
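  • The extraction of the region of the held product from the HOID output may be sketched as follows (the dictionary layout of a detection and the function name are assumptions for illustration; an actual HOID output format may differ):

```python
def extract_held_product_regions(detections, frame):
    # `detections` is a list of hypothetical HOID results, e.g.
    # {"object_bbox": (x1, y1, x2, y2), "interaction": "holding"}.
    # `frame` is image data as a list of pixel rows. The partial image
    # inside each held object's Bbox is cropped out.
    crops = []
    for det in detections:
        if det.get("interaction") != "holding":
            continue
        x1, y1, x2, y2 = det["object_bbox"]
        crops.append([row[x1:x2] for row in frame[y1:y2]])
    return crops
```

Only detections whose interaction is the holding class contribute a product region; other interactions, such as sitting, are skipped.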
  • (Specification of Coordinate Position)
  • The coordinate position specification unit 114 is a processing unit that specifies time-series coordinate positions of the product region extracted by the region extraction unit 113 and stores the coordinate positions in the storage unit. Specifically, the coordinate position specification unit 114 acquires coordinates of a product region of the tracked product in time series, from the start to the end of the tracking by the region extraction unit 113. For example, the coordinate position specification unit 114 acquires a center coordinate of the tracked product or each of coordinates of four corners used to specify the product region of the tracked product in time series.
  • FIG. 8 is a diagram for explaining specification of a coordinate position of a product region. In FIG. 8, image data 1 to 7 that is the input data into the HOID and the detection content of the HOID when the image data 1 to 7 is sequentially input are illustrated. Note that, in FIG. 8, the description written on each piece of the image data indicates the information imaged in that image data, which is unknown at the time of input into the HOID and is to be detected by the HOID.
  • As illustrated in FIG. 8, the region extraction unit 113 acquires the image data 1 in which neither a person nor an object is imaged, inputs the image data 1 into the HOID, and acquires the output result. In this case, the region extraction unit 113 determines that there is no detection result of a person or an object. Subsequently, the region extraction unit 113 acquires the image data 2 in which a person holding a shopping basket is imaged, inputs the image data 2 into the HOID, and detects the user 2 (person) and the shopping basket held by the user 2, according to an output result.
  • Subsequently, the region extraction unit 113 acquires the image data 3 in which a person who takes out a product from a shopping basket is imaged, inputs the image data 3 into the HOID, and detects a behavior of the user 2 of moving the held product over the shopping basket, according to an output result. Then, the region extraction unit 113 starts tracking because the product is detected. Here, the coordinate position specification unit 114 acquires a coordinate position A1 of the product taken out from the shopping basket or a coordinate position A1 of a product region of the product taken out from the shopping basket. Note that the region extraction unit 113 can start tracking at the timing of the image data 2 in which only the shopping basket is detected. In this case, the region extraction unit 113 extracts a region by regarding the shopping basket as the product, and the coordinate position specification unit 114 acquires a coordinate position.
  • Subsequently, the region extraction unit 113 acquires the image data 4 in which a person who scans a product is imaged, inputs the image data 4 into the HOID, and detects a behavior of the user 2 of moving the held product to the scan position, according to an output result. Here, the coordinate position specification unit 114 acquires a coordinate position A2 of the held product or a coordinate position A2 of a product region of the held product.
  • Subsequently, the region extraction unit 113 acquires the image data 5 in which a person who puts a product into a plastic bag is imaged, inputs the image data 5 into the HOID, and detects a behavior of the user 2 of putting the held product into the held plastic bag, according to an output result. Here, the coordinate position specification unit 114 acquires a coordinate position A3 of the product put into the plastic bag or a coordinate position A3 of a product region of the product put into the plastic bag.
  • Note that, since the region extraction unit 113 detects that the product has been put into the plastic bag, by analyzing the image data 5, the region extraction unit 113 ends the tracking of the product. Then, the coordinate position specification unit 114 stores the coordinate position A1, the coordinate position A2, and the coordinate position A3 that are the coordinate positions of the tracked product in time series, in the coordinate position DB 107.
  • In this way, the coordinate position specification unit 114 specifies the coordinate position of the product, generates time-series data of the coordinate positions, and stores the data in the coordinate position DB 107.
  • (Product Region)
  • Returning to FIG. 3 , the product region specification unit 115 is a processing unit that specifies a timing when the person performs an operation for registering the product in the self-checkout machine 50 and specifies a product region related to the product registered in the self-checkout machine 50 based on the specified operation timing and the time-series coordinate positions stored in the coordinate position DB 107.
  • For example, the product region specification unit 115 specifies the product region, based on a coordinate position immediately before the timing when the person performs the operation for registering the product in the self-checkout machine 50, from among the time-series coordinate positions stored in the coordinate position DB 107. Alternatively, the product region specification unit 115 specifies the product region, based on a coordinate position immediately after the timing when the person performs the operation for registering the product in the self-checkout machine 50, from among the time-series coordinate positions stored in the coordinate position DB 107.
  • It is expected that the person performs fraud for registering an inexpensive product by operating the self-checkout machine 50, without scanning the product, in a state where the held product is placed around the self-checkout machine 50. Therefore, the product region specification unit 115 specifies the product region of the product placed around the self-checkout machine 50 by the person who has held the product as a fraud determination target.
  • When purchasing a product with no barcode, a person operates the self-checkout machine 50 and registers the product to be purchased. At this time, fraud is considered such that, although a product to be purchased is a melon, the person registers a bunch of bananas that is cheaper than a melon, as the product to be purchased. Therefore, the product region specification unit 115 specifies the product region of the product placed around the self-checkout machine 50 by the person who has held the product as a fraud determination target.
  • Furthermore, fraud is considered such that the person causes the self-checkout machine 50 to scan a barcode attached to a single product included in a set product, instead of the barcode attached to the set product, and purchases the set product at the low price of the single product. For example, the set product is collectively packaged in a state where cans are arranged in two rows of three using a packaging material, so as to collectively carry six alcoholic beverage cans. At this time, a barcode is attached to each of the packaging material used to package the set of the plurality of alcoholic beverage cans and each can of the alcoholic beverage packaged using the packaging material. Fraud is considered such that a person causes the self-checkout machine 50 to scan the barcode of an alcoholic beverage can packaged in the packaging material, not the barcode of the packaging material. As a result, the single product included in the set product is registered in the self-checkout machine 50.
  • On the other hand, the product held by the user is the set product. Therefore, the product region specification unit 115 specifies the product region of the product placed around the self-checkout machine 50 by the person who has held the product as a fraud determination target.
  • (Operation for Registering Product)
  • Here, the operation for registering the product in the self-checkout machine 50 will be described. As the operation for registering the product, there is an operation for registering an item of a product in the self-checkout machine 50, via an operation on a selection screen in which a list of products with no barcode is displayed. Furthermore, there is an operation for registering an item of a product in the self-checkout machine 50 by scanning a barcode of a product with the barcode by the self-checkout machine 50.
  • The self-checkout machine 50 registers a product with no barcode in the cash register through manual input by a person. In some cases, the self-checkout machine 50 receives the registration of the item of the product from a selection screen in which the items of the products with no barcode are displayed. For example, the self-checkout machine 50 registers an item of a product selected by a user from the list of the items of the products with no barcode in a recording medium of the self-checkout machine 50, based on a user's touch operation on the selection screen. At this time, the product region specification unit 115 of the information processing device 100 specifies a product region of a product, with reference to the timing when the item of the product with no barcode is registered in the self-checkout machine 50.
  • The self-checkout machine 50 transmits a notification of scan information indicating that the operation for registering the product has been performed, to the information processing device 100, via the network. The product region specification unit 115 identifies the registration timing, based on the notification of the scan information from the self-checkout machine 50 via the network. Specifically, when the item of the product with no barcode is registered in the self-checkout machine 50, the product region specification unit 115 specifies the product region of the product from among the time-series coordinate positions that have been stored, with respect to the timing when the item of the product with no barcode is registered in the self-checkout machine 50. Note that the product region specification unit 115 may specify the product region of the product, with reference to a timing when the touch operation is performed on a display of the self-checkout machine 50.
  • On the other hand, the self-checkout machine 50 registers the product with the barcode in the cash register by scanning the barcode. The self-checkout machine 50 identifies the item of the product by scanning the barcode. Then, the self-checkout machine 50 registers the identified item of the product in the recording medium of the self-checkout machine 50. At this time, the product region specification unit 115 of the information processing device 100 specifies the product region of the product, with reference to the timing when the item of the product is registered in the self-checkout machine 50 through scanning of the barcode.
  • The self-checkout machine 50 transmits a notification of scan information indicating that the operation for registering the product has been performed, to the information processing device 100, via the network. The product region specification unit 115 identifies the registration timing, based on the notification of the scan information from the self-checkout machine 50 via the network. Specifically, when the item of the product with the barcode is registered in the self-checkout machine 50, the product region specification unit 115 specifies the product region of the product from among the time-series coordinate positions that have been stored, with reference to the timing when the item of the product with the barcode is registered in the self-checkout machine 50.
  • FIG. 9 is a diagram for explaining specification of information to be a determination target of fraud. In FIG. 9 , as in FIG. 8 , each of pieces of image data subsequent to image data n that is the input data into the HOID and a detection content of the HOID when each of the pieces of the image data subsequent to the image data n is sequentially input are illustrated.
  • As illustrated in FIG. 9, the region extraction unit 113 acquires the image data n in which a person who takes out a product from a shopping basket is imaged, inputs the image data n into the HOID, and detects a behavior of the user 2 of moving the held product over the shopping basket, according to an output result. Then, the region extraction unit 113 starts tracking because the product is detected. Here, the coordinate position specification unit 114 acquires a coordinate position M of a product region of the tracked product.
  • Subsequently, the region extraction unit 113 acquires image data n1 in which a person holding a product is imaged, inputs the image data n1 into the HOID, and detects a behavior of the user 2 for taking out the product from the shopping basket and holding the product, according to an output result. Here, the coordinate position specification unit 114 acquires a coordinate position M1 of the product region of the tracked and held product.
  • Subsequently, the region extraction unit 113 acquires image data n2 in which a product held by a person around the self-checkout machine 50 is imaged, inputs the image data n2 into the HOID, and detects a behavior of the user 2 for placing the product around the self-checkout machine 50, according to an output result. Here, the coordinate position specification unit 114 acquires a coordinate position M2 of the product region of the tracked and placed product.
  • Subsequently, the region extraction unit 113 acquires image data n3 in which a product placed around the self-checkout machine 50 by a person is imaged, inputs the image data n3 into the HOID, and detects the product kept placed around the self-checkout machine 50, according to an output result. Here, the coordinate position specification unit 114 acquires a coordinate position M3 of the product region of the tracked and kept placed product.
  • Subsequently, the region extraction unit 113 acquires image data n4 in which a person is holding a product, inputs the image data n4 into the HOID, and detects a behavior of the user 2 for holding the product placed around the self-checkout machine 50, according to an output result. Here, the coordinate position specification unit 114 acquires a coordinate position M4 of the product region of the tracked and held product.
  • Thereafter, the region extraction unit 113 acquires image data n5 in which a person who puts a product into a plastic bag is imaged, inputs the image data n5 into the HOID, and detects a behavior of the user 2 of putting the held product into the held plastic bag, according to an output result. Then, the coordinate position specification unit 114 acquires the coordinate position M4 of the product region of the tracked product that is in the plastic bag, and the tracking performed by the region extraction unit 113 ends.
  • In a situation where the time-series data of the coordinate positions is collected in this way, the product region specification unit 115 receives a scan result from the self-checkout machine 50. Then, the product region specification unit 115 specifies the coordinate position M3 immediately before a scan time included in the scan result and the coordinate position M4 immediately after the scan time. As a result, the product region specification unit 115 specifies the coordinate position of the product corresponding to the timing when the person has performed the operation for registering the product in the self-checkout machine 50, as the coordinate position M3 or the coordinate position M4.
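  • The selection of the coordinate positions immediately before and immediately after the scan time, from the time-series coordinate positions, may be sketched as follows (the list-of-tuples timeline format and the function name are illustrative assumptions, not the stored format of the coordinate position DB 107):

```python
import bisect

def coords_around_scan(timeline, scan_time):
    # `timeline` is a list of (timestamp, (x, y)) entries sorted by
    # timestamp, one per tracked product position. Returns the
    # coordinate position immediately before the scan time and the one
    # at or immediately after it (None when absent).
    times = [t for t, _ in timeline]
    i = bisect.bisect_left(times, scan_time)
    before = timeline[i - 1][1] if i > 0 else None
    after = timeline[i][1] if i < len(timeline) else None
    return before, after
```

Either of the two returned coordinate positions may then be used as the reference for specifying the product region, as described above.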
  • Next, the product region specification unit 115 specifies image data of a region corresponding to the specified coordinate position that is a product region to be the determination target of the fraud. Here, a specification example of the product region to be the determination target of the fraud is described as an example using the coordinate position M3. However, the coordinate position M4 may be used.
  • For example, the product region specification unit 115 specifies a region of a product including a coordinate position, from image data that is a coordinate position specification source, as the determination target of the fraud. FIG. 10 is a diagram for explaining specification of a product region used to determine fraud. As illustrated in FIG. 10 , the product region specification unit 115 specifies a region of a product C2 including the coordinate position M3, in the image data n3 that is the specification source image data. Then, the product region specification unit 115 extracts image data including the region of the product C2 from the image data n3, as the image data of the product region to be the determination target of the fraud.
  • For example, the product region specification unit 115 can specify the region of the product including the specified coordinate position, from among a plurality of product regions extracted by the HOID, as the determination target of the fraud. FIG. 11 is a diagram for explaining specification of a product region used to determine fraud using the HOID. As illustrated in FIG. 11, the product region specification unit 115 specifies the region of the product C2 including the coordinate position M3, from among a person region, a region of a product C1, and the region of the product C2 extracted from the image data n3 by the HOID. Then, the product region specification unit 115 extracts image data including the region of the product C2 from the image data n3, as the image data of the product region to be the determination target of the fraud.
  • For example, the product region specification unit 115 can specify the product region to be the fraud determination target based on a distribution of the time-series coordinate positions. FIG. 12 is a diagram for explaining specification of a product region used to determine fraud using a distribution. As illustrated in FIG. 12 , the product region specification unit 115 plots each coordinate position (coordinate position M, coordinate position M1, . . . ) of the product to be tracked on the x axis and the y axis. Then, the product region specification unit 115 performs clustering and specifies the cluster including the largest number of coordinate positions. Thereafter, the product region specification unit 115 calculates a coordinate position S based on the center of the cluster, an average value of all coordinate positions in the cluster, or the like. Then, the product region specification unit 115 extracts image data including the coordinate position S from the image data n3, as the image data of the product region to be the fraud determination target. Note that the size of the image data to be extracted (the size of the region) can be preset.
  • Note that the product region specification unit 115 is not limited to using the distribution of all the coordinate positions of the tracked product; it can instead use the distribution of only the coordinate positions before the timing when the person performed the operation for registering the product in the self-checkout machine 50. In the example in FIG. 12 , the product region specification unit 115 can use the distribution of the coordinate positions including the coordinate position M, the coordinate position M1, the coordinate position M2, and the coordinate position M3.
  • (Detection of Fraud)
  • Returning to FIG. 3 , the fraud detection unit 116 is a processing unit that specifies the item of a product by inputting the product region related to the product registered in the self-checkout machine 50 into the second machine learning model 105, and detects a fraudulent behavior when the item of the product registered in the self-checkout machine 50 by the person and the item of the product specified using the second machine learning model 105 do not match. That is, in a case where the scanned product is different from the product specified from the video, the fraud detection unit 116 determines that a fraudulent behavior has occurred.
  • FIG. 13 is a diagram for explaining specification of a product item. As illustrated in FIG. 13 , image data 20 of a product region specified as the determination target of the fraud by the product region specification unit 115 is input into the image encoder 10I of the CLIP model 10. As a result, the image encoder 10I outputs an embedding vector I1 of the image data 20 of the product region.
  • On the other hand, texts such as “melon”, “rice”, “wine”, and “beer” that have been prepared in advance are input, as a list of class captions, into the text encoder 10T of the CLIP model 10. At this time, the texts “melon”, “rice”, “wine”, and “beer” may be input into the text encoder 10T as they are. However, “prompt engineering” can be performed to convert the class caption format at the time of inference into the class caption format used at the time of training. For example, it is possible to insert a text corresponding to an attribute of a product, for example, “drink”, into the {object} portion of “photograph of {object}” and input “photograph of drink”.
  • As a result, the text encoder 10T outputs an embedding vector T1 of the text “melon”, an embedding vector T2 of the text “rice”, an embedding vector T3 of the text “wine”, . . . and an embedding vector TN of the text “beer”.
  • Then, a similarity is calculated between the embedding vector I1 of the image data 20 of the product region and each of the embedding vector T1 of the text “melon”, the embedding vector T2 of the text “rice”, the embedding vector T3 of the text “wine”, . . . , and the embedding vector TN of the text “beer”.
  • As illustrated in black-and-white inverted display in FIG. 13 , in this example, the similarity between the embedding vector I1 of the image data 20 of the product region and the embedding vector T3 of the text “wine” is the largest. Therefore, the CLIP model 10 outputs “wine” as the prediction result of the class of the image data 20 of the product region.
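The similarity-based class prediction can be sketched as follows. The toy embedding vectors and the function names are illustrative assumptions standing in for the actual encoder outputs of the CLIP model 10:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def predict_class(image_vec, text_vecs):
    """Return the class caption whose text embedding is most similar
    to the image embedding (zero-shot classification)."""
    return max(text_vecs, key=lambda caption: cosine(image_vec, text_vecs[caption]))

# Toy vectors standing in for I1 and T1..TN from the encoders.
I1 = [0.9, 0.1, 0.3]
captions = {
    "photograph of melon": [0.1, 0.9, 0.0],
    "photograph of rice":  [0.0, 0.2, 0.9],
    "photograph of wine":  [0.8, 0.2, 0.3],
    "photograph of beer":  [0.4, 0.5, 0.5],
}
print(predict_class(I1, captions))  # caption with the largest similarity
```

In the actual model, `I1` and the text vectors would be produced by the image encoder 10I and text encoder 10T; only the argmax-over-similarity step is shown here.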
  • Next, the fraud detection unit 116 compares the product item “wine” specified using the second machine learning model 105 in this way with the product item registered in the self-checkout machine 50 and determines whether or not a fraudulent behavior has occurred.
  • FIG. 14 is a diagram for explaining detection of a fraudulent behavior. As illustrated in FIG. 14 , the fraud detection unit 116 specifies the product item “wine” from the video data by the method illustrated in FIG. 13 . On the other hand, the fraud detection unit 116 acquires a product item “banana” registered in the self-checkout machine 50, from the self-checkout machine 50. Then, since the product items do not match, the fraud detection unit 116 determines that a fraudulent behavior has occurred, and notifies the warning control unit 117 of an alarm notification instruction including an identifier of the self-checkout machine 50 or the like.
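The mismatch check and the resulting alarm notification instruction can be sketched as below. The function name and the dictionary layout of the instruction are illustrative assumptions:

```python
def detect_fraud(scanned_item, predicted_item, register_id):
    """Compare the item registered in the checkout machine with the item
    predicted from the video; return an alarm-notification instruction
    (including the machine identifier) when they do not match."""
    if scanned_item != predicted_item:
        return {"alert": True, "register": register_id,
                "scanned": scanned_item, "predicted": predicted_item}
    return {"alert": False}

# "banana" was scanned, but image analysis says "wine" -> alert.
print(detect_fraud("banana", "wine", register_id=2))
```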
  • (Alert Notification)
  • The warning control unit 117 is a processing unit that generates an alert and performs alert notification control in a case where the fraud detection unit 116 detects the fraudulent behavior (fraudulent operation). For example, the warning control unit 117 generates an alert indicating that the product registered in the self-checkout machine 50 by the person is abnormal and outputs the alert to the self-checkout machine 50 and the administrator's terminal 60.
  • FIG. 15 is a diagram illustrating an alert display example on the self-checkout machine 50. FIG. 15 illustrates the alert displayed on the self-checkout machine 50 when the banana trick is detected. As illustrated in FIG. 15 , an alert window 230 is displayed on a touch panel 51 of the self-checkout machine 50. In this alert window 230, the product item “banana” registered in the cash register through manual input and the product item “wine” specified through image analysis by the second machine learning model 105 (for example, a zero-shot image classifier) are displayed side by side for comparison. In addition, the alert window 230 can include a notification that prompts the user to correct the registration and input it again. Such a display on the alert window 230 warns the user that the banana trick, in which “banana” is registered in the cash register through manual input instead of “wine”, has been detected. Therefore, it is possible to urge the user to stop the settlement using the banana trick and, as a result, to suppress damage to the store caused by the banana trick. Note that the warning control unit 117 can also output the content of the alert illustrated in FIG. 15 by voice.
  • Furthermore, the warning control unit 117 turns on a warning light provided in the self-checkout machine 50, displays the identifier of the self-checkout machine 50 and a message indicating a possibility of the occurrence of the fraud on the administrator's terminal 60, or transmits, to a terminal of a clerk in the store, the identifier of the self-checkout machine 50 and a message indicating the occurrence of the fraud and the necessity of confirmation.
  • FIG. 16 is a diagram illustrating an alert display example for a clerk. FIG. 16 illustrates the alert displayed on a display unit of the administrator's terminal 60 when the banana trick is detected. As illustrated in FIG. 16 , an alert window 250 is displayed on the display unit of the administrator's terminal 60. In this alert window 250, the product item “banana” and the price “350 yen” registered in the cash register through manual input, and the product item “wine” and the price “4500 yen” specified through image analysis, are displayed side by side for comparison. Moreover, in the alert window 250, the fraud type “banana trick”, the cash register number “2” where the banana trick occurred, and the predicted damage amount “4150 yen (=4500 yen−350 yen)” caused by the settlement using the banana trick are displayed. In addition, in the alert window 250, graphical user interface (GUI) components 251 to 253 are displayed, which are used to request, for example, a face photograph of the user 2 who is using the self-checkout machine 50 with the cash register number “2”, an in-store announcement, or a notification to the police. Such a display on the alert window 250 realizes notification of the occurrence of the damage caused by the banana trick, grasping of the degree of the damage, and presentation of various countermeasures against the damage. Therefore, it becomes easier to take measures against the banana trick by the user 2 and, as a result, it is possible to suppress the damage to the store caused by the banana trick.
  • Furthermore, in a case of generating an alert regarding an abnormality in the behavior of registering the product in the self-checkout machine 50, the warning control unit 117 causes the camera 30 included in the self-checkout machine 50 to image the person and stores the image data of the imaged person and the alert in the storage unit in association with each other. In this way, since information regarding a fraudulent person who performs a fraudulent behavior can be collected, the information can be used for various countermeasures to prevent fraud in advance, for example, by detecting, at the entrance of the store, a visitor who has previously performed a fraudulent behavior. Furthermore, the warning control unit 117 can generate a machine learning model through supervised learning using the image data of the fraudulent person, so as to detect the fraudulent person from image data of persons who use the self-checkout machine 50, detect the fraudulent person at the entrance of the store, or the like. Furthermore, the warning control unit 117 can acquire information regarding the credit card of a person who has performed a fraudulent behavior from the self-checkout machine 50 and hold the information.
  • (Settlement Processing)
  • Here, settlement processing of the self-checkout machine 50 will be described. The self-checkout machine 50 receives checkout of the registered product items. The self-checkout machine 50 receives money used for the settlement of the products and pays change. The self-checkout machine 50 may execute the settlement processing using not only cash but also various credit cards, prepaid cards, or the like. Note that, when the alert regarding the abnormality in the behavior of registering the product is issued, the self-checkout machine 50 stops the settlement processing.
  • Furthermore, when receiving registration of an age-restricted product, the self-checkout machine 50 scans user's personal information, and executes settlement processing of the product registered in the self-checkout machine 50, based on the scanned result.
  • There is a case where the self-checkout machine 50 receives registration of an age-restricted product such as alcoholic beverages or cigarettes as the operation for registering the product. The self-checkout machine 50 identifies the age-restricted product by scanning a barcode of the product. The self-checkout machine 50 scans the user's My Number card, or personal information stored in a terminal having a My Number card function, and specifies the age of the user from the date of birth. Then, when the age of the user satisfies the sales condition for the age-restricted product, the self-checkout machine 50 can permit settlement of the product to be purchased by the user. On the other hand, when the age of the user does not satisfy the sales condition, the self-checkout machine 50 outputs an alert indicating that the registered product cannot be sold. As a result, the self-checkout machine 50 can permit sales of alcoholic beverages, cigarettes, or the like in consideration of the age restriction of the user.
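The age check derived from the scanned date of birth can be sketched as below. The legal-age threshold, the function names, and the fixed reference dates are illustrative assumptions:

```python
from datetime import date

LEGAL_AGE = 20  # assumed threshold for alcohol/cigarettes

def age_on(birth, today):
    """Age in whole years on the given date."""
    return today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))

def may_sell(birth_date, today=None):
    """Decide whether settlement of an age-restricted product may be
    permitted, based on the date of birth scanned from the user's ID."""
    today = today or date.today()
    return age_on(birth_date, today) >= LEGAL_AGE

print(may_sell(date(2000, 1, 1), today=date(2024, 6, 1)))  # True: 24 years old
print(may_sell(date(2010, 1, 1), today=date(2024, 6, 1)))  # False: 14 years old
```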
  • <Flow of Processing of Information Processing Device 100>
  • FIG. 17 is a flowchart illustrating a flow of processing of the information processing device 100. As illustrated in FIG. 17 , the information processing device 100 acquires video data as needed (S101).
  • Subsequently, when being instructed to start fraud detection processing (S102: Yes), the information processing device 100 acquires a frame in the video data (S103), and extracts a region of a product using the first machine learning model 104 (S104).
  • Here, in a case where the detected product is not tracked yet (S105: No), the information processing device 100 starts tracking (S106). On the other hand, in a case where the detected product has been already tracked (S105: Yes) or in a case where tracking is started, the information processing device 100 specifies a coordinate position and holds the coordinate position as time-series data (S107).
  • Here, while continuing tracking (S108: No), the information processing device 100 repeats the processing in and subsequent to S103, and when tracking ends (S108: Yes), the information processing device 100 acquires scan information (scan result) including a scan time and a product item from the self-checkout machine 50 (S109).
  • Subsequently, the information processing device 100 specifies a scan timing, based on the scan information (S110) and specifies a product region to be a fraud behavior determination target based on the scan timing (S111).
  • Then, the information processing device 100 inputs image data of the product region into the second machine learning model 105 and specifies the product item (S112).
  • Here, in a case where the product item in the scan information and the product item specified using the second machine learning model 105 do not match (S113: No), the information processing device 100 notifies of an alert (S114), and in a case where the product items match (S113: Yes), the information processing device 100 ends the processing.
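The flow of FIG. 17 (S103 to S114) can be condensed into a skeleton. `extract_region` and `classify_item` are hypothetical stand-ins for the first and second machine learning models, and the choice of the last tracked region as the determination target is a simplification of S111:

```python
def fraud_check(frames, scan_info, extract_region, classify_item):
    """Skeleton of the flow in FIG. 17: track product regions over the
    frames, classify the target region, and compare with the scan result.
    Returns True when an alert should be notified (S113: No -> S114)."""
    track = []                                 # S107: time-series positions
    for frame in frames:                       # S103
        region = extract_region(frame)         # S104: first model
        if region is not None:
            track.append(region)
    scan_time, scanned_item = scan_info        # S109-S110: from checkout machine
    target = track[-1] if track else None      # S111 (simplified)
    predicted = classify_item(target)          # S112: second model
    return predicted != scanned_item           # S113: mismatch -> alert

# Toy run: every frame yields a region; the classifier always says "wine".
alert = fraud_check(
    frames=[1, 2, 3],
    scan_info=(10.5, "banana"),
    extract_region=lambda f: (100 + f, 200),
    classify_item=lambda r: "wine",
)
print(alert)  # True: "banana" was scanned but "wine" was seen
```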
  • <Flow of Processing of Self-checkout Machine 50>
  • FIG. 18 is a flowchart illustrating a flow of processing of the self-checkout machine 50. As illustrated in FIG. 18 , the self-checkout machine 50 identifies an operation for registering a product by a user. Specifically, the self-checkout machine 50 identifies the operation for registering a product with no barcode through an operation on a selection screen in which a list of such products is displayed, and identifies the operation for registering a product with a barcode by scanning the barcode (S201). Subsequently, the self-checkout machine 50 specifies a product item and a scan time. Specifically, the self-checkout machine 50 specifies the product item based on the operation for registering the product, and specifies the time when the operation for registering the product is identified as the scan time (S202). The self-checkout machine 50 transmits the scan information including the product item and the scan time to the information processing device 100 (S203). Then, the self-checkout machine 50 determines whether or not there is an alert notified from the information processing device 100 (S204). In a case of determining that there is an alert (S204: Yes), the self-checkout machine 50 stops the settlement processing of the product item (S205). On the other hand, in a case of determining that there is no alert (S204: No), the self-checkout machine 50 executes the settlement processing of the product item (S206).
  • <Effects>
  • As described above, the information processing device 100 acquires video data in a predetermined area including an accounting machine in which a person registers a product and inputs the video data into the first machine learning model 104 so as to extract a product region from the video data. The information processing device 100 stores time-series coordinate positions of the extracted product region, specifies a timing when the person performs the operation for registering the product in the self-checkout machine 50, and specifies a product region related to the product registered in the self-checkout machine 50, based on the specified timing of the operation and the time-series coordinate positions. As a result, since the information processing device 100 can specify the region of the product that is a fraud target from the video data, it is possible to recognize the product before the person ends the payment or before the person leaves the store, and it is possible to detect fraud in the self-checkout machine 50.
  • Furthermore, the information processing device 100 specifies an item of the product, by inputting the product region related to the product registered in the self-checkout machine 50 into the second machine learning model 105. When the item of the product registered in the self-checkout machine 50 by the person and the item of the product specified using the second machine learning model 105 do not match, the information processing device 100 generates an alert. Therefore, the information processing device 100 can detect fraud of scanning a barcode of an inexpensive product instead of that of an expensive product.
  • Furthermore, the information processing device 100 specifies the product region to be the fraud determination target based on the coordinate position immediately before or immediately after the timing when the person performs the operation for registering the product in the self-checkout machine 50, from among the time-series coordinate positions. Therefore, since the information processing device 100 can accurately specify the held product before and after the timing when the operation for registering the product is performed, the information processing device 100 can improve fraud detection accuracy.
  • Furthermore, the information processing device 100 specifies the product region to be the fraud determination target from a distribution of the time-series coordinate positions. Therefore, even in a situation where it is difficult to make a determination using a single piece of image data, for example, because the image data is unclear, the information processing device 100 can accurately specify the held product before and after the timing when the operation for registering the product is performed.
  • Furthermore, the information processing device 100 generates an alert indicating that the product registered in the self-checkout machine 50 by the person is abnormal. Therefore, the information processing device 100 can take measures such as questioning the person who has performed a fraudulent behavior before the person leaves the store.
  • Furthermore, in a case where the alert regarding the abnormality in the behavior of registering the product in the self-checkout machine 50 is generated, the information processing device 100 outputs a voice or a screen indicating the alert content from the self-checkout machine 50 to the person positioned at the self-checkout machine 50. Therefore, whether in a case of an unintentional mistake or an intentional fraud, the information processing device 100 can directly call the attention of the person who is scanning. Therefore, it is possible to reduce both mistakes and intentional fraud.
  • Furthermore, when the alert regarding the abnormality in the behavior of registering the product in the self-checkout machine 50 is generated, the information processing device 100 causes the camera of the self-checkout machine 50 to image the person and stores the image data of the imaged person and the alert in the storage unit in association with each other. Therefore, since the information processing device 100 can collect and hold information regarding the fraudulent person who performs the fraudulent behavior, the information processing device 100 can use the information for various measures to prevent fraud in advance, for example, by detecting the entrance of the fraudulent person from data captured by a camera that images customers. Furthermore, since the information processing device 100 can acquire and hold the credit card information of the person who has performed the fraudulent behavior from the self-checkout machine 50, in a case where the fraudulent behavior is confirmed, it is possible to charge a fee via the credit card company.
  • Second Embodiment
  • Incidentally, while the embodiment of the present disclosure has been described above, the present disclosure may be implemented in a variety of different modes in addition to the embodiment described above.
  • (Numerical Values, etc.)
  • The numbers of self-checkout machines and cameras, numerical examples, training data examples, the number of pieces of training data, machine learning models, each class name, the number of classes, data formats, or the like used in the above embodiments are merely examples and can be arbitrarily changed. In addition, the processing flow described in each flowchart may be appropriately changed in a range without contradiction. Furthermore, for each model, a model generated by various algorithms such as a neural network may be adopted. Furthermore, the shopping basket is an example of a conveyance tool, such as a basket or a product cart, used to carry products selected for purchase by a user in the store to a self-checkout machine.
  • Furthermore, the information processing device 100 can use known techniques such as another machine learning model for detecting a position, object detection techniques, or position detection techniques, for the scan position and the position of the shopping basket. For example, since the information processing device 100 can detect the position of the shopping basket based on a time-series change between frames (image data), the information processing device 100 may perform detection using that position or generate another model using that position. Furthermore, by designating the size of the shopping basket in advance, the information processing device 100 can identify an object having that size detected from the image data as the position of the shopping basket. Note that, since the scan position is a position fixed to some extent, the information processing device 100 can identify a position designated by an administrator or the like as the scan position.
  • (System)
  • Pieces of information including a processing procedure, a control procedure, a specific name, various types of data, and parameters described above or illustrated in the drawings may be optionally changed unless otherwise specified.
  • Furthermore, specific forms of distribution and integration of components of individual devices are not limited to those illustrated in the drawings. For example, the region extraction unit 113 and the coordinate position specification unit 114 may be integrated. That is, all or some of the components may be functionally or physically dispersed or integrated in optional units, depending on various kinds of loads, use situations, or the like. Moreover, all or some of the respective processing functions of the respective devices may be implemented by a central processing unit (CPU) and a program to be analyzed and executed by the CPU, or may be implemented as hardware by wired logic.
  • (Hardware)
  • FIG. 19 is a diagram for explaining a hardware configuration example. Here, the information processing device 100 will be described as an example. As illustrated in FIG. 19 , the information processing device 100 includes a communication device 100 a, a hard disk drive (HDD) 100 b, a memory 100 c, and a processor 100 d. Furthermore, the individual units illustrated in FIG. 19 are mutually coupled by a bus or the like.
  • The communication device 100 a is a network interface card or the like and communicates with another device. The HDD 100 b stores programs for operating the functions illustrated in FIG. 3 and databases (DBs).
  • The processor 100 d reads a program that executes processing similar to the processing of each processing unit illustrated in FIG. 3 from the HDD 100 b or the like, and develops the read program in the memory 100 c to operate a process that executes each function described with reference to FIG. 3 or the like. For example, this process executes a function similar to the function of each processing unit included in the information processing device 100. Specifically, the processor 100 d reads a program having functions similar to those of the machine learning unit 111, the video acquisition unit 112, the region extraction unit 113, the coordinate position specification unit 114, the product region specification unit 115, the fraud detection unit 116, the warning control unit 117, or the like from the HDD 100 b or the like. Then, the processor 100 d executes a process for executing processing similar to those of the machine learning unit 111, the video acquisition unit 112, the region extraction unit 113, the coordinate position specification unit 114, the product region specification unit 115, the fraud detection unit 116, the warning control unit 117, or the like.
  • As described above, the information processing device 100 works as an information processing device that executes an information processing method by reading and executing the program. In addition, the information processing device 100 can also implement functions similar to those of the above-described embodiments by reading the program described above from a recording medium with a medium reading device and executing the read program. Note that the programs mentioned in the embodiments are not limited to being executed by the information processing device 100. For example, the embodiments described above may be similarly applied to a case where another computer or server executes the program, or a case where the computer and the server cooperatively execute the program.
  • This program may be distributed via a network such as the Internet. In addition, this program may be recorded in a computer-readable recording medium such as a hard disk, a flexible disk (FD), a compact disc read only memory (CD-ROM), a magneto-optical disk (MO), or a digital versatile disc (DVD) and may be executed by being read from the recording medium by a computer.
  • FIG. 20 is a diagram for explaining a hardware configuration example of the self-checkout machine 50. As illustrated in FIG. 20 , the self-checkout machine 50 includes a communication interface 400 a, an HDD 400 b, a memory 400 c, a processor 400 d, an input device 400 e, and an output device 400 f. Furthermore, the individual units illustrated in FIG. 20 are mutually coupled by a bus or the like.
  • The communication interface 400 a is a network interface card or the like, and communicates with other information processing devices. The HDD 400 b stores a program for operating each function of the self-checkout machine 50 and data.
  • The processor 400 d is a hardware circuit that reads the program that executes processing of each function of the self-checkout machine 50 from the HDD 400 b or the like and develops the read program in the memory 400 c to operate a process that executes each function of the self-checkout machine 50. That is, this process executes a function similar to each processing unit included in the self-checkout machine 50.
  • In this way, the self-checkout machine 50 operates as an information processing device that executes operation control processing by reading and executing the program that executes the processing of each function of the self-checkout machine 50. Furthermore, the self-checkout machine 50 can implement each function of the self-checkout machine 50 by reading a program from a recording medium with a medium reading device and executing the read program. Note that the programs mentioned in the embodiments are not limited to being executed by the self-checkout machine 50. For example, the present embodiment may be similarly applied to a case where another computer or server executes the program, or a case where the computer and the server cooperatively execute the program.
  • Furthermore, the program that executes the processing of each function of the self-checkout machine 50 can be distributed via a network such as the Internet. Furthermore, this program can be recorded in a computer-readable recording medium such as a hard disk, an FD, a CD-ROM, an MO, or a DVD, and can be executed by being read from the recording medium by a computer.
  • The input device 400 e detects various input operations by the user, such as an input operation for the program executed by the processor 400 d. The input operation includes, for example, a touch operation or the like. In a case of the touch operation, the self-checkout machine 50 further includes a display unit, and the input operation detected by the input device 400 e may be a touch operation on the display unit. The input device 400 e may be, for example, a button, a touch panel, a proximity sensor, or the like. Furthermore, the input device 400 e reads a barcode. The input device 400 e is, for example, a barcode reader. The barcode reader includes a light source and an optical sensor and scans a barcode.
  • The output device 400 f outputs data output from the program executed by the processor 400 d via an external device coupled to the self-checkout machine 50, for example, an external display device or the like. Note that, in a case where the self-checkout machine 50 includes the display unit, the self-checkout machine 50 does not need to include the output device 400 f.
  • All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.

Claims (13)

What is claimed is:
1. A non-transitory computer-readable storage medium storing an information processing program that causes at least one computer to execute a process, the process comprising:
acquiring video data each image data of which includes a registration machine used to register a product by a user;
extracting, from the acquired video data, image data that include products by specifying a first region that includes a hand of the user, a second region that includes a product, and a relationship between the first region and the second region, for the image data of the acquired video data;
specifying a timing when first information regarding a first product is registered to the registration machine by the user;
specifying certain image data of the image data that includes a second product held in the hand of the user within a certain time period from the timing and placed in a place in an angle of view of the video data that is not a place where a product that has been registered to the registration machine is placed for most of the certain time period, based on the first region for the image data, the second region for the image data, and the relationship for the image data;
specifying second information regarding the second product by inputting the certain image data to a machine learning model; and
generating an alert when the first information and the second information do not match.
2. The non-transitory computer-readable storage medium according to claim 1, wherein
the extracting includes extracting by specifying by inputting the acquired video data to a machine learning model that specifies a plurality of first regions that include a hand of a user, a plurality of second regions that include a product, and a relationship between one of the plurality of first regions and one of the plurality of second regions according to an input of video data, for the image data of the input.
3. The non-transitory computer-readable storage medium according to claim 1, wherein
the extracting includes extracting, from the acquired video data, the image data that include the first product held in the hand of the user and moved to one place selected from a place where a product that has been registered in the registration machine is placed and outside of an angle of view of the video data.
4. The non-transitory computer-readable storage medium according to claim 1, wherein
the specifying includes specifying the certain image data that includes the second product based on one selected from an average value of coordinates of the second region for the image data and a median value of coordinates of the second region for the image data.
5. The non-transitory computer-readable storage medium according to claim 1, wherein
the generating includes notifying a terminal of a clerk of identification information of the registration machine and the generated alert, in association with each other, when an alert regarding an abnormality in a behavior of registering the product in the registration machine is generated.
6. The non-transitory computer-readable storage medium according to claim 1, wherein
the generating the alert includes
in a case where an alert regarding an abnormality in a behavior of registering the product in the registration machine is generated, outputting a voice or a screen with alert content from the registration machine to the user positioned at the registration machine.
7. The non-transitory computer-readable storage medium according to claim 1, wherein
the generating includes:
when an alert regarding an abnormality in a behavior of registering a product in the registration machine is generated, causing a camera included in the registration machine to image the user, and
storing imaged data of the user and the alert in the memory in association with each other.
8. The non-transitory computer-readable storage medium according to claim 1, wherein
the registering the product in the registration machine is a first operation of registering the product selected by the user in the registration machine, based on a selection operation on a selection screen in which an item of a product with no barcode is displayed,
wherein the process further comprises
when the item of the product registered in the registration machine and an item of the product included in the specified product region do not match, notifying of an alert regarding an abnormality of the product registered in the registration machine.
9. The non-transitory computer-readable storage medium according to claim 8, wherein
the specifying includes:
when an item of a product with no barcode is registered in the registration machine based on the first operation, specifying a timing with reference to an operation of registering the item of the product with no barcode into the registration machine, by using a notification from the registration machine via a network, and
specifying a product region of the product from the time-series coordinate positions stored in the memory, based on the specified timing with reference to the operation.
10. The non-transitory computer-readable storage medium according to claim 1, wherein
the operation of registering a product in the registration machine is a second operation of registering an item of a product in the registration machine, by scanning a barcode of a product with the barcode,
wherein the process further comprises
when the item of the product registered in the registration machine and an item of the product included in the specified product region do not match, notifying of an alert regarding an abnormality of the product registered in the registration machine.
11. The non-transitory computer-readable storage medium according to claim 10, wherein
the specifying includes:
when an item of the product with the barcode is registered in the registration machine, specifying a timing with reference to an operation of registering the item of the product with the barcode in the registration machine, by using a notification from the registration machine via a network and
specifying a product region of the product from the time-series coordinate positions stored in the memory, based on the timing with reference to the operation.
12. An information processing method for a computer to execute a process comprising:
acquiring video data, each image data of which includes a registration machine used by a user to register a product;
extracting, from the acquired video data, image data that include products by specifying a first region that includes a hand of the user, a second region that includes a product, and a relationship between the first region and the second region, for the image data of the acquired video data;
specifying a timing when first information regarding a first product is registered to the registration machine by the user;
specifying certain image data of the image data that includes a second product held in the hand of the user within a certain time period from the timing and placed in a place in an angle of view of the video data that is not a place where a product that has been registered to the registration machine is placed for most of the certain time period, based on the first region for the image data, the second region for the image data, and the relationship for the image data;
specifying second information regarding the second product by inputting the certain image data to a machine learning model; and
generating an alert when the first information and the second information do not match.
13. An information processing device comprising:
one or more memories; and
one or more processors coupled to the one or more memories and the one or more processors configured to:
acquire video data, each image data of which includes a registration machine used by a user to register a product,
extract, from the acquired video data, image data that include products by specifying a first region that includes a hand of the user, a second region that includes a product, and a relationship between the first region and the second region, for the image data of the acquired video data,
specify a timing when first information regarding a first product is registered to the registration machine by the user,
specify certain image data of the image data that includes a second product held in the hand of the user within a certain time period from the timing and placed in a place in an angle of view of the video data that is not a place where a product that has been registered to the registration machine is placed for most of the certain time period, based on the first region for the image data, the second region for the image data, and the relationship for the image data,
specify second information regarding the second product by inputting the certain image data to a machine learning model, and
generate an alert when the first information and the second information do not match.
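The claimed pipeline (a first region for the user's hand, a second region for a product, the relationship between them, a registration timing, and an alert on mismatch between registered and recognized items) can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the `Frame` fields and the `check_registration` function are hypothetical names introduced here, and the `label` field stands in for the output of the machine learning model recited in the claims.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Frame:
    hand_box: tuple       # first region: bounding box that includes the user's hand
    product_box: tuple    # second region: bounding box that includes a product
    holding: bool         # relationship between the regions: hand holds the product
    timestamp: float      # seconds since the start of the video
    label: str            # stand-in for the product-recognition model's output

def check_registration(frames: list,
                       registered_item: str,
                       registration_time: float,
                       window: float = 5.0) -> Optional[str]:
    """Compare the item registered at the checkout (first information) with the
    item recognized in the user's hand within `window` seconds of the
    registration timing (second information); return an alert on mismatch."""
    for f in frames:
        in_window = 0.0 <= f.timestamp - registration_time <= window
        if in_window and f.holding and f.label != registered_item:
            return f"ALERT: registered '{registered_item}' but camera saw '{f.label}'"
    return None
```

A mismatch within the time window yields an alert string; a matching label, or a frame outside the window, yields `None`.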
US18/532,225 2022-12-23 2023-12-07 Information processing program, information processing method, and information processing device Pending US20240211952A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022207689A JP2024091181A (en) 2022-12-23 2022-12-23 Information processing program, information processing method, and information processing device
JP2022-207689 2022-12-23

Publications (1)

Publication Number Publication Date
US20240211952A1 2024-06-27

Family

ID=89223487

Family Applications (1)

Application Number Title Priority Date Filing Date
US18/532,225 Pending US20240211952A1 (en) 2022-12-23 2023-12-07 Information processing program, information processing method, and information processing device

Country Status (4)

Country Link
US (1) US20240211952A1 (en)
EP (1) EP4390872A1 (en)
JP (1) JP2024091181A (en)
KR (1) KR20240101455A (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6474179B2 (en) 2017-07-30 2019-02-27 国立大学法人 奈良先端科学技術大学院大学 Learning data set creation method, and object recognition and position and orientation estimation method
US11120265B2 (en) * 2018-01-31 2021-09-14 Walmart Apollo, Llc Systems and methods for verifying machine-readable label associated with merchandise
JP7680671B2 (en) * 2021-06-07 2025-05-21 富士通株式会社 MOTION DISCRETION PROGRAM, MOTION DISCRETION METHOD, AND MOTION DISCRETION DEVICE

Patent Citations (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110066658A1 (en) * 1999-05-19 2011-03-17 Rhoads Geoffrey B Methods and Devices Employing Content Identifiers
US7373375B2 (en) * 2000-09-29 2008-05-13 Sony Corporation Information management system using agents
US20030115254A1 (en) * 2000-09-29 2003-06-19 Satoshi Suzuki Information management system using agent
US20140123262A1 (en) * 2001-07-23 2014-05-01 Sony Corporation Information processing system, information processing apparatus, and method
US20100169422A1 (en) * 2001-07-23 2010-07-01 Masayuki Kuwata Information processing system, information processing apparatus, and method
US20030212609A1 (en) * 2002-04-03 2003-11-13 Jeffery Blair Method of facilitating a transaction between a buyer and at least one seller
US20040039659A1 (en) * 2002-08-19 2004-02-26 Nec Corporation Electronic purchasing system and method using mobile terminal and server and terminal apparatus in the system
US20040064401A1 (en) * 2002-09-27 2004-04-01 Capital One Financial Corporation Systems and methods for detecting fraudulent information
US20040260636A1 (en) * 2003-05-28 2004-12-23 Integrated Data Control, Inc. Check image access system
US20060173729A1 (en) * 2005-01-31 2006-08-03 Caleb Clark System and methods for managing a volunteer organization
US20070067189A1 (en) * 2005-09-16 2007-03-22 Numoda Corporation Method and apparatus for screening, enrollment and management of patients in clinical trials
US20120310788A1 (en) * 2010-01-28 2012-12-06 Ripplex Inc. Sales system
US20170140653A9 (en) * 2010-03-30 2017-05-18 Ns Solutions Corporation Image display system, image display method and program
US20140289323A1 (en) * 2011-10-14 2014-09-25 Cyber Ai Entertainment Inc. Knowledge-information-processing server system having image recognition system
US20140006150A1 (en) * 2012-06-27 2014-01-02 United Video Properties, Inc. Systems and methods for targeting advertisements based on product lifetimes
US20150206257A1 (en) * 2012-07-24 2015-07-23 Nec Corporation Information processing device, data processing method thereof, and program
US9489702B2 (en) * 2012-07-24 2016-11-08 Nec Corporation Information processing device, data processing method thereof, and program
US20140244411A1 (en) * 2013-02-22 2014-08-28 Jong Myoung Kim Method of operating a duty-free store at an airport with a product storage area and product pickup area
US20150213459A1 (en) * 2014-01-29 2015-07-30 Farrokh F. Radjy Systems, methods and apparatus for providing a graphical representation of statistical performance and benchmarking data for one or more production facilities in a closed-loop production management system
US20150269692A1 (en) * 2014-03-18 2015-09-24 Jed Ryan Electronic Contract Creator
US20170064014A1 (en) * 2015-08-28 2017-03-02 Sony Interactive Entertainment Inc. Information processing device, event management server, event participation method, and event participation management method
US20190026562A1 (en) * 2016-01-21 2019-01-24 Nec Corporation Information processing apparatus, control method, and program
US20200059705A1 (en) * 2017-02-28 2020-02-20 Sony Corporation Information processing apparatus, information processing method, and program
US20210056149A1 (en) * 2018-03-16 2021-02-25 Rakuten, Inc. Search system, search method, and program
US20230368561A1 (en) * 2018-12-07 2023-11-16 Nec Corporation Information processing apparatus, information processing method, and non-transitory computer-readable storage medium
US20240021006A1 (en) * 2018-12-07 2024-01-18 Nec Corporation Information processing apparatus, information processing method, and non-transitory computer-readable storage medium
US20220019769A1 (en) * 2018-12-07 2022-01-20 Nec Corporation Information processing apparatus, information processing method, and program
US20230368559A1 (en) * 2018-12-07 2023-11-16 Nec Corporation Information processing apparatus, information processing method, and non-transitory computer-readable storage medium
US20230368564A1 (en) * 2018-12-07 2023-11-16 Nec Corporation Information processing apparatus, information processing method, and non-transitory computer-readable storage medium
US20230368563A1 (en) * 2018-12-07 2023-11-16 Nec Corporation Information processing apparatus, information processing method, and non-transitory computer-readable storage medium
US20230368562A1 (en) * 2018-12-07 2023-11-16 Nec Corporation Information processing apparatus, information processing method, and non-transitory computer-readable storage medium
US20230368560A1 (en) * 2018-12-07 2023-11-16 Nec Corporation Information processing apparatus, information processing method, and non-transitory computer-readable storage medium
US20220319033A1 (en) * 2020-01-29 2022-10-06 Rakuten Group, Inc. Object recognition system, position information acquisition method, and program
US20210280027A1 (en) * 2020-03-03 2021-09-09 Beijing Jingdong Shangke Information Technology Co., Ltd. System and method for anti-shoplifting in self-checkout
US20230106962A1 (en) * 2020-03-11 2023-04-06 Panasonic Intellectual Property Management Co., Ltd. Skill evaluation device and skill evaluation method
US20240193953A1 (en) * 2021-05-06 2024-06-13 Sony Semiconductor Solutions Corporation Information processing method, information processing device, and program
US20240096182A1 (en) * 2021-07-28 2024-03-21 Nec Corporation Action detection system, action detection method, and non-transitory computer-readable medium
US20230100920A1 (en) * 2021-09-30 2023-03-30 Fujitsu Limited Non-transitory computer-readable recording medium, notification method, and information processing device
US20230267441A1 (en) * 2022-02-22 2023-08-24 Toshiba Tec Kabushiki Kaisha Payment machine and payment machine method
US20240420506A1 (en) * 2022-03-02 2024-12-19 Nec Corporation Motion determination apparatus, motion determination method, and non-transitory computer readable medium
US20230298004A1 (en) * 2022-03-17 2023-09-21 Toshiba Tec Kabushiki Kaisha Store system, information processing device, and control method
US20240193995A1 (en) * 2022-12-07 2024-06-13 Fujitsu Limited Non-transitory computer-readable recording medium, information processing method, and information processing apparatus
US20240430595A1 (en) * 2023-06-26 2024-12-26 Canon Kabushiki Kaisha Photoelectric conversion device, image processing device, movable apparatus, processing method, and storage medium for generating a mapping parameter

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240193993A1 (en) * 2022-12-07 2024-06-13 Fujitsu Limited Non-transitory computer-readable recording medium, information processing method, and information processing apparatus
US20240193389A1 (en) * 2022-12-07 2024-06-13 Fujitsu Limited Non-transitory computer-readable recording medium, information processing method, and information processing apparatus
US12393808B2 (en) * 2022-12-07 2025-08-19 Fujitsu Limited Non-transitory computer-readable recording medium, information processing method, and information processing apparatus

Also Published As

Publication number Publication date
JP2024091181A (en) 2024-07-04
KR20240101455A (en) 2024-07-02
EP4390872A1 (en) 2024-06-26

Similar Documents

Publication Publication Date Title
US10824902B2 (en) Mislabeled product detection
US9299229B2 (en) Detecting primitive events at checkout
US8429016B2 (en) Generating an alert based on absence of a given person in a transaction
US20240193995A1 (en) Non-transitory computer-readable recording medium, information processing method, and information processing apparatus
US20240211952A1 (en) Information processing program, information processing method, and information processing device
US20240220999A1 (en) Item verification systems and methods for retail checkout stands
US20240193993A1 (en) Non-transitory computer-readable recording medium, information processing method, and information processing apparatus
JP2025146684A (en) Method and device for detecting abnormal shopping behavior in a smart shopping cart, and shopping cart
US20230005267A1 (en) Computer-readable recording medium, fraud detection method, and fraud detection apparatus
US10878670B1 (en) Method for protecting product against theft and computer device
KR20240101353A (en) Specific programs, specific methods and information processing devices
US20240193573A1 (en) Storage medium and information processing device
US20240212355A1 (en) Storage medium, alert generation method, and information processing device
US11657400B2 (en) Loss prevention using video analytics
KR20240101349A (en) Alert generation program, alert generation method, and information processing device
US12393808B2 (en) Non-transitory computer-readable recording medium, information processing method, and information processing apparatus
US20240005750A1 (en) Event-triggered capture of item image data and generation and storage of enhanced item identification data
WO2020228437A1 (en) Apparatus and methods for multi-sourced checkout verification
EP4383171A1 (en) Information processing program, information processing method, and information processing apparatus
US20240211920A1 (en) Storage medium, alert generation method, and information processing apparatus
US20250307826A1 (en) Erroneous operation prevention system, erroneous operation prevention method, and computer program product for erroneous operation prevention
US20250278988A1 (en) System and method for shrinkage detection and prevention in self-checkout systems
US20230093938A1 (en) Non-transitory computer-readable recording medium, information processing method, and information processing apparatus
Jurj et al. Mobile application for receipt fraud detection based on optical character recognition
KR20240101358A (en) Data generation program, data generation method, and information processing device

Legal Events

Date Code Title Description
AS Assignment

Owner name: FUJITSU LIMITED, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:OBINATA, YUYA;YAMAMOTO, TAKUMA;UCHIDA, DAISUKE;SIGNING DATES FROM 20231129 TO 20231130;REEL/FRAME:065804/0767

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION COUNTED, NOT YET MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED
