WO2019048924A1

WO2019048924A1 - Using augmented reality for image capturing a retail unit

Info

Publication number: WO2019048924A1
Application number: PCT/IB2018/001107
Authority: WO
Inventors: Yair ADATO; Yotam MICHAEL; Nativ LEVI; Aviv EISENSCHTAT; Ziv MHABARY; Dolev POMERANZ; Nir Hemed; Bar FINGERMAN; Maria Kushnir; Alexander BURDEYNYY
Original assignee: Trax Technology Solutions Pte Ltd.
Priority date: 2017-09-06
Filing date: 2018-09-05
Publication date: 2019-03-14

Abstract

Methods, devices, and computer-readable storage media for providing a user with augmented guidance to capture images of products placed on a store shelving unit, the method including using an image sensor of a mobile device to capture and display in real-time a video stream depicting a store shelving unit, augmenting the video stream with a marking identifying an area of the store shelving unit and receiving an image including an area outside the marked area. A plurality of images captured by the image sensor and associated with a first discontinuous area of the store shelving unit with at least two non-overlapping regions and having image resolution higher than an image resolution of the video stream are used to mark the area of the video stream representing first discontinuous area and a second area outside the first discontinuous area. The received image overlaps the first discontinuous area and second area.

Description

USING AUGMENTED REALITY FOR IMAGE CAPTURING A RETAIL UNIT

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. Provisional Application No. 62/554,792, filed

September 06, 2017, the entirety of which is hereby incorporated by reference.

TECHNICAL FIELD

[0002] The present disclosure relates generally to systems, methods, and devices for capturing images of store shelving unit, and more specifically to systems, methods, and devices that provide augmented guidance to capture images of products placed on a store shelving unit.

BACKGROUND

[0003] Shopping in stores is a prevalent part of modern daily life. Store owners (also known as "retailers") stock a wide variety of products on store shelves and add associated labels and promotions to the store shelves. Typically, retailers have a set of processes and instructions for organizing the shelves. The source of some of these instructions can be contractual obligations and/or other preferences related to the retailer methodology for placement of products on the store shelves, and for pricing of the products. Nowadays, many retailers have personnel in the store and suppliers send personnel to stores to monitor compliance with the desired product placement plan.

SUMMARY

[0004] Certain embodiments of the present disclosure relate to a method for providing a user with augmented guidance to capture images of products placed on a store shelving unit. The method may include receiving a video stream captured by at least one image sensor of a mobile device, the video stream depicting different areas of the store shelving unit; causing a real-time display of the video stream for enabling the user to select areas of the store shelving unit for image capturing; receiving a plurality of first images captured by the at least one image sensor and associated with a first discontinuous area of the store shelving unit that includes at least two non-overlapping regions, wherein an image resolution of the first plurality of images may be higher than an image resolution of the video stream; analyzing the video stream to identify the first discontinuous area of the store shelving unit; causing a real-time augmented display of the video stream with a marking identifying a second area of the store shelving unit different from the first discontinuous area of the store shelving unit; and receiving at least one second image captured by the at least one image sensor and associated with the second area of the store shelving unit, wherein the at least one second image overlaps with each of the at least two non-overlapping regions of the first discontinuous area.

[0005] In some embodiments, the method may include enabling identification of a first plurality of products associated with at least one product type in the store shelving unit based on the image resolution of the first plurality of images. The method may include analyzing the video stream to identify the second area, wherein the second area includes a second plurality of products associated with at least one product type. In some embodiments, images of the plurality of first images do not overlap with each other. In some embodiments, at least some of the plurality of first images may overlap with each other.

[0006] The method may include monitoring in the video stream changing positions of the first area of the store shelving unit as the mobile device moves relative to the store shelving unit. The method may also include adjusting in real-time positions of the marking to account for the changing positions of the first area of the store shelving unit in the augmented display of the video stream.

[0007] Further, in some embodiments, a marking identifying a second area of the store shelving unit may highlight the second area in a manner distinct form the first discontinuous area of the store shelving unit and distinct from areas in video stream that are not part of the store shelving unit. The marking identifying the second area of the store shelving unit may highlight the first discontinuous area in a manner distinct from the second area of the store shelving unit and distinct from areas in the video stream that are not part of the store shelving unit.

[0008] Additionally, in some embodiments, the method may include causing a real-time augmented display of the video stream with a marking illustrating an area of interest in the store shelving unit for enabling the user to capture images of the area of interest prior to receiving the plurality of first images. The method may include directing the user to a store shelving unit including the area of interest prior to causing a real-time augmented display of the video stream with the marking illustrating the area of interest. The area of interest may comprise an area that may be outside a field of view of a plurality of cameras fixedly-connected to the other store shelving units.

[0009] Still further, in some embodiments, the method may include uploading images associated with the area of interest and images captured by the plurality of cameras to build a three- dimensional store map with information on products in a store. The method may include identifying a first discontinuous area of the store shelving unit by recognizing in the plurality of first images a plurality of regions of the store shelving unit that includes products and have an image quality higher than a selected image quality threshold. The method may also associate the recognized regions as the first discontinuous area. The plurality of regions may include two non-overlapping regions each associated with at least two of the plurality of first images. The first discontinuous area may be associated with less than 95% of a field of view captured by the plurality of first images.

[00010] Still further, in some embodiments, the method may include modifying at least one image of the plurality of first images in accordance with the identified first discontinuous area. The method may also include uploading the at least one modified image to a server for product identification and for monitoring compliance with a desired product placement.

[0001 1] Additionally, in some embodiments, the method may include causing a display of an indicator, in the real-time augmented display of the video stream, informing the user that some of the products depicted in the plurality of first images were not captured in the image quality higher than the selected image quality threshold. The indicator may be configured to guide a user how to improve the image quality.

[00012] Further, in some embodiments, the method may include analyzing the video stream to identify a combined area of the first discontinuous area and the second area. The method may also include causing a real-time augmented display of the video stream with a marking identifying a third area of the store shelving unit different from the combined area of the first discontinuous area and the second area. The method may further include receiving at least one third image captured by the at least one image sensor and associated with the third area of the store shelving unit.

[00013] Still further, in some embodiments, the method may include identifying an overlap area in the at least one second image. The method may also include selecting from the plurality of first images and the at least one second image, image data associated with the overlap area that has better image quality. The method may further include transmitting the selected image data to a server for product identification and for monitoring compliance with the desired product placement.

[00014] Additionally, in some embodiments, the method may include determining that the user is about to start capturing images of a second store shelving unit. The method may also include informing the user that the first store shelving unit includes at least one region for which no images were received.

[00015] Certain embodiments of the present disclosure relate to a device for providing a user with augmented guidance to capture images of products placed on a store shelving unit. The device may include at least one image sensor configured to capture image data from the environment of the user. The device may also include at least one processor configured to receive a video stream captured by the at least one image sensor, the video stream depicting different areas of the store shelving unit; cause a real-time display of the video stream for enabling the user to select areas of the store shelving unit for image capturing; receive a plurality of first images captured by the at least one image sensor and associated with a first discontinuous area of the store shelving unit that includes at least two non-overlapping regions, wherein an image resolution of the first plurality of images may be higher than an image resolution of the video stream; analyze the video stream to identify the first discontinuous area of the store shelving unit; cause a real-time augmented display of the video stream with a marking identifying a second area of the store shelving unit different from the first discontinuous area of the store shelving unit; and receive at least one second image captured by the at least one image sensor and associated with the second area of the store shelving unit, wherein the at least one second image overlaps with each of the at least two non-overlapping regions of the first discontinuous area.

[00016] In some embodiments, the device may include a smartphone screen configured to display the real-time augmented display of the video stream. The device may include a headset configured to project the real-time augmented display of the video stream to an eye of the user. The device may include a transmitter configured to wirelessly upload images to a server for product identification and for monitoring compliance with the desired product placement.

[00017] Additionally, in some embodiments, the device includes a receiver configured to obtain from a server information associated with a plurality of camera fixedly connected to an opposing store shelving unit, and to cause a real-time augmented display of the video stream with markings illustrating areas monitored by the plurality of cameras.

[00018] Certain embodiments of the present disclosure relate to a non-transitory computer readable medium for providing a user with augmented guidance to capture images of product inventory placed on a store shelving unit. The computer readable medium may contain instructions that when executed by a processor can cause the processor to perform operations for receiving a video stream captured by at least one image sensor of a mobile device, the video stream depicting different areas of the store shelving unit; causing a real-time display of the video stream for enabling the user to select areas of the store shelving unit for image capturing; receiving a plurality of first images captured by the at least one image sensor and associated with a first discontinuous area of the store shelving unit that includes at least two non-overlapping regions, wherein an image resolution of the first plurality of images may be higher than an image resolution of the video stream; analyzing the video stream to identify the first discontinuous area of the store shelving unit; causing a real-time augmented display of the video stream with a marking identifying a second area of the store shelving unit different from the first discontinuous area of the store shelving unit; and receiving at least one second image captured by the at least one image sensor and associated with the second area of the store shelving unit, wherein the at least one second image overlaps with each of the at least two non- overlapping regions of the first discontinuous area.

BRIEF DESCRIPTION OF THE DRAWINGS

[00019] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate various disclosed embodiments. In the drawings:

[00020] Fig. 1 is an illustration of an exemplary system for analyzing information collected from a retail store;

[00021] Fig. 2 is a block diagram of exemplary components of systems consistent with the present disclosure;

[00022] Fig. 3 is a schematic illustration of exemplary images, consistent with the present disclosure, depicting a plurality of products on a plurality of store shelves, and a plurality of labels coupled to the store shelves and associated with the plurality of products;

[00023] Fig. 4A is a schematic illustration of exemplary embodiments, consistent with the present disclosure;

[00024] Fig. 4B is a schematic illustration of exemplary embodiments, consistent with the present disclosure;

[00025] Fig. 4C is a schematic illustration of exemplary embodiments, consistent with the present disclosure; [00026] Fig. 5 A is an illustration of an exemplary process of scanning a retail unit, consistent with the present disclosure;

[00027] FIG. 5B and 5C are schematic illustrations of an approach for capturing images from retail establishment using image capturing devices mounted to store shelves, consistent with the present disclosure;

[00028] Fig. 6A is a flowchart of an exemplary method for guided creation of visual representations of a store area using augmented markers over a displayed area of the store, consistent with the present disclosure;

[00029] FIG. 6B is a flowchart of an exemplary method for determining completion of visual representations of a store area, consistent with the present disclosure;

[00030] Fig. 7 is an illustration of exemplary communications between an image processing system and a mobile device, consistent with the present disclosure;

[00031] Fig. 8 is an illustration of an exemplary usage of an image processing system for monitoring contract compliance, consistent with the present disclosure; and

[00032] Fig. 9 is a flowchart of an exemplary method for monitoring compliance with contracts between retailers and suppliers, consistent with the present disclosure.

DETAILED DESCRIPTION

[00033] Reference will now be made in detail to exemplary embodiments implemented according to the present disclosure, the examples of which are illustrated in the accompanying drawings. Wherever possible the same reference numbers will be used throughout the drawings to refer to the same or like parts. The disclosure is not limited to only the described embodiments and examples.

[00034] Reference is now made to Fig. 1 , which shows an example of a system 100 for analyzing information collected from a retail store. In one embodiment, system 100 may be a computer-based system that includes computer system components, desktop computers, workstations, tablets, handheld computing devices, memory devices, and/or internal network(s) connecting the components. System 100 may include or be connected to network computing resources (e.g., servers, routers, switches, network connections, storage devices, etc.) necessary to support the services provided by system 100. In one embodiment, system 100 may be used to indicate shelf label accuracy in a store.

[00035] System 100 may include at least one capturing device 105 that may be associated with user 1 10, a server 1 15 operatively connected to a database 120, and an output unit 125 associated with the retail store. The communication between the components in system 100 components may be facilitated by communications network 130.

[00036] Consistent with the present disclosure, system 100 may analyze image data acquired by capturing device 105 to determine information associated with retail products. The term "capturing device" refers to any device configured to acquire image data and transmit data by wired or wireless transmission. Capturing device 105 may represent any type of device that can capture images of products on a shelf and may be connectable to network 130. In one embodiment, user 1 10 may acquire image data of products on a shelf using capturing device 105. Capturing device 105 may include handheld devices (e.g., a smartphone, a tablet, a mobile station, a personal digital assistant, a laptop, etc.), wearable devices (e.g., smart glasses, a clip-on camera, etc.), etc. In another embodiment, capturing device 105 may be operated remotely or autonomously. Capturing device 105 may include a fixed security camera with communication layers, a dedicated camera fixed to a store shelf, autonomous robotic devices, drones with cameras, etc. Capturing device 105 may capture images depicting a plurality of products on a plurality of store shelves, and a plurality of labels coupled to the store shelves and associated with the plurality of products.

[00037] Capturing device 105 may exchange raw or processed data with server 115 via respective communication links. Server 1 15 may include one or more servers connected by network 130. In one example, server 1 15 may be a cloud server that processes images received from a capturing device (e.g., capturing device 105) and processes the images to identify at least some of the plurality of products in the images based on visual characteristics of the plurality of products. Server 1 15 may also process the received images to determine from labels associated with each of the identified products, a specific product identifier and a specific displayed price. The term "cloud server" refers to a computer platform that provides services via a network, such as the Internet or other network. In another example, server 115 may be part of a system associated with retail store that communicates with a capturing device 105 using a wireless local area network (WLAN) and can provide similar functionality as a cloud server.

[00038] Remote server 1 15 ay be a cloud server that uses virtual machines which may not correspond to individual hardware. Specifically, computational and/or storage capabilities may be implemented by allocating appropriate portions of desirable computation/storage power from a scalable repository, such as a data center or a distributed computing environment. Server 1 15 may implement the exemplary methods described herein using customized hard-wired logic, one or more Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs), firmware and/or program logic, which in combination with the computer system cause server 1 15 to be a special-purpose machine. According to one embodiment, the methods herein are performed by server 115 in response to a processing device executing one or more sequences of one or more instructions contained in a memory device (e.g., database 120). In some embodiments, the memory device may include operating system programs that perform operating system functions when executed by the processing device. By way of example, the operating system programs may include Microsoft Windows™, Unix™, Linux™, Apple™ operating systems, personal digital assistant (PDA) type operating systems, such as Apple iOS, Google Android, Blackberry OS, or other types of operating systems.

[00039] As depicted in Fig. 1 , server 1 15 may be coupled to one or more physical or virtual storages such as database 120. Server 115 may access database 120 to determine product ID numbers associated with each of the identified products, for example through an analysis of product features in the image. Server 1 15 may also access database 120 to determine an accurate price for the identified products. Database 120 may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible or non-transitory computer- readable medium. Database 120 may also be part of server 115 or separate from server 115. If database 120 is not part of server 1 15, database 120 and server 1 15 may exchange data via a communication link. Database 120 may include one or more memory devices that store data and instructions used to perform one or more features of the disclosed embodiments. In one embodiment, database 120 may include any suitable database e.g., databases hosted on a work station, large databases distributed among data centers etc. Database 120 may also include any combination of one or more databases controlled by memory controller devices (e.g., servers) or software, such as document management systems, Microsoft SQL databases, SharePoint databases, Oracle™ databases, Sybase™ databases, or other relational databases.

[00040] Consistent with the present disclosure, capturing device 105 and/or server 115 may communicate with output unit 125 to present information derived from processing image data acquired by capturing device 105. For example, server 1 15 may determine a product-label mismatch associated with a first product depicted in the image, wherein the product-label mismatch relates to an incorrect product placement on the shelf. Server 1 15 may also determine a price mismatch associated with a second product depicted in the image, wherein the price mismatch relates to an incorrect price display. Server 1 15 may also determine a product-promotion mismatch associated with a third product depicted in the image, wherein the product-promotion mismatch relates to incorrect data depicted on a promotion sign. A promotion sign may include any type of presentation that includes sales information about specific products. Server 1 15 may, based on the image in which the product-label mismatch, the price mismatch, and/or the product-promotion mismatch are identified, provide electronic notification of any of the one or more mismatches to output unit 125. In one embodiment, output unit 125 may be part of a store manager station for controlling and monitoring different aspects of a store (e.g., updated price list, product inventory, etc.). Output unit 125 may be connected to a desktop computer, a laptop computer, a PDA, etc. In another embodiment, output unit 125 may be incorporated with capturing device 105 such that the information derived from processing image data may be presented on a display of capturing device 105. System 100 may identify all the products in an image in real time. System 100 may add a layer of information on the display of capturing device 105.

[00041] Network 130 facilitates communications and data exchange between capturing device

105, server 1 15, and output unit 125. In one embodiment, network 130 may be any type of network that provides communications, exchanges information, and/or facilitates the exchange of information between network 130 and different elements of system 100. For example, network 130 may be the Internet, a Local Area Network, a cellular network (e.g., 2G, 3G, 4G, 5G, LTE), a public switched telephone network (PSTN), or other suitable connection(s) that enables system 100 to send and receive information between the components of system 100.

[00042] The components and arrangements shown in Fig. 1 are not intended to limit the disclosed embodiments, as the system components used to implement the disclosed processes and features can vary . For example, system 100 may include multiple servers 1 10, and each server 1 15 may host a certain type of service, e.g., a first server that can process images received from capturing device 105 to identify at least some of the plurality of products in the image and to determine from labels associated with each of the identified products, a specific product identifier and a specific displayed price, and a second server that can determine a product-label mismatch, a price mismatch, and a product-promotion mismatch associated with one or more of the identified products.

[00043] Fig. 2 is a diagram of example components of capturing device 105 and server 115.

In one embodiment, both capturing device 105 and server 1 15 includes a bus 200 (or other

communication mechanism) that interconnects subsystems and components for transferring information within capturing device 105 and/or server 1 15. For example, bus 200 may interconnect a processing device 202, a memory interface 204, a network interface 206, and a peripherals interface 208 connected to I/O system 210.

[00044] Processing device 202, shown in Fig. 2, may include at least one processor configured to execute computer programs, applications, methods, processes, or other software to perform embodiments described in the present disclosure. The term "processing device" refers to any physical device having an electric circuit that performs a logic operation. For example, the processing device may include one or more integrated circuits, microchips, microcontrollers, microprocessors, all or part of a central processing unit (CPU), graphics processing unit (GPU), digital signal processor (DSP), field programmable gate array (FPGA), or other circuits suitable for executing instructions or performing logic operations. The processing device may include at least one processor configured to perform functions of the disclosed methods such as a microprocessor manufactured by Intel™ or manufactured by AMD™. The processing device may include a single core or multiple core processors executing parallel processes simultaneously. In one example, the processing device may be a single core processor configured with virtual processing technologies. The processing device may implement virtual machine technologies or other technologies to provide the ability to execute, control, run, manipulate or store multiple software processes, applications, programs, etc. In another example, the processing device may include a multiple-core processor arrangement (e.g., dual, quad core, etc.) configured to provide parallel processing functionalities to allow a device associated with the processing device to execute multiple processes simultaneously. Other types of processor arrangements maybe implemented to provide the capabilities disclosed herein.

[00045] In some embodiments, processing device 202 may use memory interface 204 to access data and a software product stored on a memory device or a non-transitory computer-readable medium. For example, server 115 may use memory interface 204 to access database 120. As used herein, a non- transitory computer-readable storage medium refers to any type of physical memory on which information or data readable by at least one processor can be stored. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, any other optical data storage medium, a RAM, a PROM, and EPROM, a FLASH-EPROM or any other flash memory, NVRAM, a cache, a register, any other memory chip or cartridge, and networked versions of the same. The terms "memory" and "computer-readable storage medium" may refer to multiple structures, such as a plurality of memories or computer-readable storage mediums located within capturing device 105, server 1 15, or at a remote location. Additionally, one or more computer-readable storage mediums can be utilized in implementing a computer- implemented method. The term "computer-readable storage medium" should be understood to include tangible items and exclude carrier waves and transient signals.

[00046] Both capturing device 105 and server 1 15 may include network interface 206 coupled to bus 200. Network interface 206 may provide a two-way data communication to a local network, such as network 130. In Fig. 2 the communication between capturing device 105 and server 1 15 may be represented by a dashed arrow. In one embodiment, network interface 206 may include an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, network interface 206 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN. In another embodiment, network interface 206 may include an Ethernet port connected to radio frequency receivers and transmitters and/or optical (e.g., infrared) receivers and transmitters. The specific design and implementation of network interface 206 depends on the communications network(s) over which capturing device 105 and server 1 15 are intended to operate. For example, in some embodiments, capturing device 105 may include network interface 206 designed to operate over a GSM network, a GPRS network, an EDGE network, a Wi-Fi or WiMax network, and a Bluetooth® network. In any such implementation, network interface 206 may be configured to send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

[00047] Both capturing device 105 and server 1 15 may also include peripherals interface 208 coupled to bus 200. Peripherals interface 208 may be connected to sensors, devices, and subsystems to facilitate multiple functionalities. In one embodiment, peripherals interface 208 may be connected to I/O system 210 configured to receive signals or input from devices and providing signals or output to one or more devices that allow data to be received and/or transmitted by capturing device 105 and server 1 15. In one example, I/O system 210 may include a touch screen controller 212, audio controller 214, and/or other input controller(s) 216. Touch screen controller 212 may be coupled to a touch screen 218. Touch screen 218 and touch screen controller 212 can, for example, detect contact, movement or break thereof using any of a plurality of touch sensitivity technologies, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies as well as other proximity sensor arrays or other elements for determining one or more points of contact with the touch screen 218. Touch screen 218 can also, for example, be used to implement virtual or soft buttons and/or a keyboard. While a touch screen 218 is shown in Fig. 2, I/O system 210 may include a display screen (e.g., CRT or LCD) in place of touch screen 218.

[00048] Audio controller 214 may be coupled to a microphone 220 and a speaker 222 to facilitate voice-enabled functions, such as voice recognition, voice replication, digital recording, and telephony functions. The other input controllers) 216 may be coupled to other input/control devices 224, such as one or more buttons, rocker switches, thumb- wheel, infrared port, USB port, and/or a pointer device such as a stylus.

[00049] With regards to capturing device 105, peripherals interface 208 may also be connected to an image sensor 226 for capturing image data. The term "image sensor" refers to a device capable of detecting and converting optical signals in the near-infrared, infrared, visible, and ultraviolet spectrums into electrical signals. The electrical signals may be used to form an image or a video stream (i.e. image data) based on the detected signal. The term "image data" includes any form of data retrieved from optical signals in the near-infrared, infrared, visible, and ultraviolet spectrums.

[00050] Examples of image sensors may include semiconductor charge-coupled devices (CCD), active pixel sensors in complementary metal-oxide-semiconductor (CMOS), or N-type metal-oxide- semiconductor (NMOS, Live MOS). In some cases, image sensor 226 may be part of a camera included in capturing device 105. According to some embodiments, peripherals interface 208 may also be connected to a motion sensor 228, a light sensor 230, and a proximity sensor 232 to facilitate orientation, lighting, and proximity functions. Other sensors (not shown) can also be connected to the peripherals interface 208, such as a temperature sensor, a biometric sensor, or other sensing devices to facilitate related functionalities. In addition, a GPS receiver can also be integrated with, or connected to, capturing device 105. For example, a GPS receiver can be built into mobile telephones, such as smartphone devices. GPS software may allow mobile telephones to use an internal or external GPS receiver (e.g., connecting via a serial port or Bluetooth).

[00051] Consistent with the present disclosure, capturing device 105 may use memory interface 204 to access memory device 234. Memory device 234 may include high-speed random access memory and/or non-volatile memory such as one or more magnetic disk storage devices, one or more optical storage devices, and/or flash memory (e.g., NAND, NOR). Memory device 234 may store an operating system 236, such as DARWIN, RTXC, LINUX, iOS, UNIX, OS X, WINDOWS, or an embedded operating system such as VXWorkS. The operating system 236 can include instructions for handling basic system services and for performing hardware dependent tasks. In some implementations, the operating system 236 can be a kernel (e.g., UNIX kernel).

[00052] Memory device 202 may also store communication instructions 238 to facilitate communicating with one or more additional devices, one or more computers and/or one or more servers. Memory device 234 can include graphical user interface instructions 240 to facilitate graphic user interface processing; sensor processing instructions 242 to facilitate sensor-related processing and functions; phone instructions 244 to facilitate phone-related processes and functions; electronic communications devices 105 messaging instructions 246 to facilitate electronic-messaging related processes and functions; web browsing instructions 248 to facilitate web browsing-related processes and functions; media processing instructions 250 to facilitate media processing-related processes and functions; GPS/navigation instructions 252 to facilitate GPS and navigation-related processes and instructions; capturing instructions 254 to facilitate processes and functions related to image sensor 226; and/or other software instructions 260 to facilitate other processes and functions.

[00053] Memory device 202 may also include application specific instructions 260 to facilitate a process for providing an indication about shelf label accuracy or for monitoring

compliance between retailers and suppliers.

[00054] In some embodiments, capturing device 105 may include software applications having instructions to facilitate connection with server 1 1 5 and/or database 120 and access or use of information about a plurality of products. Graphical user interface instructions 240 may include a software program that enables user 1 10 associated with capturing device 105 to acquire images of an area of interest in a retail establishment. Further, capturing device 105 may include software applications that enable receiving incentives for acquiring images of an area of interest. The process of acquiring images and receiving incentives is described in detail with reference to Fig. 9.

[00055] Any of the above identified instructions and applications may correspond to a set of instructions for performing one or more functions described above. These instructions need not be implemented as separate software programs, procedures, or modules. Memory device 234 may include additional instructions or fewer instructions. Furthermore, various functions of capturing device 105 may be implemented in hardware and/or in software, including in one or more signal processing and/or application specific integrated circuits. For example, capturing device 105 may execute an image processing algorithm to identify products in a received image.

[00056] In one exemplary embodiment, an image processing system (e.g., system 100) may be configured to provide one or more indications about shelf label accuracy in a store. The term "store" refers to any commercial establishment offering products for sale. In some embodiments, a store may include a retail establishment offering products for sale to consumers. A retail establishment may include shelves for display of the products and associated labels with pricing and other product information.

[00057] Fig. 3 illustrates exemplary images depicting a plurality of products on a plurality of store shelves in part of a shelving unit 300, and a plurality of labels coupled to the store shelves and associated with the plurality of products. A capturing device (e.g., capturing device 105) may acquire the images illustrated in Fig. 3. An image processing system (e.g., system 100) may process the images and provide an indication about the shelf label accuracy.

[00058] A processing device (e.g., processing device 202 of capturing device 105 and/or processing device 202 of server 1 15) may process the images captured by the capturing device to identify at least some of the plurality of products in the images, based on visual characteristics of the plurality of the products. For example, the identification may be based on shape, size of bottles and color of fluids within the bottles depicted in Fig. 3. The products may be identified based on a confidence level determined from based on the visual characteristics. For example, in some embodiments a product may be identified if it is determined to be a specific product with a confidence level greater than a threshold of 90%. In other embodiments, the threshold of confidence level for identification of products may be less than or greater than 90%.

[00059] In some conventional barcode scanning techniques, products are required to be scanned one at a time. However, the disclosed image processing systems can simultaneously identify multiple products captured in an image. Simultaneous identification of multiple products can greatly improve the speed of product identification. Further, the simultaneous identification can be used to provide contextual information for product identification, as described in further below. For example, processing device 202 may identify all the products depicted in Fig. 3 except products 305

(corresponding to label B3). The threshold of confidence level for identification may be 95% and products 305 may only be determined with 85% confidence. Processing device 202 may use the determined identity of other products in the image to increase the identification confidence level of products 305 above 95% and thereby identify products 305.

[00060] Processing device 202 can further access a database (e.g., database 120) to determine product ID numbers associated with each of the identified products. In some examples, the determination may be made by analyzing products features in the image. In one example, the determination may be based on comparison of the features of the products in the image with features of a template image stored in a database (e.g., database 120). Specifically, database 120 may store one or more template images associated with each of the known products and corresponding product ID numbers. In another example, the determination may be made by analyzing a visual code placed on the product. Database 120 can be configured to store product ID numbers corresponding to the codes placed on the products. In some embodiments, database 120 may be further configured to store prices corresponding to the products and processing device 202 can further access database 120 to determine an accurate price for the identified products.

[00061 ] Processing device 202 may also process the images to determine a specific product identifier and/or a specific displayed price from labels associated with each of the identified products. For example, processing device 202 may determine a specific product identifier and a specific displayed price included in all the labels (Al, A2, A3, B l , B2, B3, C I , C2, C3, Dl, D2, D3, El , E2, E3, F1, F2, F3) depicted in Fig. 3. [00062] Processing device 202 may also process the images to determine at least one promotion sign associated with at least some of the identified products. For example, processing device 202 may identify a promotion sign P 1 and determine a specific promotion associated with products associated with label C2.

[00063] The disclosed systems (e.g., system 100) may determine product-label, pricing, or product-promotion mismatches based on retrieved information of the identified products, the product information determined from the associated labels, and the data retrieved from promotion signs. In some embodiments, processing device 202 may determine a product-label mismatch associated with an identified product, where the product-label mismatch relates to an incorrect product placement on the shelf or absence of product placement on the shelf. For example, processing device 202 may determine a product-label mismatch based on a comparison of the determined product ID number of identified product 31 1 (of region 310) with the determined product ID numbers of products 312 and 313. In some embodiments, processing device 202 may determine multiple product-label mismatches simultaneously. For example, processing device 202 may determine a second product-label mismatch based on a comparison of the determined product ID number of identified product 315 with the determined product identifier included in associated label C3.

[00064] Processing device 202 may also determine a price mismatch associated with an identified product, where the price mismatch relates to an incorrect price display. For example, processing device 202 may determine a price mismatch based on the determined accurate price of identified products of region 320 (retrieved from database 120, as described above) and determined display price included in associated label E2. In some embodiments, processing device 202 may determine multiple price mismatches simultaneously. For example, processing device 202 may determine a second price-mismatch based on the determined accurate price of identified products of region 325 and determined display price included in associated label Dl.

[00065] Processing device 202 may also determine a product-promotion associated with an identified product, where the product-promotion mismatch relates to incorrect data displayed on promotion sign (e.g., P 1 ) compared to the store database. For example, processing device 202 may determine that promotion sign P 1 indicates an outdated discount, a sale that needs to be updated, and so forth. In some embodiments, processing device 202 may determine multiple product-promotion mismatches simultaneously. For example, processing device 202 may determine a second product- promotion mismatch based on the determined data in a second promotion sign.

[00066] In some embodiments, processing device 202 may also determine one or more product-label mismatches and one or more price mismatches simultaneously. For example, processing device 202 may simultaneously determine product-label mismatches associated with products 31 1 and 315, and price mismatches associated with labels E2 and Dl . In some embodiments, the determination of the product-label mismatch and/or the price mismatch may be performed after identifying more than 50% of the plurality of products in the image based on visual characteristics of the plurality of products. In other embodiments, the determination of the product-label mismatch and/or the price mismatch may be performed after identifying more than 75% or 80% or 90% or 95% of the plurality of products.

Further, the determination of the product-label mismatch and/or the price mismatch may be performed after determining the specific product identifier and the specific displayed price of more than 50% of the labels in the image. In other embodiments, the determination of the product-label mismatch and/or the price mismatch may be performed after identifying more than 75% or 80% or 90% or 95% of the labels in the image.

[00067] Consistent with the present disclosure, processing device 202 may also determine the space between different products in the captured images. In one example, processing device 202 may determine that a product is missing in region 330 and generate an electronic notification about the missing product. In another example, processing device 202 may determine that the arrangement of products in region 335 can be improved. In addition, processing device 202 may aggregate the space between products in different regions of the shelf and generate a report with recommendations for an improved placement of products.

[00068] In another exemplary embodiment, a method for providing a user with augmented guidance to capture images of products placed on a store shelving unit is provided. It should be noted that aspects of the present disclosure in their broadest sense are not limited to a mobile phone based augmented guidance. Rather, it is contemplated that the principles described may be applied to other devices with augmented user interface capabilities to overlay digital objects on physical object surfaces as well. The term augmented guidance refers generally to a display of digital objects overlaid upon real word objects, such as the user interfaces of head-mounted Augmented Reality (AR) devices and mobile phones with AR capabilities, and other variants,, such as smart phones with a flat overlay of a two-dimensional digital layer over objects in a physical environment. The term store shelving unit refers generally to furniture or other surfaces used to organize goods for both storage and display, such as a shelf or table with the goods placed in/on it accessible from one or more sides. First area 402 is digital shading overlaid on shelving unit 300 in a physical environment, as illustrated in FIG. 4A, first area 402 is one example of an augmented guidance in accordance with the present disclosure.

[00069] Providing augmented guidance to capture images in accordance with the present disclosure may include receiving a video stream captured by at least one image sensor of a mobile device, the video stream depicting different areas of the store shelving unit. The term "mobile device" is recognized by those skilled in the art and refers to any mobile device configured to capture and/or analyze images, such as a camera, a wearable camera, a wearable computer, a cell phone, mobile phone, a tablet, and so forth. The mobile device may receive video of the environment as captured by the device. The term image sensor refers generally to a device that aids in capturing digital still and motion images. The term video stream refers to a continuous stream of images or video feed.

[00070] In accordance with the present disclosure, a real-time display of a video-stream may be presented on a display screen. The term real-time display refers to a live display of video stream with minimal delay between capture and display on screen. Fig. 4A, for example, illustrates a view of the store shelf as seen by the individual capturing it. When the user moves the device, the video stream displayed on the screen follows the changes to the position and orientation of the device.

[00071 ] Embodiments of the present disclosure may further receive a plurality of images captured previously by at least one image sensor of the mobile device. In some embodiments, in accordance with the present disclosure, all images of a store shelving unit previously captured may be received. In other embodiments, images overlapping part of the store shelving unit video stream currently being displayed real-time on a display screen may be received.

[00072] In some exemplary embodiments, the plurality of previously captured images may be associated with a discontinuous area of the store shelving unit. The discontinuous area may include at least two non-overlapping regions. The term discontinuous area refers to the section of the displayed store shelving unit which includes disconnected regions if the displayed area not represented by any previously captured images. First area 402, illustrated in Fig 4A, is one example of a discontinuous area of a shelving unit in accordance with the present disclosure. But, the discontinuous area may include three, four, or more non-overlapping regions. . Further, in some embodiments the image resolution of the plurality of previously captured images may be higher than the image resolution of the video stream. The term image resolution of the video stream refers to the pixel resolution per inch of the motion images, which form part of the video stream.

[00073] In some exemplary embodiments, the present disclosure may further analyze the video stream to identify the first discontinuous area of the store shelving unit. Prior to identifying the areas in the video stream which need to be directed to be captured by an user the system 100 may attempt to align the determined discontinuous area formed by previously captured images and results in identifying the portions of the video where there are no prior capture images. Alignment of discontinuous area with the video stream may also aid in updating the discontinuous area in video stream based on the orientation of the camera.

[00074] In accordance with one embodiment, a real-time display of video stream may be augmented with a marking identifying an area of the store shelving unit different from the discontinuous area of the shelving unit. Marking generally refers to a digital overlay of an area on top of a video stream. The marked area may indicate an area of the shelving unit whose images have been previously captured or whose images captured in the past include all or a portion of the bounded area.

[00075] Embodiments of the present disclosure may further receive at least one other image captured by the at least one image sensor, which is associated with the second area of the store shelving. At least one of the newly captured images may overlap with the two non-overlapping regions of the discontinuous area. Second image 434 in Fig. 4A, for example, is an image overlapping two non-overlapping regions 410 and 440 of discontinuous area 402.

[00076] FIG. 4A is another diagrammatic representation of a store shelving unit 300. Fig. 4A depicts areas of the shelving unit, which include images captured in the past, and areas which need new images to be captured in order to have a more complete record of the shelving unit. A capturing device (e.g., capturing device 105) may acquire the images included in areas that have been captured in past in Fig. 3. A processing device 202 of server 1 15 may identify the section of shelving unit 300 represented by the video stream. Processing device 202 may request database 120 of server 1 15 to provide images of part of the same section of shelving unit that have been previously captured. In some embodiments, processing device 202 may request database 120 to provide all the previously captured images of the shelving unit. Processing device 202 may overlay the requested images on the video stream to identify the sections of the shelving unit covered using images. Processing device 202 may store the coordinates of the discontinuous areas representing the shelf instead of constructing the areas of the shelving unit from the previously captured images. For example, coordinates may include the horizontal and/or vertical points of a two dimensional plane parallel to the shelving unit. Coordinates based storage of the discontinuous area may help reduce the amount of data needed to transfer between server and capturing device which may need the information to determine which region of shelving unit needs image capture. Processing device 202 may identify a first discontinuous area 402 comprising regions 410 and 440 and send the coordinates of the regions relative to the video stream coordinates. Capturing device 105 on receiving the coordinates of the first discontinuous area 404 may mark the regions of the discontinuous areas of the video stream with a pattern. In some embodiments, a color or texture may also be used for marking. Capturing device 105 may also mark the second area 420 of the shelving unit displayed in the video stream, which might not have any previously captured images. In some embodiments, first discontinuous area 402 may be beyond the viewable borders of the video stream of the shelving unit. In certain other embodiments, one of the regions of the first area 410 may be outside the bounds of the video stream.

[00077] Capturing device 105 capture second image 434. The second image 434 may overlap the two non-overlapping regions of discontinuous first area 402 in order to stitch the area into one continuous area. In some embodiments, second image 434 may have higher resolution than the marked second area 404. Capturing second image 434 may result in discontinuous first area 410 to be connected in certain part but still connected in other parts.

[00078] Consistent with the present disclosure, processing device 202 may implement a method to acquire a complete image representation of shelving unit 300.

[00079] In one embodiment, the first discontinuous area (e.g., first area 402) may be associated with less than 95% of a field of view captured by the plurality of first images. Alternatively, first discontinuous area 402 may be associated with less than 90%, less than 80%, or between 50% and 90% of a field of view captured by the plurality of first images.

[00080] As shown in Fig. 4B a processing device may construct the first area of the shelving unit from images captured by capturing devices 105. Region 410 of first area 402 may be represented by exemplary images 421-425. In some embodiments, images 421 and 422 may capture the ceiling which is beyond the shelving unit. Images 423 and 424 may capture the floor section of the store. The captured images may overlap partially. Image 425 may be overlapped completely by other images. The captured images might overlap one or more images. For example, image 425 may overlap all other images representing region 410 of first area 402. In some embodiments, region 440 might not extend the complete height of the shelving unit. Images 442 and 444 might not be included in region 440 of first area 402 and still identify the region. Images 442 and 444 may be included in region 440 to improve the resolution of the region.

[00081] Fig. 4C illustrates an embodiment on capturing device 105. In the embodiment, processing device 202 enables visual stitching of images on capturing device 105 by providing real-time feedback to user 110 in the form of augmented content 402 on the camera screen. In addition, processing device 202 may also determine shelf height measurements and shelf depth measurements for better estimation of the number of products. This embodiment is described in greater detail with regard to Fig.

8

[00082] Fig. 5A depicts ten exemplary diagrams illustrating a process for capturing a more complete visual representation of a store shelving unit in a retail environment (e.g., an aisle, a shelf etc.). The visual representation may be from any number of individual images, and a user 1 10 that captures the images may receive in real time feedback about specific areas of the retail environment for which additional images are required and/or feedback about specific areas of the retail environment for which additional images are not required. In one embodiment, user 1 10 may receive the feedback in the form of augmented content overlaid on the camera screen of capturing device 105.

[00083] In diagram A 1 of Fig. 5 A, user 1 10 may point capturing device 105 toward a retail shelving unit, takes a picture, and starts moving capturing device 105 from right to left towards a different part of the retail shelving unit. Diagram A2 shows two areas on the camera screen. The first area (illustrated as dotted gray) is the area that was previously captured and the second area (illustrated as darker gray) is the area that was not yet captured. As shown in diagram A3, user 1 10 may use this information to aim capturing device 105 in a manner that minimizes a repetitive or redundant capturing of areas. In diagram A4, all the area is marked as the first area illustrating the camera screen right after an image was captured. Diagram A5 shows the retail shelving unit from a distance. The marked first and second areas assist user 1 10 to quickly identify where additional image capturing is required. In diagram A6, capturing device 105 is pointed partially to the floor. Processing device 202 may identify that the floor near the retail unit is not part of the retail unit and therefore does not mark it as a first or second area. Similarly, in diagram A7, Processing device 202 identifies an upper part (e.g., ceiling) of the retail environment and avoids marking this region also. In diagrams A8 and A9, user 1 10 may use the overlaid visual identification of the first area and the second area to patch the missing parts to obtain a more complete visual representation of the retail unit. Again, the visual identification of the first area and the second area enables user 110 to quickly identify the missing areas. Diagram A10 depicts another embodiment wherein stitching lines between different images are shown in the camera screen of capturing device 105.

[00084] In some exemplary embodiments, the area of interest may be an area outside of the field of view of a plurality of cameras fixedly-connected to other store shelving unit. For example, the area of interest may be an area that needs manual capture of images because they fall outside the field of view of the fixed camera.

[00085] Embodiments of the present disclosure may further obtain from a server (e.g., server

1 15) information associated with a plurality of cameras fixedly connected to an opposing store shelving unit. The information refers to the metadata about the camera hardware and the images captured by the cameras. Camera metadata includes field of view, resolution and other hardware capabilities and captured images metadata may include resolution of the images captured by the camera, lighting conditions with the aid of histograms among other captured images properties. The processing device 202 receiving this information can make a better prior analysis of where exactly there is need for a manual capture of images and where future images to be captured by the fixed cameras may fix the image quality issues identified.

[00086] In some exemplary embodiments, the mobile device (e.g., capturing device 105) may further display in a real-time video stream captured by the image sensor of the mobile device augmented with markings illustrating areas monitored by the plurality camera fixedly connected to the opposing shelving unit (e.g., shelving unit 300). The fixed cameras and their fixed field of view may always capture images of a certain portions of the shelving unit 300 and may help in knowing a discontinuous area of the shelving unit the fixed cameras have the capacity to capture. This prior information can be used to mark the video stream seen by a user who may then manually capture images for the area not reachable by fixed cameras. In some circumstances the prior marked discontinuous area may be larger in area than the area covered by the images actually captured by the fixed cameras. The discrepancy may occur due to poorer quality images (lacking focus, obstacles, lesser resolution, lack of light, etc.) and may need user to capture images manually for a larger area than the pre-determined area.

[00087] Fig. 5B is an illustration of a store aisle 400 with a plurality of fixedly mounted capturing devices 105, consistent with some embodiments. As depicted in Fig. 5B, one side of aisle 500 may include a plurality of capturing devices 105 fixedly mounted and oriented such that they capture images of an opposing side of isle 500. The plurality of capturing devices 105 may be connected to an associated mobile power source (e.g., one or more batteries) or to an external power supply (e.g., a power grid). As depicted in Fig. 5B, the plurality of camera may be placed at different heights and a least their vertical field of view may be adjustable. Additionally, their horizontal field of view may also be adjustable. Generally, both sides of aisle 500 may include capturing devices 105 in order to cover more areas of the retail establishment.

[00088] FIG. 5C is a top-view of an exemplary retail establishment with a plurality of fixedly mounted capturing devices 105, consistent with some embodiments. As depicted in Fig. 5C, various numbers of capturing devices 105 may be used to cover shelving units 300. In addition, there may be an overlap region in the horizontal field of views of some capturing devices 105. The overlap region may be used to determine inventory changes in a store itself by comparing the previous covering overlap region and the newly captured image. According to one embodiment, each capturing device 105 in a retail establishment may be connected to a server 115 via a single WLAN. Network interface 206 may transmit information associated with a plurality of images captured by the plurality of capturing devices 105 for determining if a disparity exists between at least one contractual obligation and product placement as reflected by the plurality of images. The information may be transmitted as raw images, cropped images, or processed data about products in the images. Network interface 206 may also transmit information identifying the location of the plurality capturing decides 105 in the retail establishment. [00089] In some embodiments, the fixedly-connected capturing devices 105 might have some degree of motion. In which case the area of interest may be the region of the shelving unit 300 not monitored by the capturing devices. The capturing devices 105 may be drones with a camera allowing for complete more capture of shelving unit 300. In such cases, the area of interest with missing image data may be blocked during a patrol of the capturing device. It might be also area which needs an image recapture due to the low quality (e.g., resolution) of previous images bad lighting conditions, or simply an area with image data that requiring updating for any reason

[00090] Fig. 6A, depicts an exemplary method 600 for guided creation of a visual representation of a store area using augmented markers over a displayed area of the store. In one embodiment, method 600 may be performed by system 100. In the following description, reference is made to certain components of system 100 for purposes of illustration. It will be appreciated, however, that other implementations are possible and that other components may be utilized to implement the exemplary method 600. It will be readily appreciated that the illustrated method may be altered to modify the order of steps, delete steps, or further include additional steps.

[00091] At step 602, the processing device (e.g., processing device 202) may receive a video stream captured by at least one image sensor (e.g., image sensor 226) of a mobile device (e.g., mobile device 1 15), the video stream depicting different areas of the store shelving unit (e.g., shelving unit 300). The processing device 202 may receive a video stream of an area of the store (e.g., shelving unit 300) captured by the image sensor 226. In some embodiments, the area of the store may be pre- determined and requested by the system 100 to be image captured using capturing device 105. The required images for capture may be in a fixed set of regions of the area of the store that an automated system cannot capture on its own. The automated system could be a fixed set of cameras mounted and facing an aisle of a store or a drone hovering over an aisle of a store.

[00092] At step 604, the processing device (e.g., processing device 202) may cause real-time display of a video stream for enabling a user to select areas of the store shelving unit for image capturing. Consistent with present disclosure, capturing device 105 may display in real-time the video stream of the area of the store (e.g., shelving unit 300) captured using the image sensor 226 on a touch screen 218. The displayed video stream may aid in providing a live feedback of the coverage of the captured images of a store area and help guide the user in identifying a next set of regions whose images may need to be taken by the user in order to achieve a more complete visual representation of shelving unit or the whole store.

[00093] At step 606, the processing device may receive a plurality of first images captured by the at least one image sensor (e.g., image sensor 226) and associated with a first discontinuous area (e.g., first area 402) of the store shelving unit (e.g., shelving unit 300) that includes at least two non- overlapping regions (e.g. 410 and region 440). In one embodiment, the image resolution of the first plurality of images may be higher than an image resolution of the video stream. The image resolution of the first plurality of images may enable identification of a first plurality of products associated with at least one product type in the store shelving unit. For example, the image resolution of the first plurality of images may enable identification more than fifty Coca Cola 500 ml bottles.

[00094] Processing device 202 may receive a set of first images identifying the non- overlapping regions of the shelving unit 300. Each non-overlapping region may include one or more images, which may overlap within the region. The non-overlapping regions may form a discontinuous area representing the area of the shelving unit. For example, previously captured images 421 -425 of region 410 and images 441-444 of region 440 may form the discontinuous first area 402 in the shelving unit 300 where the user 1 10 intends to or is directed to capture images.

[00095] At step 608, the processing device (e.g., processing device 202) may analyze the video stream to identify the first discontinuous area (e.g., first area 402) of the store shelving unit (e.g., shelving unit 300). In some embodiments, identifying the first discontinuous area may further include recognizing in the plurality of first images a plurality of regions of the store shelving unit that include products and/or have an image quality higher than a selected image quality threshold. Identifying the first discontinuous area may further include associating the recognized regions identify with the first discontinuous area. The plurality of regions may include two non-overlapping regions, each associated with at least two of the plurality of first images. In response to the identification of the first discontinuous area, the processing device may cause a display of an indicator, in the real-time augmented display of the video stream, informing the user that some of the products depicted in the in the plurality of first images were not captured at a required image quality (for example, in an image quality lower than the selected image quality threshold). The indicator may be further configured to guide a user on how to improve the image quality. [00096] The processing device 202 may identify the discontinuous area in the video stream.

In some embodiments, the discontinuous area may include multiple regions beyond the boundaries of the area of the shelving unit 300 that user 1 10 intends to capture images. Any motion detected by motion sensor 228 of capturing device 105 may result in adjustment of the discontinuous area to match the angle of the capturing device 105 relative to the shelving unit 300 and/or to match the distance between the capturing device 105 and shelving unit 300. In some cases, in response to such adjustments, current images of the area being captured may be received to reconstruct the discontinuous area again.

[00097] In some embodiments, processing device (e.g., processing device 202) may include images of a region in the first discontinuous area if the resolution of the region represented by the images is greater than the video stream resolution. For example, image 444 of region 440 of first area 402 may be of lower resolution than the threshold resolution and thus might not be included in the region 440. Similarly, image 442 of region 440 might also not be included thus resulting in region 440 of first area 404 to not be considered beyond the boundaries of images 441 and 443 when combined. In some embodiments, regions of the area being captured may be indicated as requiring recapture because the resolution of the combined set of images is lower than the video stream. In some embodiments, the marked region requiring image capture may also indicate the reason for the image capture. For example, the marker may indicate a lack of focus in a previously captured image or an obstacle blocking the view in previously captured images. The indication may be in text or color, or in a texture or a combination of all of them.

[00098] In some embodiments, processing device (e.g., processing device 202) may analyze images to determine which image and/or regions of the image has sufficient image quality (for example, having image quality higher than a selected image quality threshold). In some cases, the images and/or regions of images that are determined to have sufficient image quality may be included in the first discontinuous area. For example, a product recognition algorithm may be used to identify products depicted in the analyzed images, and images and/or image regions associated with successful product recognition results may be identified as having sufficient image quality. In some examples, successful product recognition may comprise product recognition with confidence levels (for example, as provided by the product recognition algorithm) higher than a selected threshold, product recognition result with sufficient product details (such as brand, label information, size, price, etc.), and so forth. In some examples, product recognition may comprise recognition of a plurality of products, and successful product recognition may be determined based on the distribution of the confidence levels associated with the recognized products. For example, successful product recognition may correspond to a distribution of confidence levels with a mean value higher than a selected threshold, with a median value higher than a selected threshold, with a variance lower than a selected threshold, with an entropy higher than a selected threshold, any combination of the above, and so forth. In some examples, product recognition may comprise recognition of a plurality of products of a group of products detected in the image by a product detection algorithm, and successful product recognition may comprise successful product recognition of at least a selected number and/or a selected ratio of products of the group of detected products.

[00099] At step 610, the processing device (e.g., processing device 202) may cause a realtime augmented display of the video stream with a marking identifying a second area (e.g., second area 404) of the store shelving unit (e.g., shelving unit 300) different from the first discontinuous area (e.g., first area 402) of the store shelving unit. In some embodiments, the marking identifying the second area (e.g., second area 404) of the store shelving unit may highlight the second area in a manner distinct from the first discontinuous area of the store shelving unit. The display may distinguish the second area from the areas in the video stream that are not part of the store shelving unit (e.g., ceiling 452, floor 454). In another embodiment, the marking identifying the first discontinuous area of the store may be distinct from the second area of the store shelving unit The display may have distinct markings identifying first discontinuous area from areas in the video stream that are not part of the store shelving unit.

[000100] Consistent with the present disclosure, the processing device 202 can mark in the video stream the discontinuous area in need of new images. In some embodiments, the region between the discontinuous area may be marked different from the discontinuous area. There may be more than one region separating regions of discontinuous area.

[000101] At step 612, the processing device (e.g., processing device 202) may receive at least one second image (e.g., second image 434) captured by the at least one image sensor (e.g., image sensor 226) and associated with the second area (e.g., second area 404) of the store shelving unit (e.g., shelving unit 300). In one embodiment, the at least one second image may overlap with each of the at least two non-overlapping regions (e.g., the first area region 410 and first area region 440) of discontinuous area (e.g., first area 402). At least two processing device may receive an image overlapping two or more regions of discontinuous area. User 1 10 may select one of the regions, augmented with a marker on the video stream, in need of images for an image to be captured. For example, processing device 202 of capturing device 105 may receive a second image 434 overlapping regions 410 and 440 of discontinuous area 402. The image may be captured by selecting second area 404 marked and displayed on touch screen 218 of capturing device 105. Capturing device may transmit the captured image to server 1 15 over network 130 to store in database 120. In some embodiments, the captured second image 434 may be sent directly to database 120.

[000102] Reference is now made to Fig. 6B, which illustrates an exemplary method for generating a more complete visual representation of a store area. In one embodiment, the steps of method 650 may be performed by system 100. In the following description, reference is made to certain components of system 100 for purposes of illustration. It will be appreciated, however, that other implementations are possible and that other components may be utilized to implement the exemplary method. It will be readily appreciated that the illustrated method can be altered to modify the order of steps, delete steps, or further include additional steps.

[000103] At step 652, the processing device may analyze the video stream to identify a combined area of the first discontinuous areas and the second area. The processing device (e.g., processing device 202 of server 1 15) may combine the image captured using the capturing device 105 with previously captured images. For example, second image 434 may be combined with images 421- 425 and 441-444 of first area 404. In some embodiments, the combined image may be stored in a database or other electronic storage device. For example, combining images of the first discontinuous areas and the second area may comprise stitching the images, image matting of the images, and so forth.

[000104] At step 654, the processing device may determine that the user is about to start capturing images of a second store shelving unit. The processing device may further inform the user that the first store shelving unit includes regions of the area of the shelving unit which need images to be captured. Previously captured images might not be sufficient to fully address or cover the identified region of interest. For example, second image 434 may connect regions 410 and 440 of first area 402 but still doesn't fully cover completely the shelving unit 300.

[000105] If the answer in step 654 is no, then process 650 has captured the required images needed for visual representation of the area of the shelving unit. And process 650 may exit.

[000106] If the answer in step 654 is yes, then process 650 may inform the user of system 100 the need for more images and jump back to step 602 of process 600 to show a video stream of the shelving unit and mark any potential areas in need of images. In some embodiments, when system 100 detects a user moving away from the shelving unit, similar check may be performed to make sure the visual representation of the shelving unit is up to date. If the visual representation requires updating, process 650 may inform the user of the regions of the area of shelving unit in need of images and jump back to step 602.

[000107] On completion of process 600 and 650 the images stored in the database 120 may be used to building a three-dimensional map of the store. In some embodiments, the map may also include product information aiding the customer with the location or a store manager with available inventory on the shelves.

[000108] Reference is now made to Fig. 7, which illustrates exemplary communications between an image processing system and a mobile device of a user in proximity to or within the retail establishment, consistent with the present disclosure. In some embodiments, the mobile device may direct the user to a store shelving unit including the area of interest. The user may be directed to a store shelving unit including the area of interest prior to cause a real-time augmented display of the video stream with the marking illustrating the area of interest.

[000109] Consistent with the present disclosure, processing device 202 may provide a request

711 to a detected mobile device for an updated image of the area of interest. Request 711 may include an incentive (e.g., $2 discount) to the customer for acquiring the image. The request can be a text message appearing prior to showing an augmented display or can be shown as part of and augmented display requesting to capturing a newer area of interest after capturing a current area of interest. The request may include an augmented display with a text base description of the location. Based on the proximity of the area of interest to the position of the capturing device 115, the augmented display may directly augment the area of interest with flag to print of the area. In response to one of these forms of requests, a customer/user may acquire an updated image 721 of an area of interest.

[0001 10] The processing device may be configured to receive a plurality of images (e.g., image 721 ), which include a marked area of interest 722 from a plurality of mobile devices. The received image may include video showing shelves in multiple aisles. The image processing system may use an interface where the acquired image may be automatically sent to the server (e.g., server 1 15) without any further user intervention.

[0001 1 1 ] This may be used to prevent users from editing the images or prevent fraud, for example where a certain product manufacturer product is not placed in right amount or not placed at an optimal eye level. After receiving an image (e.g., image 721) from a mobile device, processing device 202 may transmit an incentive to the user of mobile device. The incentive may comprise a text notification and a redeemable coupon, such as, for example, a text notification 731 thanking the user with a coupon 732 redeemable by the user using the mobile device. In some embodiments, the incentive may include a redeemable coupon for a product associated with the area of interest.

[0001 12] Further, the processing device (e.g. processing device 202) may be configured to select one or all of the images of the area of interest from the plurality of received images. Processing device 202 may be configured to select a group of images that follows predetermined criteria, for example, a specific timeframe, quality of image, distance from shelf to the capturing device, lighting during image acquisition, sharpness of the image etc. In some embodiments, one or more of the selected images may include a panoramic image.

[0001 13] The processing device (e.g., processing device 202) may be configured to analyze the selected image(s) to derive image-related data. For cases where two or more images are selected, processing device 202 may generate image-related data based on aggregated data from the two or more images.

[0001 14] Reference is now made to Fig. 8, which illustrates exemplary usage of an image processing system (e.g., system 100) for monitoring contract compliance, consistent with the present disclosure. Processing device 202 may receive a plurality of images 81 1 depicting a plurality of differing products corresponding to sections of a shelf needing more image capture. Processing device 202 may be configured to differentiate the differing products from each other through an identification of unique identifiers in the image, for example, set 531 of symbols found in associated labels. The unique identifiers may be determined through recognizing a graphic feature or a text feature extracted from an image object representative of the at least one product. Processing device 202 may be further configured to calculate one or more parameters (e.g., key performance indicators) associated with the shelf. Processing device 202 may also be configured to determine stock keeping units (SKUs) for the plurality of differing products based on the unique identifiers (other than S U bar codes) in the image. Processing device 202 may further determine a number of products 821 associated with each detemiined unique identifier. In some embodiments, processing device 202 can further be configured to calculate a shelf share for each of the plurality of products. The shelf share may be calculated by dividing an aggregated number of products associated with the one or more predetermined unique identifiers by a total number of products.

[0001 15] The processing device may modify at least one image of the plurality of first images in accordance with the identified first discontinuous area. In some embodiments of the present disclosure, the processing device may upload the at least one modified image to a server for product identification and for monitoring compliance with a desired product placement.

[0001 16] Additionally, the processing device (e.g. , processing device 202) may be configured to compare the image-related data with contract-related data to determine if a disparity exists between a contractual obligation and the placement of products in the area of interest. Processing device 202 may compare the shelf share calculated from received images (as described above) with the contracted shelf share required by an agreement between the manufacturer and a store that owns the retailer shelf. Processing device 202 may also compare the display location of products in received images with a contractual obligation regarding display locations. Processing device 202 may further generate a compliance report based on the comparison.

[0001 17] Further, the processing device (e.g., processing device 202) may be configured to generate a notification if a disparity is determined to exist based on the comparison of the image- related data with the contract-related data. Processing device 202 may also generate a notification based on a comparison of the calculated shelf share with a contracted shelf share. The notification may identify products that are misplaced on the shelf. For example, the processing device may highlight shelf region 831 and indicate that the products within shelf region 831 are misplaced. The notification can also identify that a contractual obligation for shelf space by one of the plurality of products is not met.

[0001 18] Embodiments of the present disclosure, the processing device (e.g., processing device 202) may identify an overlap area between a newly captured image (e.g., second image 434) and previously captured images. In some embodiments of the present disclosure, the processing device may select between a newly captured image and a previously captured image, for example where the overlap area image data is of better quality. Embodiments of the present disclosure may further transmit the selected images to a server (e.g., server 1 15) for product identification and for monitoring compliance with the desired product placement.

[0001 19] Reference is now made to Fig. 9, which depicts an exemplary method 900 for monitoring compliance with contracts between retailers and suppliers, consistent with the present disclosure. In one embodiment, the steps of method 900 may be performed by system 100. In the following description, reference is made to certain components of system 100 for purposes of illustration. It will be appreciated, however, that other implementations are possible and that other components may be utilized to implement the exemplary method. It will be readily appreciated that the illustrated method can be altered to modify the order of steps, delete steps, or further include additional steps.

[000120] At step 902, a processing device (e.g., processing device 202) may identify an area of interest in a retail establishment using contract-related data in a database (e.g., database 120). The contract-related data may include product location requirements, shelf share requirements, a planogram, etc. In some embodiments, the processing device may identify an area of interest based upon data received from a supplier or the head office of the supplier. The processing device may also identify an area of interest based upon the time duration from a previous image being larger than a threshold time duration.

[000121] At step 904, the processing device may detect a plurality of mobile devices in proximity to or within the retail establishment. The detection may include mobile devices of known customers of the retail establishment. The known customers may include customers having an application of the retail establishment on their mobile devices. The application may enable image capture of a section of the retail establishment as described in greater detail with reference to Figs. 5 and 6. [000122] At step 906, the processing device may provide to each of the detected plurality of mobile devices a request for an updated image of the area of interest. In some embodiments, the processing device may transmit requests based on specific location information. As an example, the processing device may first transmit requests to customer mobile devices that are determined to be within the retail establishment or in the parking lot of the retail establishment. Based on the feedback from the customers, the processing device may either not transmit additional requests or transmit further requests, e.g., to customer mobile devices detected to be within a five mile radius of the retail establishment or other distance.

[000123] The transmitted request may include an incentive to the customer. For example, request 71 1 may include a $2 discount incentive to the customer for acquiring the image. In response to the request, a customer may acquire an updated image 721 of an area of interest. In some embodiments, the incentive may be based on the number of detected mobile devices. For example, the processing device may offer a smaller incentive if a large number of mobile devices is detected in proximity to the area of interest. The processing device may offer a larger incentive if a very small number of mobile devices is detected in proximity to the area of interest. In some embodiments, the incentive may be based on the time duration from a previous image of the area of interest. For example, the processing device may offer a larger incentive if the time duration from a previous image of the area of interest is very long. The processing device may offer a smaller incentive if the time duration from a previous image of the area of interest is short. In some embodiments, the incentive can be based on an urgency level of an image request from supplier. For example, the processing device may offer a larger incentive if the image request is marked urgent. The processing device may offer a smaller incentive if the image request is marker as normal priority.

[000124] At step 908, the processing device may receive a plurality of images (e.g., image

721) of the area of interest from a plurality of mobile devices. The received image may include video containing shelves in multiple bays. After receiving an image from a mobile device, the processing device may transmit the incentive to the mobile device. The incentive may comprise a text notification and a redeemable coupon. For example, the incentive may include a text notification 731 thanking a customer and a coupon 732 redeemable by the customer using the mobile device.

[000125] At step 910, the processing device may select one, a group, or all of the images of the area of interest from the plurality of received images. In one embodiment, the processing device may select a group of images that follows predetermined criteria, for example, a specific timeframe, quality of the image, distance from shelf of the capturing device, lighting during image acquisition, sharpness of the image etc. For example, the processing device may analyze the plurality of received images to determine which image and/or regions of the image has sufficient image quality (as described above), and select images or regions of images with sufficient image quality. In some embodiments, one or more of the selected images may include a panoramic image. In another embodiment, the processing device may generate a panoramic image from the selected group of images.

[000126] At step 912, the processing device may combine the selected images by analyzing the selected images to derive image-related data. For cases where two or more images are selected, the processing device may generate image-related data based on an aggregation of data from the two or more images. The processing device may differentiate the differing products in the received images through an identification of unique identifiers (or code in labels). The processing device may further calculate one or more analytics (e.g., key performance indicators) associated with the shelf. The processing device can also determine SKUs for the plurality of differing products based on the unique identifiers in the image. The processing device may further calculate a shelf share for each of the plurality of products.

[000127] At step 914, the processing device may compare the image-related data with contract-related data to determine if a disparity exists between a contractual obligation and the current placement of products in the area of interest. The processing device can also compare the shelf share calculated from received images with a contracted shelf share required by an agreement between the manufacturer and a store that owns the retailer shelf. The processing device may further compare the display location of products in received images with a contractual obligation regarding display locations. In some embodiments, the processing device may generate a compliance report based on the comparisons.

[000128] Various operations or functions are described herein, which may be implemented or defined as software code or instructions. Such content may be directly executable ("object" or

"executable" form), source code, or difference code ("delta" or "patch" code). Software implementations of the embodiments described herein may be provided via an article of manufacture with the code or instructions stored thereon, or via a method of operating a communication interface to send data via the communication interface. A machine readable or computer readable storage medium may cause a machine to perform the functions or operations described and includes any mechanism that stores information in a form accessible by a machine (e.g., computing device, electronic system, and the like), such as recordable/non-recordable media (e.g., read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, and the like). A communication interface includes any mechanism that interfaces to any of a hardwired, wireless, optical, or similar, medium to communicate to another device, such as a memory bus interface, a processor bus interface, an Internet connection, a disk controller, and the like. The communication interface may be configured by providing configuration parameters and/or sending signals to prepare the communication interface to provide a data signal describing the software content. The communication interface may be accessed via one or more commands or signals sent to the communication interface.

[000129] The present disclosure also relates to a system for performing the operations herein. The system may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CDROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.

[000130] Embodiments of the present disclosure may be implemented with computer executable instructions. The computer-executable instructions may be organized into one or more computer- executable components or modules. Aspects of the disclosure may be implemented with any number and organization of such components or modules. For example, aspects of the disclosure are not limited to the specific computer-executable instructions or the specific components or modules illustrated in the figures and described herein. Other embodiments may include different computer-executable instructions or components having more or less functionality than illustrated and described herein.

[000131] Computer programs based on the written description and methods of this specification are within the skill of a software developer. The various programs or program modules may be created using a variety of programming techniques. For example, program sections or program modules may be designed by means of JavaScript, Scala, python, Java, C, C++, assembly language, or any such programming languages, as well as data encoding languages (such as XML, JSON, etc.), query languages (such as SQL), presentation-related languages (such as HTML, CSS etc.), and data transformation language (such as XSL). One or more such software sections or modules may be integrated into a computer system, non-transitory computer readable media, or existing communications software.

[000132] The words "comprising," "having," "containing," and "including," and other similar forms are intended to be equivalent in meaning and be interpreted as open ended, in that, an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. In addition, the singular forms "a," "an," and "the" are intended to include plural references, unless the context clearly dictates otherwise.

[000133] Having described aspects of the embodiments in detail, it will be apparent that modifications and variations are possible without departing from the scope of aspects of the invention as defined in the appended claims. As various changes could be made in the above constructions, products, and methods without departing from the scope of aspects of the invention, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.

Claims

WHAT IS CLAIMED IS:

1. A method for providing a user with augmented guidance to capture images of products placed on a store shelving unit, the method comprising:

receiving a video stream captured by at least one image sensor of a mobile device, the video stream depicting different areas of the store shelving unit;

causing a real-time display of the video stream for enabling the user to select areas of the store shelving unit for image capturing;

receiving a plurality of first images captured by the at least one image sensor and associated with a first discontinuous area of the store shelving unit that includes at least two non-overlapping regions, wherein an image resolution of the first plurality of images is higher than an image resolution of the video stream;

analyzing the video stream to identify the first discontinuous area of the store shelving unit; causing a real-time augmented display of the video stream with a marking identifying a second area of the store shelving unit different from the first discontinuous area of the store shelving unit; and receiving at least one second image captured by the at least one image sensor and associated with the second area of the store shelving unit, wherein the at least one second image overlaps with each of the at least two non-overlapping regions of the first discontinuous area.

2. The method of claim 1 , wherein the image resolution of the first plurality of images enables identification of a first plurality of products associated with at least one product type in the store shelving unit

3. The method of claim 2 further comprising:

analyzing the video stream to identify the second area, wherein the second area includes a second plurality of products associated with the at least one product type.

4. The method of claim 1, wherein each of the plurality of first images does not overlap with each other.

5. The method of claim 1, wherein at least some of the plurality of first images overlap with each other.

6. The method of claim 1 further comprising:

monitoring in the video stream changing positions of the first area of the store shelving unit as the mobile device moves relative to the store shelving unit; and

adjusting in real-time positions of the marking to account for the changing positions of the first area of the store shelving unit in the augmented display of the video stream.

7. The method of claim 1, wherein the marking identifying the second area of the store shelving unit includes highlighting the second area in a manner distinct from the first discontinuous area of the store shelving unit and distinct from areas in video stream that are not part of the store shelving unit.

8. The method of claim 1, wherein the marking identifying the second area of the store shelving unit includes highlighting of the first discontinuous area in a manner distinct from the second area of the store shelving unit and distinct from areas in the video stream that are not part of the store shelving unit.

9. The method of claim 1, wherein prior to receiving the plurality of first images, the method further comprising:

causing a real-time augmented display of the video stream with a marking illustrating an area of interest in the store shelving unit for enabling the user to capture images of the area of interest.

10. The method of claim 9, wherein prior to causing a real-time augmented display of the video stream with the marking illustrating the area of interest, the method further comprising:

directing the user to a store shelving unit including the area of interest.

1 1. The method of claim 9, wherein the area of interest comprises an area that is outside a field of view of a plurality of cameras fixedly-connected to other store shelving units.

12. The method of claim 11, further comprising:

uploading images associated with the area of interest and images captured by the plurality of cameras to build a three dimensional store map with information on products in a store.

13. The method of claim 1, wherein identifying the first discontinuous area of the store shelving unit includes:

recognizing in the plurality of first images a plurality of regions of the store shelving unit that include products and have an image quality higher than a selected image quality threshold; and

associating the recognized regions as the first discontinuous area.

14. The method of claim 13, wherein the plurality of regions includes two non-overlapping regions each associated with at least two of the plurality of first images.

5. The method of claim 13, wherein the identified first discontinuous area is associated with less than 95% of a field of view captured by the plurality of first images.

16. The method of claim 13, further comprising:

modifying at least one of the plurality of first images in accordance with the identified first discontinuous area; and

uploading the at least one modified image to a server for product identification and for monitoring compliance with a desired product placement.

17. The method of claim 13, further comprising:

causing a display of an indicator, in the real-time augmented display of the video stream, informing the user that some of the products depicted in the plurality of first images were not captured in the image quality higher than the selected image quality threshold.

18. The method of claim 17, wherein the indicator is configured to guide a user how to improve the image quality.

19. The method of claim 1 , further comprising:

analyzing the video stream to identify a combined area of the first discontinuous area and the second area;

causing a real-time augmented display of the video stream with a marking identifying a third area of the store shelving unit different from the combined area of the first discontinuous area and the second area; and

receiving at least one third image captured by the at least one image sensor and associated with the third area of the store shelving unit.

20. The method of claim 1, further comprising:

identifying an overlap area in the at least one second image;

selecting from the plurality of first images and the at least one second image, image data associated with the overlap area that has better image quality; and

transmitting the selected image data to a server for product identification and for monitoring compliance with the desired product placement.

21. The method of claim 1 , further comprising:

determining that the user is about to start capturing images of a second store shelving unit; and informing the user that the first store shelving unit still includes at least one region for which no images were received.

22. A mobile device for providing a user with augmented guidance to capture images of products placed on a store shelving unit, the mobile device comprising:

at least one image sensor configured to capture image data from the environment of the user; at least one processor configured to:

receive a video stream captured by the at least one image sensor, the video stream depicting different areas of the store shelving unit;

cause a real-time display of the video stream for enabling the user to select areas of the store shelving unit for image capturing;

receive a plurality of first images captured by the at least one image sensor and associated with a first discontinuous area of the store shelving unit that includes at least two non-overlapping regions, wherein an image resolution of the first plurality of images is higher than an image resolution of the video stream;

analyze the video stream to identify the first discontinuous area of the store shelving unit; cause a real-time augmented display of the video stream with a marking identifying a second area of the store shelving unit different from the first discontinuous area of the store shelving unit; and

receive at least one second image captured by the at least one image sensor and associated with the second area of the store shelving unit, wherein the at least one second image overlaps with each of the at least two non-overlapping regions of the first discontinuous area.

23. The mobile device of claim 22, wherein the mobile device is a smartphone including a screen configured to display the real-time augmented display of the video stream.

24. The mobile device of claim 22, wherein the mobile device is a headset configured to project the realtime augmented display of the video stream to an eye of the user.

25. The mobile device of claim 22, further comprising a transmitter configured to wirelessly upload images to a server for product identification and for monitoring compliance with the desired product placement.

26. The mobile device of claim 22, further comprising a receiver configured to obtain from a server information associated with a plurality of cameras fixedly connected to an opposing store shelving unit, and to cause a real-time augmented display of the video stream with markings illustrating areas monitored by the plurality of cameras.

27. A non-transitory computer readable medium for providing a user with augmented guidance to capture images of product inventory placed on a store shelving unit, the computer readable medium containing instructions that when executed by a processor cause the processor to perform operations, comprising: receiving a video stream captured by at least one image sensor of a mobile device, the video stream depicting different areas of the store shelving unit;