US20190138795A1 - Automatic Object Detection and Recognition via a Camera System - Google Patents
Info
- Publication number
- US20190138795A1 (application US16/163,521)
- Authority
- US
- United States
- Prior art keywords
- video
- camera
- detected
- human face
- triggering event
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G06K9/00288—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G06K9/00228—
-
- G06K9/00268—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B13/00—Burglar, theft or intruder alarms
- G08B13/18—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
- G08B13/189—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems
- G08B13/194—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
- G08B13/196—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
- G08B13/19695—Arrangements wherein non-video detectors start video recording or forwarding but do not generate an alarm themselves
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/036—Insert-editing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/65—Control of camera operation in relation to power supply
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/667—Camera operation mode switching, e.g. between still and video, sport and normal or high- and low-resolution modes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/765—Interface circuits between an apparatus for recording and another apparatus
- H04N5/77—Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/765—Interface circuits between an apparatus for recording and another apparatus
- H04N5/77—Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera
- H04N5/772—Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera the recording apparatus and the television camera being placed in the same enclosure
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/91—Television signal processing therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
- H04N7/183—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a single remote source
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
- H04N7/188—Capturing isolated or intermittent images triggered by the occurrence of a predetermined event, e.g. an object reaching a predetermined position
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/79—Processing of colour television signals in connection with recording
- H04N9/80—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
- H04N9/82—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only
- H04N9/8205—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only involving the multiplexing of an additional signal and the colour video signal
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/02—Casings; Cabinets ; Supports therefor; Mountings therein
- H04R1/04—Structural association of microphone with electric circuitry therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/326—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only for microphones
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B13/00—Burglar, theft or intruder alarms
- G08B13/18—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
- G08B13/189—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems
- G08B13/194—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
- G08B13/196—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
- G08B13/19665—Details related to the storage of video surveillance data
- G08B13/19671—Addition of non-video data, i.e. metadata, to video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2499/00—Aspects covered by H04R or H04S not otherwise provided for in their subgroups
- H04R2499/10—General applications
- H04R2499/11—Transducers incorporated or for use in hand-held devices, e.g. mobile phones, PDA's, camera's
- FIG. 1 illustrates an environment 100 within which systems and methods for automatic facial detection and recognition via a camera system can be implemented, in accordance with some embodiments.
- the environment 100 may include a camera 102 containing a camera lens 104 , and camera sensor(s) 106 .
- the camera 102 may be deployed in a physical space 108 , such as a house. Though not explicitly shown in exemplary FIG. 1 , camera 102 also has, in various embodiments, one or more additional components that enable its operation for the purposes of the present disclosure.
- the captured video 112 from the camera 102 may be transmitted via a network 110 to a cloud video analysis system 122 , which may include a system for facial recognition 124 .
- the cloud video analysis system 122 may further utilize a database 114 and one or more computing processors and volatile and non-volatile memory.
- the system for facial recognition 124 may generate facial recognition information 116 , which is transmitted through network 110 to an application operating on a user device 118 , which in turn can be viewed by a user 120 .
- a camera 102 may be deployed in any physical space 108 to record audio and/or video around the physical space 108 . While physical space 108 is depicted in exemplary FIG. 1 as a house, a person of ordinary skill in the art will understand that camera 102 may be deployed in any physical space, such as an office building, or any other space. Further, while only one camera 102 is depicted in FIG. 1 for simplicity, there can be any number of cameras in physical space 108 . If multiple cameras are located in space 108 , one or more of the cameras may be in wireless communication with one another, in exemplary embodiments. Further, while camera 102 is depicted in FIG. 1 as a standalone device, in other embodiments, camera 102 may be incorporated as a part of other electronic devices. For example, camera 102 may be incorporated as part of a smartphone, tablet, intelligent personal assistant, or other smart electronic device.
- Camera 102 is described in further detail with respect to FIG. 2 .
- camera 102 is a consumer-friendly camera that can be utilized by a human user without needing to have any specialized camera expertise.
- the camera 102 may have one or more lens 104 , with which video is captured.
- lens 104 may be any type of lens typically found in consumer cameras, such as a standard prime lens, a zoom lens, or a wide-angle lens.
- Camera 102 further has one or more sensors 106 .
- Sensor(s) 106 may be any type of sensor to monitor conditions around the camera 102 .
- sensor 106 may comprise one or more of a PIR (passive infrared) sensor that can enable color night vision, a motion sensor, a temperature sensor, a humidity sensor, a GPS sensor, etc.
- other types of sensors can be utilized to preset other types of conditions or triggers for camera 102 as well.
- camera 102 has additional components that enable its operation.
- camera 102 may have power component(s) 206 .
- Power component(s) 206 may comprise an electrical connector interface for electrically coupling a power source to the camera 102 , or for otherwise providing power to the camera 102 .
- Electrical connector interface may comprise, for example, an electrical cable (the electrical cable can be any of a charging cable, a FireWire cable, a USB cable, a micro-USB cable, a lightning cable, a retractable cable, a waterproof cable, a cable that is coated/covered with a material that would prevent an animal from chewing through to the electrical wiring, and combinations thereof), electrical ports (such as a USB port, micro-USB port, microSD port, etc.), a connector for batteries (including rechargeable battery, non-rechargeable battery, battery packs, external chargers, portable power banks, etc.), and any other standard power source used to provide electricity/power to small electronic devices.
- power component(s) 206 comprises at least one battery provided within a housing unit.
- the battery may also have a wireless connection capability for wireless charging, or induction charging capabilities.
- Camera 102 also comprises audio component(s) 204 .
- audio component(s) 204 may comprise one or more microphones for receiving, recording, and transmitting audio.
- Camera 102 further has processing component(s) 208 to enable it to perform processing functions discussed herein.
- Processing component(s) 208 may comprise at least one processor, static or main memory, and software such as firmware that is stored on the memory and executed by a processor.
- Processing component(s) 208 may further comprise a timer that operates in conjunction with the functions disclosed herein.
- a specialized video processor is utilized with a hardware accelerator and specially programmed firmware to identify triggering events, begin recording audio and/or video (in either Standard Definition or High Definition), cease recording of audio and/or video, process the captured video frames and insert metadata information regarding the specific video frame(s) containing a human face, and transmit the recorded audio, video, and metadata to a video analysis system 122 operating via software in a cloud computing environment.
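- By way of illustration only, the firmware pipeline described above can be sketched as a simple event loop. The sketch below is a hypothetical outline, not the patent's actual firmware: the state names, the helper functions (detect_trigger, record_frame, detect_faces_in_frame, append_metadata, upload_clip), and the 20-second window are assumptions standing in for camera-specific calls.

    #include <stdbool.h>
    #include <time.h>

    /* Hypothetical camera states. */
    typedef enum { CAM_IDLE, CAM_RECORDING } cam_state_t;

    /* Assumed helpers standing in for camera firmware and DSP calls. */
    extern bool detect_trigger(void);            /* sensor and/or microphone activity */
    extern void record_frame(unsigned int ts);   /* store one video frame in memory */
    extern int  detect_faces_in_frame(void);     /* hardware-accelerated face detector */
    extern void append_metadata(unsigned int ts, int faces);
    extern void upload_clip(void);               /* send recording and metadata to system 122 */

    void camera_loop(void) {
        cam_state_t state = CAM_IDLE;
        unsigned int started = 0;
        const unsigned int capture_window = 20;  /* seconds; pre-configured */

        for (;;) {
            unsigned int now = (unsigned int)time(NULL);
            if (state == CAM_IDLE) {
                if (detect_trigger()) {          /* triggering event detected */
                    state = CAM_RECORDING;
                    started = now;
                }
            } else {
                record_frame(now);
                int faces = detect_faces_in_frame();  /* substantially simultaneous */
                if (faces > 0)
                    append_metadata(now, faces);      /* mark this frame */
                if (now - started >= capture_window) {
                    upload_clip();
                    state = CAM_IDLE;                 /* recording ceases */
                }
            }
        }
    }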
- Camera 102 also comprises networking component(s) 202 , to enable camera 102 to connect to network 110 in a wired or wireless manner, similar to networking capabilities utilized by persons of ordinary skill in the art. Further, networking component(s) 202 may also allow for remote control of camera 102 .
- the networking communication capability of camera 102 can be achieved via an antenna attached to any portion of camera 102 , and/or via a network card.
- Camera 102 may communicate with network 110 via wired or wireless communication capabilities, such as radio frequency, Bluetooth, ZigBee, Wi-Fi, electromagnetic wave, RFID (radio frequency identification), etc.
- a human user 120 may further interact with, and control certain operations of the camera 102 via a graphical user interface displayed on a user device 118 .
- the graphical user interface can be accessed by a human user 120 via a web browser on the user device 118 (such as a desktop or laptop computer, netbook, smartphone, tablet, etc.).
- a human user may further interact with, and control certain operations of the camera 102 via a dedicated software application on a smartphone, tablet, smartwatch, laptop or desktop computer, or any other computing device with a processor that is capable of wireless communication.
- a human user 120 can interact with, and control certain operations of the camera 102 via a software application utilized by the user 120 for controlling and monitoring other aspects of a residential or commercial building, such as a security system, home monitoring system for Internet-enabled appliances, voice assistant such as Amazon Echo, Google Home, etc.
- camera 102 captures video as discussed herein.
- the captured video 112 is then transmitted to video analysis system 122 via network 110 .
- the network 110 may include the Internet or any other network capable of communicating data between devices. Suitable networks may include or interface with any one or more of, for instance, a local intranet, a Personal Area Network, a Local Area Network, a Wide Area Network, a Metropolitan Area Network, a virtual private network, a storage area network, a frame relay connection, an Advanced Intelligent Network connection, a synchronous optical network connection, a digital T1, T3, E1 or E3 line, Digital Data Service connection, Digital Subscriber Line connection, an Ethernet connection, an Integrated Services Digital Network line, a dial-up port such as a V.90, V.34 or V.34bis analog modem connection, a cable modem, an Asynchronous Transfer Mode connection, or a Fiber Distributed Data Interface or Copper Distributed Data Interface connection.
- communications may also include links to any of a variety of wireless networks, including Wireless Application Protocol, General Packet Radio Service, Global System for Mobile Communication, Code Division Multiple Access or Time Division Multiple Access, cellular phone networks, Global Positioning System, cellular digital packet data, Research in Motion, Limited duplex paging network, Bluetooth radio, or an IEEE 802.11-based radio frequency network.
- the network can further include or interface with any one or more of an RS-232 serial connection, an IEEE-1394 (FireWire) connection, a Fiber Channel connection, an IrDA (infrared) port, a SCSI (Small Computer Systems Interface) connection, a Universal Serial Bus (USB) connection or other wired or wireless, digital or analog interface or connection, mesh or Digi® networking.
- the network 110 may be a network of data processing nodes that are interconnected for the purpose of data communication.
- the network 110 may include any suitable number and type of devices (e.g., routers and switches) for forwarding commands, content, requests, and/or responses between each user device 118 , each camera 102 , and the video analysis system 122 .
- the video analysis system 122 may include a server-based distributed software application; thus, the system 122 may include a central component residing on a server and one or more client applications residing on one or more user devices and communicating with the central component via the network 110 .
- the user 120 may communicate with the system 122 via a client application available through the user device 118 .
- Video analysis system 122 may comprise software application(s) for processing captured video 112 , as well as other capabilities. Video analysis system 122 is further in communication with one or more data structures, such as database 114 . In exemplary embodiments, at least some components of video analysis system 122 operate on one or more cloud computing devices or servers.
- Video analysis system 122 further comprises a system for facial recognition 124 .
- the system for facial recognition 124 analyzes the specific video frames noted in metadata associated with captured video 112 . Through the analysis, which consists of one or more software algorithms executed by at least one processor, the system for facial recognition 124 analyzes the video frames from captured video 112 that have been noted as containing a human face. The human face detected in each video frame is then “recognized”, i.e., associated with a likely name of the person whose face is detected.
- Face recognition information 116 , which may comprise a name of one or more people recognized in captured video 112 , is then transmitted by system for facial recognition 124 , through network 110 , to a user device 118 , at which point it can be viewed by a user.
- additional information may be transmitted with face recognition information 116 , such as a copy of the face image from the captured video 112 , and/or other information regarding the facial recognition.
- Face recognition information 116 is displayed via a user interface on a screen of user device 118 , in the format of a pop-up alert, text message, e-mail message, or any other means of communicating with user 120 .
- the user device 118 may include a Graphical User Interface for displaying the user interface associated with the system 122 .
- the user device 118 may include a mobile telephone, a desktop personal computer (PC), a laptop computer, a smartphone, a tablet, a smartwatch, intelligent personal assistant device, smart appliance, and so forth.
- FIG. 3 is a block diagram showing various modules of a video analysis system 122 for processing captured video 112 , in accordance with certain embodiments.
- the system 122 may include a processor 310 and a database 320 .
- the database 320 may include computer-readable instructions for execution by the processor 310 .
- the processor 310 may include a programmable processor, such as a microcontroller, central processing unit (CPU), and so forth.
- the processor 310 may include an application-specific integrated circuit or programmable logic array, such as a field programmable gate array, designed to implement the functions performed by the system 122 .
- the system 122 may be installed on a user device or may be provided as a cloud service residing in a cloud storage. The operations performed by the processor 310 and the database 320 are described in further detail herein.
- FIG. 4 is a block diagram showing various modules of a system for facial recognition 124 , for identifying (recognizing) detected human faces in select frames of captured video 112 , in accordance with certain embodiments.
- the system 124 may include a processor 410 and a database 420 .
- the processor 410 of system for facial recognition 124 may be the same, or different from processor 310 of the video analysis system 122 .
- database 420 of system for facial recognition 124 may be the same or different than database 320 of video analysis system 122 .
- Database 420 may include computer-readable instructions for execution by the processor 410 .
- the processor 410 may include a programmable processor, such as a microcontroller, central processing unit (CPU), and so forth.
- the processor 410 may include an application-specific integrated circuit or programmable logic array, such as a field programmable gate array, designed to implement the functions performed by the system for facial recognition 124 .
- the system for facial recognition 124 may be installed on a user device or may be provided as a cloud service residing in a cloud storage. The operations performed by the processor 410 and the database 420 are described in further detail herein.
- FIG. 5 is a process flow diagram showing a method 500 for automatic facial detection via a camera system, within the environment described with reference to FIG. 1 .
- the operations may be combined, performed in parallel, or performed in a different order.
- the method 500 may also include additional or fewer operations than those illustrated.
- the method 500 may be performed by processing logic that may comprise hardware (e.g., decision making logic, dedicated logic, programmable logic, and microcode), hardware accelerator, software (such as firmware on camera 102 or other software run on a special-purpose computer system), or any combination of the above.
- the method 500 may commence with camera 102 detecting a triggering event at operation 502 .
- camera 102 may be located in a physical space 108 and powered on, but not be actively recording video and/or audio.
- a triggering event may cause camera 102 to begin recording.
- the triggering event can be any preset condition or trigger.
- the triggering event is a noise detected by a microphone on camera 102 above a certain decibel threshold.
- a triggering event is a noise detected by a microphone on camera 102 within a certain time period.
- a triggering event may be the detection of motion, temperature, smoke, humidity, gaseous substance, or any other environmental condition above a preset threshold, or occurring within a preset time period.
- the preset threshold may be configured by a manufacturer of camera 102 , or configured by a user 120 .
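- As one concrete illustration of the sound-based trigger, the following sketch compares the level of a block of microphone samples against a configurable decibel threshold. It is a minimal example under assumptions not stated in the patent: 16-bit PCM samples and a threshold expressed in dBFS (decibels relative to full scale).

    #include <math.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Returns true if the RMS level of a block of 16-bit PCM samples
     * exceeds threshold_db, expressed in dBFS. */
    bool sound_trigger(const short *samples, size_t n, double threshold_db) {
        if (n == 0)
            return false;
        double sum_sq = 0.0;
        for (size_t i = 0; i < n; i++) {
            double s = samples[i] / 32768.0;          /* normalize to [-1, 1) */
            sum_sq += s * s;
        }
        double rms = sqrt(sum_sq / (double)n);
        double level_db = 20.0 * log10(rms + 1e-12);  /* guard against log(0) */
        return level_db > threshold_db;
    }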
- Upon detection of the triggering event, camera 102 enters video and/or audio capture mode at operation 504 for a certain predetermined time period.
- capture mode may be enabled on camera 102 for an amount of time that is pre-configured by a manufacturer of camera 102 , or pre-configured by a user 120 .
- the predetermined time period that capture mode is enabled on camera 102 may be variable based on the type of triggering event, time of day, or any other criterion. In an exemplary embodiment, capture mode is enabled on camera 102 for 5-30 seconds.
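- A variable capture window of the kind described above could be chosen with a simple lookup. The trigger categories and durations below are illustrative assumptions within the 5-30 second range given in the text, not values disclosed in the patent.

    /* Hypothetical trigger categories. */
    typedef enum { TRIGGER_MOTION, TRIGGER_SOUND, TRIGGER_OTHER } trigger_t;

    /* Returns the recording window in seconds, varied by trigger type
     * and time of day (hour in 0-23). */
    unsigned int capture_window_seconds(trigger_t type, int hour) {
        int night = (hour >= 22 || hour < 6);
        switch (type) {
            case TRIGGER_MOTION: return night ? 30u : 20u;
            case TRIGGER_SOUND:  return night ? 20u : 10u;
            default:             return 5u;
        }
    }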
- video is recorded by camera 102 onto memory within camera 102 hardware. Further, substantially simultaneously, recorded video is processed by firmware on a specialized video processor hardware and/or hardware accelerator within camera 102 .
- the firmware processes recorded video and detects select video frames within the recorded video that contain a human face or are likely to contain a human face.
- a threshold confidence level may be preset for the facial detection, such that false positives are preferable to false negatives.
- the facial detection may occur substantially instantaneously (within 1 second) of the video capture by camera 102 .
- exemplary metadata may comprise:
    typedef struct {
        unsigned int  timestamp;      /* timestamp when face was detected */
        unsigned char face_detected;  /* face has been detected or not - YES/NO */
        unsigned char no_of_faces;    /* number of faces detected at the particular timestamp */
        unsigned char type;           /* human/pet - HUMAN/PET */
        unsigned char reserved;
    } METADATA;
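- One plausible way for the firmware to use this structure is to append one record to the clip's metadata file each time the detector fires. The packed-record file layout below is an assumption for illustration; the patent does not specify the on-disk format.

    #include <stdio.h>

    /* Append one detection record, using the METADATA structure above.
     * Returns 0 on success, -1 on a write error. */
    int write_metadata_record(FILE *f, unsigned int timestamp,
                              unsigned char faces, unsigned char type) {
        METADATA m = {0};
        m.timestamp = timestamp;
        m.face_detected = (faces > 0);   /* YES/NO */
        m.no_of_faces = faces;
        m.type = type;                   /* HUMAN or PET */
        return fwrite(&m, sizeof m, 1, f) == 1 ? 0 : -1;
    }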
- the recorded video and updated metadata file are transmitted to video analysis system 122 for further analysis, at operation 510 .
- camera 102 is in wireless communication with video analysis system 122 and operation 510 occurs in a wireless manner.
- the transmission occurs via a wired communication network.
- video analysis system 122 may be executed by a module within camera 102 itself.
- an exemplary transmission from camera 102 to video analysis system 122 may comprise a plurality of computing files.
- the following computing files may be transmitted: “.ts” files, which contain the recorded video, and “.met” files, which contain the corresponding metadata.
- FIG. 6 is a process flow diagram showing a method 600 for automatic facial recognition via a camera system, within the environment described with reference to FIG. 1 .
- the operations may be combined, performed in parallel, or performed in a different order.
- the method 600 may also include additional or fewer operations than those illustrated.
- the method 600 may be performed by processing logic that may comprise hardware (e.g., decision making logic, dedicated logic, programmable logic, and microcode), hardware accelerator, software (such as firmware or other software run on a special-purpose computer system or general purpose computer system), or any combination of the above.
- Various operations of method 600 may be performed by video analysis system 122 , system for facial recognition 124 , or a combination of both systems.
- the method 600 may commence at operation 602 with video analysis system 122 receiving recorded video and/or audio, along with corresponding metadata from camera 102 .
- the recorded video may be a video clip of any duration. In preferred embodiments, the recorded video clip is between 5 and 30 seconds, preferably about 20 seconds, in length.
- the recorded video and corresponding metadata may be received from camera 102 via any wireless or wired communication mechanism.
- the video analysis system 122 examines the received metadata to determine which video frames of the recorded video clip contain detected human faces. The select video frames are then extracted from the video clip for further analysis by the system for facial recognition 124 .
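- Continuing the packed-record assumption from the writer sketch above, the analysis side might read the metadata file back and keep only the timestamps of frames the camera flagged, so that just those frames are decoded from the clip rather than the whole video.

    #include <stdio.h>

    /* Collect timestamps of frames marked as containing faces.
     * Returns how many marked frames were found (at most max). */
    size_t marked_frame_timestamps(FILE *f, unsigned int *out, size_t max) {
        METADATA m;
        size_t count = 0;
        while (count < max && fread(&m, sizeof m, 1, f) == 1) {
            if (m.face_detected)         /* frame flagged by the camera */
                out[count++] = m.timestamp;
        }
        return count;
    }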
- system for facial recognition 124 processes the extracted video frames from the received video clip.
- the processing may optionally entail verifying whether the face detected by the camera 102 is indeed a human face. Further, the processing may entail identifying the detected human face in the select video frames. This process is also referred to herein as recognizing the human face.
- the human face is compared to known human faces stored in a data structure such as a database to determine a match with a previously identified human face.
- the previously identified human faces may be identified by one or more machine learning algorithms, and/or by human input.
- a threshold confidence level may be preset for the facial recognition, such that false positives are preferable to false negatives, or vice versa.
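- A common way to realize such a comparison, offered here only as a hedged sketch, is nearest-neighbor matching on face embeddings with a distance threshold: raising the threshold favors false positives (more matches), lowering it favors false negatives. The embedding size and the known_face_t type are assumptions, not structures disclosed in the patent.

    #include <float.h>
    #include <math.h>
    #include <stddef.h>

    #define EMB_DIM 128              /* assumed embedding size */

    typedef struct {
        const char *name;            /* identity, e.g., a first name */
        float emb[EMB_DIM];          /* enrolled face embedding */
    } known_face_t;

    /* Returns the name of the closest enrolled face if its Euclidean
     * distance is within max_dist; NULL means "unfamiliar face". */
    const char *recognize(const float *probe, const known_face_t *db,
                          size_t n, float max_dist) {
        const char *best = NULL;
        float best_d = FLT_MAX;
        for (size_t i = 0; i < n; i++) {
            float d2 = 0.0f;
            for (size_t k = 0; k < EMB_DIM; k++) {
                float diff = probe[k] - db[i].emb[k];
                d2 += diff * diff;
            }
            float d = sqrtf(d2);
            if (d < best_d) {
                best_d = d;
                best = db[i].name;
            }
        }
        return (best_d <= max_dist) ? best : NULL;
    }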
- the system for facial recognition 124 may then generate a result with face recognition information 116 .
- Face recognition information 116 may comprise at least one of a first name associated with the human face, a last name associated with the human face, a relationship of the face recognized with user 120 , a location where the face was detected within physical space 108 , or any other factor.
- face recognition information 116 may comprise a likely identity of the human, and a request that the user 120 verify the likely identity.
- face recognition information 116 may comprise a result that the human face detected is not recognized by system for facial recognition 124 .
- the user may be asked if she recognizes the detected human face. If so, user 120 may input identity information for the detected face, which is stored by the system for facial recognition 124 in its database of known identified faces. In some embodiments, user 120 may not recognize the detected face, and system for facial recognition 124 stores the detected face as an unknown person.
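- When the user supplies a name for an unfamiliar face, the system could simply add that face to the enrolled set used by the matcher sketched above; this again assumes the hypothetical known_face_t type rather than any structure from the patent.

    #include <string.h>

    /* Add a user-labeled face to the enrolled set of capacity cap.
     * The caller keeps the name string alive (e.g., "Kathleen").
     * Returns 0 on success, -1 if the set is full. */
    int enroll_face(known_face_t *db, size_t *n, size_t cap,
                    const char *name, const float *emb) {
        if (*n >= cap)
            return -1;
        db[*n].name = name;
        memcpy(db[*n].emb, emb, sizeof db[*n].emb);
        (*n)++;
        return 0;
    }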
- the face recognition information 116 result is transmitted to user 120 at operation 608 , and displayed on a user device 118 at operation 610 .
- the face recognition information 116 may be partially or fully displayed on a user interface of user device 118 via a pop-up notification, text message, e-mail message, via a web browser operating on user device 118 , or via a dedicated software application operating on user device 118 .
- a human face can be automatically detected and recognized by a camera system quickly and efficiently, without needing significant user input.
- FIG. 7 depicts an exemplary screenshot 700 of a video frame from captured video 112 .
- at least face 702 may be detected by camera 102 .
- System for facial recognition 124 may compare face 702 to previously identified faces (familiar faces), and to previously unidentified faces (unfamiliar faces), as depicted in the exemplary screenshot 800 of FIG. 8 .
- face 702 is categorized as an “unfamiliar face” by system for facial recognition 124 . This information may be transmitted and presented on a graphical user interface on user device 118 .
- FIG. 9 depicts an exemplary screenshot 900 that may be further displayed on a graphical user interface on user device 118 .
- a user has input that face 702 , which was categorized as an “unfamiliar” or unrecognized face by the system for facial recognition 124 , is actually a known person to the user with the name “Kathleen”.
- System for facial recognition 124 detects that a previously identified face with that name exists, and prompts the user as to whether face 702 should be combined with the previously recognized face under the identifier “Kathleen”. In this way, system for facial recognition 124 has learned that these different views are of the same person, and will recognize face 702 accordingly in future video frames.
- FIG. 10 shows a diagrammatic representation of a computing device for a machine in the exemplary electronic form of a computer system 1000 , within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed.
- Computer system 1000 may be implemented within camera 102 , video analysis system 122 , and/or system for facial recognition 124 .
- the machine operates as a standalone device or can be connected (e.g., networked) to other machines.
- the machine can operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
- the machine can be a PC, a tablet PC, a set-top box, a cellular telephone, a digital camera, a portable music player (e.g., a portable hard drive audio device, such as a Moving Picture Experts Group Audio Layer 3 (MP3) player), a web appliance, a network router, a switch, a bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
- the example computer system 1000 includes a processor or multiple processors 1002 , a hard disk drive 1004 , a main memory 1006 , and a static memory 1008 , which communicate with each other via a bus 1010 .
- the computer system 1000 may also include a network interface device 1012 .
- the hard disk drive 1004 may include a computer-readable medium 1020 , which stores one or more sets of instructions 1022 embodying or utilized by any one or more of the methodologies or functions described herein.
- the instructions 1022 can also reside, completely or at least partially, within the main memory 1006 and/or within the processors 1002 during execution thereof by the computer system 1000 .
- the main memory 1006 and the processors 1002 also constitute machine-readable media.
- While the computer-readable medium 1020 is shown in an exemplary embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions.
- the term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions.
- the term “computer-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media. Such media can also include, without limitation, hard disks, floppy disks, NAND or NOR flash memory, digital video disks, Random Access Memory (RAM), Read-Only Memory (ROM), and the like.
- the exemplary embodiments described herein can be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware.
- the computer-executable instructions can be written in a computer programming language or can be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions can be executed on a variety of hardware platforms and for interfaces to a variety of operating systems.
- the computer system 1000 may be implemented as a cloud-based computing environment, such as a virtual machine operating within a computing cloud.
- the computer system 1000 may itself include a cloud-based computing environment, where the functionalities of the computer system 1000 are executed in a distributed fashion.
- the computer system 1000 when configured as a computing cloud, may include pluralities of computing devices in various forms, as will be described in greater detail below.
- a cloud-based computing environment is a resource that typically combines the computational power of a large grouping of processors (such as within web servers) and/or that combines the storage capacity of a large grouping of computer memories or storage devices.
- Systems that provide cloud-based resources may be utilized exclusively by their owners, or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.
- the cloud may be formed, for example, by a network of web servers that comprise a plurality of computing devices, such as a client device, with each server (or at least a plurality thereof) providing processor and/or storage resources.
- These servers may manage workloads provided by multiple users (e.g., cloud resource consumers or other users).
- each user places workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depends on the type of business associated with the user.
- Non-volatile media include, for example, optical or magnetic disks, such as a fixed disk.
- Volatile media include dynamic memory, such as system RAM.
- Transmission media include coaxial cables, copper wire, and fiber optics, among others, including the wires that comprise one embodiment of a bus.
- Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications.
- Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM disk, digital video disk, any other optical medium, any other physical medium with patterns of marks or holes, a RAM, a Programmable Read-Only Memory, an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory, a FlashEPROM, any other memory chip or data exchange adapter, a carrier wave, or any other medium from which a computer can read.
Abstract
Systems and methods for automatically detecting and recognizing human faces by a smart camera system are described herein. A camera located in a physical space may detect a triggering event and begin recording video and/or audio for a predetermined time period. The camera also processes the video to determine if any human faces are present. If so, the camera updates the metadata file associated with the recorded video with information identifying specific video frames that contain a human face. The recorded video and metadata file are transmitted to video analysis software, which performs further facial analysis on the selected video frames, in an attempt to identify the person(s) detected in the video. Results of the facial recognition process are presented on a user computing device.
Description
- The present utility patent application is related to, and claims the priority benefit under 35 U.S.C. 119(e) of: U.S. Provisional Application No. 62/585,686 filed on Nov. 14, 2017 and entitled “Unique Method to Detect Faces in Videos and Process Selective Frames in Recorded Videos to Recognize and Analyze Faces for Camera Applications”; U.S. Provisional Application No. 62/582,919 filed on Nov. 7, 2017 and entitled “Activity Based Recording (ABR) for Camera Applications”, and U.S. Provisional Application No. 62/583,875 filed on Nov. 9, 2017 and entitled “Sound Detection Sensing Logic for Camera Applications”. The disclosure of all of the above-referenced applications is incorporated herein by reference for all purposes to the extent that such subject matter is not inconsistent herewith or limiting hereof.
- The present disclosure relates to object detection and recognition via camera hardware and software.
- Security cameras have been widely deployed in physical spaces for many years. Typically, cameras record video of a space the entire time they are on, twenty-four hours a day, seven days a week. This results in a large amount of video data to store. Additionally, a person needs to view many hours of video in which nothing of significance is happening, simply to determine whether anything significant occurred in the space at all.
- Activity based recording cameras, which record video only if a specific activity or event is detected in the physical space around the camera, are gradually being introduced into the market. For consumer use, such cameras are typically sold for $100-$300. However, these cameras still require a human to review recorded video after the fact to determine whether an unusual event happened that the owner of the space should be concerned about, or whether the activity recorded by the camera is routine and not concerning. Thus, these cameras can still be burdensome for their users.
- To overcome these challenges, a camera system is needed that is easy for the average consumer to use, and that can automatically detect and recognize an object present in the physical space around the camera, without the need for burdensome user intervention.
- This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
- Provided are processor-implemented systems and methods for automatically detecting and recognizing objects, such as human faces, via a smart camera system. In exemplary embodiments, when a camera located in a physical space detects the occurrence of a triggering event (such as motion or sound activity), the camera begins recording video for a predetermined time period. Substantially simultaneously, the camera firmware processes video frames to detect whether a human face is present in any of the video frames of the recorded video.
- If at least one human face is detected in at least one video frame of the recorded video, a metadata file associated with the recorded video is updated with information regarding the specific video frames in the recorded video where human faces are present.
- Further facial analysis is then conducted on the selective video frames to identify the human faces detected as being familiar faces or unfamiliar faces. Results of the further facial analysis are presented to a user via a user computing device.
- Other example embodiments of the disclosure and aspects will become apparent from the following description taken in conjunction with the following drawings.
- Embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements.
- FIG. 1 illustrates an environment within which systems and methods for automatic facial detection and recognition via a camera system can be implemented, according to an example embodiment.
- FIG. 2 illustrates an exemplary camera that can be used for automatic facial detection and recognition via the disclosed camera system.
- FIG. 3 is a block diagram showing various modules of a video analysis system for processing captured video, in accordance with certain embodiments.
- FIG. 4 is a block diagram showing various modules of a system for facial recognition, in accordance with certain embodiments.
- FIG. 5 is a process flow diagram showing a method for automatic facial detection via a camera system, according to an example embodiment.
- FIG. 6 is a process flow diagram showing a method for automatic facial recognition via a camera system, according to an example embodiment.
- FIG. 7 depicts an exemplary screenshot of a video frame from recorded video, according to an example embodiment.
- FIG. 8 and FIG. 9 depict exemplary screenshots of a user interface provided on a user computing device.
- FIG. 10 shows a diagrammatic representation of a computing device for a machine in the exemplary electronic form of a computer system, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed.
- The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show illustrations in accordance with exemplary embodiments. These exemplary embodiments, which are also referred to herein as “examples,” are described in enough detail to enable those skilled in the art to practice the present subject matter. The embodiments can be combined, other embodiments can be utilized, or structural, logical, and electrical changes can be made without departing from the scope of what is claimed. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents.
- The disclosure relates to a system and methods for facilitating automatic facial detection and recognition in camera systems. More specifically, the system allows a consumer-friendly camera to be deployed in a physical space, such as a house, an office building, or another room. The camera, through use of specially designed hardware and specially programmed software, can record short video clips when a triggering event is detected by one or more sensors on the camera and/or by a microphone on the camera. Also referred to as activity based recording, this method provides a smart, automated way to enable and disable camera recording, so that only potentially significant events are detected and recorded, obviating the need for many hours of recorded video of an empty room in which no activity occurs.
- Further, pre-processing software operating on the camera itself can mark the specific video frames that contain a human face. Each recorded video is stored with additional information identifying the frame(s) containing human faces, so that further analysis of these frames can be conducted. That is, while the camera records the video, individual frames are processed simultaneously in camera firmware, and a metadata file is generated and updated with information regarding the specific video frame(s) in which a human face was detected. The recorded video, along with the metadata information, is transmitted to facial recognition software for further analysis. The facial recognition software may operate via a processor separate from the camera itself, such as on a separate connected server in a cloud or elsewhere.
- A typical facial recognition method processes an entire video clip to detect and recognize faces, which is time consuming and significantly increases the compute power required. In embodiments of the present disclosure, video analysis software processes the recorded video, extracts the frames that have been previously marked by the camera as containing a human face, and applies facial recognition algorithm(s) only to those selected video frames. That is, instead of performing facial recognition analysis on the entire video clip, the video analysis software processes the metadata file from the camera and carries out further facial analysis on the selected frames identified in the metadata file. This method significantly reduces the compute time and storage resources required. The results of the further facial analysis may constitute facial recognition information, which may then be transmitted to a user of the camera system via a user interface on a software application operating on a user device (also referred to herein sometimes as a user computing device).
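To make the selection step concrete, the sketch below walks an array of per-frame metadata records and keeps only the timestamps flagged as containing a face; the record layout mirrors the METADATA structure shown later in this description, while the function name and buffers are hypothetical.

    #include <stddef.h>

    /* Mirrors the METADATA record format shown later in this description. */
    typedef struct {
        unsigned int  timestamp;     /* timestamp when face was detected */
        unsigned char face_detected; /* face present at this timestamp - YES/NO */
        unsigned char no_of_faces;   /* number of faces at this timestamp */
        unsigned char type;          /* human/pet - HUMAN/PET */
        unsigned char reserved;
    } METADATA;

    /* Collects the timestamps of frames marked as containing a face, so that
     * recognition runs on only these frames instead of the whole clip. */
    size_t select_face_frames(const METADATA *records, size_t count,
                              unsigned int *out_timestamps, size_t max_out)
    {
        size_t selected = 0;
        for (size_t i = 0; i < count && selected < max_out; i++) {
            if (records[i].face_detected)
                out_timestamps[selected++] = records[i].timestamp;
        }
        return selected; /* typically far fewer than the clip's frame count */
    }

For instance, a 20-second clip at 30 frames per second holds roughly 600 frames; if faces were marked at only a handful of timestamps, the recognizer decodes and analyzes just those frames.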
- By bifurcating the facial detection and facial recognition processes, the camera itself can be manufactured at a consumer-friendly price point, and deployed by a consumer quickly and easily in a physical space. Further, by pre-processing video clips in the camera firmware itself, only selected video frames need to be analyzed by the software algorithm for recognizing the detected faces. This significantly reduces the computing burden and time for the facial recognition process, as well as significantly reducing the computing storage resources necessary. Further, this allows the facial recognition analysis to occur quickly, in substantially real-time. Thus, if an unrecognized face is detected in the video clip, the user can be alerted quickly and take appropriate action in a timely manner.
- In other embodiments, the camera can identify specific video frames that contain a known object other than a human. For example, a camera can identify video frames containing another body part of a human, a body part of an animal, and/or an inanimate object.
- FIG. 1 illustrates an environment 100 within which systems and methods for automatic facial detection and recognition via a camera system can be implemented, in accordance with some embodiments. The environment 100 may include a camera 102 containing a camera lens 104 and camera sensor(s) 106. The camera 102 may be deployed in a physical space 108, such as a house. Though not explicitly shown in exemplary FIG. 1, camera 102 also has, in various embodiments, one or more additional components that enable its operation for the purposes of the present disclosure.
- The captured video 112 from the camera 102 may be transmitted via a network 110 to a cloud video analysis system 122, which may include a system for facial recognition 124. The cloud video analysis system 122 may further utilize a database 114 and one or more computing processors and volatile and non-volatile memory.
- After processing captured video 112, the system for facial recognition 124 may generate facial recognition information 116, which is transmitted through network 110 to an application operating on a user device 118, where it can be viewed by a user 120. Each of these components is discussed in further detail below.
- A camera 102 may be deployed in any physical space 108 to record audio and/or video around the physical space 108. While physical space 108 is depicted in exemplary FIG. 1 as a house, a person of ordinary skill in the art will understand that camera 102 may be deployed in any physical space, such as an office building, or any other space. Further, while only one camera 102 is depicted in FIG. 1 for simplicity, there can be any number of cameras in physical space 108. If multiple cameras are located in space 108, one or more of the cameras may be in wireless communication with one another, in exemplary embodiments. Further, while camera 102 is depicted in FIG. 1 as a standalone device, in other embodiments, camera 102 may be incorporated as a part of other electronic devices. For example, camera 102 may be incorporated as part of a smartphone, tablet, intelligent personal assistant, or other smart electronic device.
- Camera 102 is described in further detail with respect to FIG. 2. In various embodiments, camera 102 is a consumer-friendly camera that can be utilized by a human user without any specialized camera expertise. The camera 102 may have one or more lenses 104 with which video is captured. In exemplary embodiments, lens 104 may be any type of lens typically found in consumer cameras, such as a standard prime lens, zoom lens, or wide angle lens.
- Camera 102 further has one or more sensors 106. Sensor(s) 106 may be any type of sensor to monitor conditions around the camera 102. By way of non-limiting example, sensor 106 may comprise one or more of a PIR (passive infrared) sensor that can enable color night vision, a motion sensor, a temperature sensor, a humidity sensor, a GPS, etc. As would be understood by persons of ordinary skill in the art, other types of sensors can be utilized to preset other types of conditions or triggers for camera 102 as well.
- Referring to FIG. 2, camera 102 has additional components that enable its operation. For example, camera 102 may have power component(s) 206. Power component(s) 206 may comprise an electrical connector interface for electronically coupling a power source to, or for providing power to, the camera 102. The electrical connector interface may comprise, for example, an electrical cable (the electrical cable can be any of a charging cable, a FireWire cable, a USB cable, a micro-USB cable, a lightning cable, a retractable cable, a waterproof cable, a cable coated or covered with a material that would prevent an animal from chewing through to the electrical wiring, and combinations thereof), electrical ports (such as a USB port, micro-USB port, microSD port, etc.), a connector for batteries (including rechargeable batteries, non-rechargeable batteries, battery packs, external chargers, portable power banks, etc.), and any other standard power source used to provide electricity/power to small electronic devices.
- In an exemplary embodiment, power component(s) 206 comprises at least one battery provided within a housing unit. The battery may also have wireless charging or induction charging capabilities.
- Camera 102 also comprises audio component(s) 204. In various embodiments, audio component(s) 204 may comprise one or more microphones for receiving, recording, and transmitting audio.
- Camera 102 further has processing component(s) 208 to enable it to perform the processing functions discussed herein. Processing component(s) 208 may comprise at least one processor, static or main memory, and software such as firmware that is stored on the memory and executed by a processor. Processing component(s) 208 may further comprise a timer that operates in conjunction with the functions disclosed herein.
- In various embodiments, a specialized video processor is utilized with a hardware accelerator and specially programmed firmware to identify triggering events, begin recording audio and/or video (in either Standard Definition or High Definition), cease recording of audio and/or video, process the captured video frames and insert metadata information regarding the specific video frame(s) containing a human face, and transmit the recorded audio, video, and metadata to a video analysis system 122 operating via software in a cloud computing environment.
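The firmware behavior described above can be viewed as a small control loop. The following is a minimal sketch of that loop under stated assumptions: the platform hooks are hypothetical stubs standing in for the real camera firmware, and the 20-second period is only the exemplary value used elsewhere in this description.

    #include <stdio.h>

    #define RECORD_PERIOD_MS 20000u  /* exemplary 20-second clip */

    /* Platform hooks -- hypothetical stand-ins for the real camera firmware. */
    static int  triggering_event_detected(void) { /* e.g., PIR or microphone */ return 1; }
    static unsigned int now_ms(void)             { static unsigned int t; return t += 500; }
    static void start_recording(void)            { puts("recording started"); }
    static void capture_and_mark_frame(void)     { /* detect faces, update metadata */ }
    static void stop_recording(void)             { puts("recording stopped"); }
    static void transmit_clip_and_metadata(void) { puts("clip + metadata uploaded"); }

    typedef enum { STATE_IDLE, STATE_RECORDING, STATE_UPLOADING } CameraState;

    int main(void)
    {
        CameraState state = STATE_IDLE;
        unsigned int record_until = 0;
        int iterations = 100; /* bounded loop for the sketch */

        while (iterations--) {
            switch (state) {
            case STATE_IDLE:       /* powered on, not recording */
                if (triggering_event_detected()) {
                    record_until = now_ms() + RECORD_PERIOD_MS;
                    start_recording();
                    state = STATE_RECORDING;
                }
                break;
            case STATE_RECORDING:  /* record and face-detect simultaneously */
                capture_and_mark_frame();
                if (now_ms() >= record_until) {
                    stop_recording();
                    state = STATE_UPLOADING;
                }
                break;
            case STATE_UPLOADING:  /* hand off to the video analysis system */
                transmit_clip_and_metadata();
                state = STATE_IDLE;
                break;
            }
        }
        return 0;
    }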
- Camera 102 also comprises networking component(s) 202 to enable camera 102 to connect to network 110 in a wired or wireless manner, similar to networking capabilities utilized by persons of ordinary skill in the art. Further, networking component(s) 202 may also allow for remote control of camera 102.
- In various embodiments, the networking communication capability of camera 102 can be achieved via an antenna attached to any portion of camera 102, and/or via a network card. Camera 102 may communicate with network 110 via wired or wireless communication capabilities, such as radio frequency, Bluetooth, ZigBee, Wi-Fi, electromagnetic wave, RFID (radio frequency identification), etc.
- A human user 120 may further interact with, and control certain operations of, the camera 102 via a graphical user interface displayed on a user device 118. The graphical user interface can be accessed by a human user 120 via a web browser on the user device 118 (such as a desktop or laptop computer, netbook, smartphone, tablet, etc.). A human user may further interact with, and control certain operations of, the camera 102 via a dedicated software application on a smartphone, tablet, smartwatch, laptop or desktop computer, or any other computing device with a processor that is capable of wireless communication. In other embodiments, a human user 120 can interact with, and control certain operations of, the camera 102 via a software application utilized by the user 120 for controlling and monitoring other aspects of a residential or commercial building, such as a security system, a home monitoring system for Internet-enabled appliances, or a voice assistant such as Amazon Echo or Google Home.
- Returning to FIG. 1, camera 102 captures video as discussed herein. The captured video 112 is then transmitted to video analysis system 122 via network 110.
- The network 110 may include the Internet or any other network capable of communicating data between devices. Suitable networks may include or interface with any one or more of, for instance, a local intranet, a Personal Area Network, a Local Area Network, a Wide Area Network, a Metropolitan Area Network, a virtual private network, a storage area network, a frame relay connection, an Advanced Intelligent Network connection, a synchronous optical network connection, a digital T1, T3, E1 or E3 line, a Digital Data Service connection, a Digital Subscriber Line connection, an Ethernet connection, an Integrated Services Digital Network line, a dial-up port such as a V.90, V.34 or V.34bis analog modem connection, a cable modem, an Asynchronous Transfer Mode connection, or a Fiber Distributed Data Interface or Copper Distributed Data Interface connection.
- Furthermore, communications may also include links to any of a variety of wireless networks, including Wireless Application Protocol, General Packet Radio Service, Global System for Mobile Communication, Code Division Multiple Access or Time Division Multiple Access, cellular phone networks, Global Positioning System, cellular digital packet data, Research in Motion, Limited duplex paging network, Bluetooth radio, or an IEEE 802.11-based radio frequency network. The network can further include or interface with any one or more of an RS-232 serial connection, an IEEE-1394 (FireWire) connection, a Fiber Channel connection, an IrDA (infrared) port, a SCSI (Small Computer Systems Interface) connection, a Universal Serial Bus (USB) connection or other wired or wireless, digital or analog interface or connection, mesh, or Digi® networking.
- The network 110 may be a network of data processing nodes that are interconnected for the purpose of data communication. The network 110 may include any suitable number and type of devices (e.g., routers and switches) for forwarding commands, content, requests, and/or responses between each user device 118, each camera 102, and the video analysis system 122.
- The video analysis system 122 may include a server-based distributed software application; thus the system 122 may include a central component residing on a server and one or more client applications residing on one or more user devices and communicating with the central component via the network 110. The user 120 may communicate with the system 122 via a client application available through the user device 118.
- Video analysis system 122 may comprise software application(s) for processing captured video 112, as well as other capabilities. Video analysis system 122 is further in communication with one or more data structures, such as database 114. In exemplary embodiments, at least some components of video analysis system 122 operate on one or more cloud computing devices or servers.
- Video analysis system 122 further comprises a system for facial recognition 124. The system for facial recognition 124 analyzes the specific video frames noted in the metadata associated with captured video 112. Through this analysis, which consists of one or more software algorithms executed by at least one processor, the system for facial recognition 124 analyzes the video frames from captured video 112 that have been noted as containing a human face. The human face detected in each video frame is then “recognized,” i.e., associated with a likely name of the person whose face is detected.
- Face recognition information 116, which may comprise a name of one or more people recognized in captured video 112, is then transmitted by the system for facial recognition 124, through network 110, to a user device 118, at which point it can be viewed by a user. In some embodiments, additional information may be transmitted with face recognition information 116, such as a copy of the face image from the captured video 112 and/or other information regarding the facial recognition.
- Face recognition information 116 is displayed via a user interface on a screen of user device 118, in the format of a pop-up alert, text message, e-mail message, or any other means of communicating with user 120. A hypothetical sketch of what such a payload could carry follows.
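The following C struct is an illustrative assumption drawn from the items listed in this description — the field set and sizes are not a definition from the actual system.

    /* Hypothetical payload for face recognition information 116 (illustrative). */
    typedef struct {
        char         name[64];             /* likely name of the recognized person */
        char         relationship[32];     /* relationship of the face to user 120 */
        unsigned int frame_timestamp;      /* where in the clip the face appears */
        int          is_familiar;          /* 1 = familiar face, 0 = unfamiliar face */
        char         face_image_path[128]; /* optional copy of the face image */
    } FaceRecognitionInfo;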
- The user device 118, in some example embodiments, may include a graphical user interface for displaying the user interface associated with the system 122. The user device 118 may include a mobile telephone, a desktop personal computer (PC), a laptop computer, a smartphone, a tablet, a smartwatch, an intelligent personal assistant device, a smart appliance, and so forth.
- FIG. 3 is a block diagram showing various modules of a video analysis system 122 for processing captured video 112, in accordance with certain embodiments. The system 122 may include a processor 310 and a database 320. The database 320 may include computer-readable instructions for execution by the processor 310. The processor 310 may include a programmable processor, such as a microcontroller, central processing unit (CPU), and so forth. In other embodiments, the processor 310 may include an application-specific integrated circuit or programmable logic array, such as a field programmable gate array, designed to implement the functions performed by the system 122. In various embodiments, the system 122 may be installed on a user device or may be provided as a cloud service residing in a cloud storage. The operations performed by the processor 310 and the database 320 are described in further detail herein.
- FIG. 4 is a block diagram showing various modules of a system for facial recognition 124, for identifying (recognizing) detected human faces in select frames of captured video 112, in accordance with certain embodiments. The system 124 may include a processor 410 and a database 420. The processor 410 of the system for facial recognition 124 may be the same as, or different from, processor 310 of the video analysis system 122. Further, database 420 of the system for facial recognition 124 may be the same as, or different from, database 320 of video analysis system 122.
- Database 420 may include computer-readable instructions for execution by the processor 410. The processor 410 may include a programmable processor, such as a microcontroller, central processing unit (CPU), and so forth. In other embodiments, the processor 410 may include an application-specific integrated circuit or programmable logic array, such as a field programmable gate array, designed to implement the functions performed by the system for facial recognition 124. In various embodiments, the system for facial recognition 124 may be installed on a user device or may be provided as a cloud service residing in a cloud storage. The operations performed by the processor 410 and the database 420 are described in further detail herein.
- FIG. 5 is a process flow diagram showing a method 500 for automatic facial detection via a camera system, within the environment described with reference to FIG. 1. In some embodiments, the operations may be combined, performed in parallel, or performed in a different order. The method 500 may also include additional or fewer operations than those illustrated. The method 500 may be performed by processing logic that may comprise hardware (e.g., decision making logic, dedicated logic, programmable logic, and microcode), a hardware accelerator, software (such as firmware on camera 102 or other software run on a special-purpose computer system), or any combination of the above.
- The method 500 may commence with camera 102 detecting a triggering event at operation 502. As discussed herein, camera 102 may be located in a physical space 108 and powered on, but not actively recording video and/or audio. A triggering event may cause camera 102 to begin recording. The triggering event can be any preset condition or trigger.
- In an example embodiment, the triggering event is a noise detected by a microphone on camera 102 above a certain decibel threshold. In another example embodiment, a triggering event is a noise detected by a microphone on camera 102 within a certain time period. In other example embodiments, a triggering event may be the detection of motion, temperature, smoke, humidity, a gaseous substance, or any other environmental condition above a preset threshold, or occurring within a preset time period. The preset threshold may be configured by a manufacturer of camera 102, or configured by a user 120. A simple form of such a check is sketched below.
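In the sketch that follows, the threshold values and sensor reads are hypothetical stubs standing in for the real microphone and motion-sensor drivers; the logic only illustrates the threshold comparison described above.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical preset thresholds (configurable by manufacturer or user 120). */
    static const double NOISE_THRESHOLD_DB = 60.0; /* decibel threshold for sound */
    static const double MOTION_THRESHOLD   = 0.25; /* normalized motion activity */

    /* Stub sensor reads standing in for the real microphone/PIR drivers. */
    static double microphone_level_db(void) { return 72.5; }
    static double motion_sensor_level(void) { return 0.05; }

    /* True when any preset condition for a triggering event is met. */
    static bool triggering_event_detected(void)
    {
        return microphone_level_db() > NOISE_THRESHOLD_DB  /* loud noise */
            || motion_sensor_level() > MOTION_THRESHOLD;   /* movement */
    }

    int main(void)
    {
        if (triggering_event_detected())
            puts("triggering event: begin recording for the predetermined period");
        else
            puts("no trigger: remain idle");
        return 0;
    }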
- Upon detection of the triggering event, camera 102 enters capture video and/or audio mode at operation 504 for a certain predetermined time period. In some embodiments, capture mode may be enabled on camera 102 for an amount of time that is pre-configured by a manufacturer of camera 102, or pre-configured by a user 120. Further, the predetermined time period that capture mode is enabled on camera 102 may be variable based on the type of triggering event, time of day, or any other criterion. In an exemplary embodiment, capture mode is enabled on camera 102 for 5-30 seconds.
- At operation 506, video is recorded by camera 102 onto memory within camera 102 hardware. Further, substantially simultaneously, the recorded video is processed by firmware on specialized video processor hardware and/or a hardware accelerator within camera 102. The firmware processes the recorded video and detects select video frames within the recorded video that contain a human face or are likely to contain a human face. In various embodiments, a threshold confidence level may be preset for the facial detection, such that false positives are preferable to false negatives. The facial detection may occur substantially instantaneously (within 1 second) of the video capture by camera 102.
- Information regarding which specific frames contain a face, such as the time at which those frames occur in the recorded video clip, is added to metadata associated with the recorded video, at operation 508. As a non-limiting example, exemplary metadata may comprise:
    typedef struct {
        unsigned int timestamp;      /* timestamp when face was detected */
        unsigned char face_detected; /* Face has been detected or not - YES/NO */
        unsigned char no_of_faces;   /* No of faces detected at the particular timestamp */
        unsigned char type;          /* human/pet - HUMAN/PET */
        unsigned char reserved;
    } METADATA;
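To make the firmware side of operation 508 concrete, here is a minimal sketch of how a record in the format above might be appended for each processed frame; the function, its file handling, and the YES/NO and HUMAN flag values are assumptions for the example, not the actual firmware API.

    #include <stdio.h>

    enum { NO = 0, YES = 1 };      /* assumed values for face_detected */
    enum { HUMAN = 0, PET = 1 };   /* assumed values for type */

    /* Appends one METADATA record for a processed frame (illustrative only). */
    static void record_frame_metadata(FILE *meta_file, unsigned int timestamp,
                                      unsigned char faces_found)
    {
        METADATA m = {0};
        m.timestamp     = timestamp;
        m.face_detected = (faces_found > 0) ? YES : NO;
        m.no_of_faces   = faces_found;
        m.type          = HUMAN;             /* the struct also allows PET */
        fwrite(&m, sizeof m, 1, meta_file);  /* one fixed-size record */
    }

A firmware could equally choose to write records only for timestamps where faces_found is nonzero, keeping the metadata file small relative to the clip.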
- Subsequently, the recorded video and updated metadata file are transmitted to video analysis system 122 for further analysis, at operation 510. In various embodiments, camera 102 is in wireless communication with video analysis system 122 and operation 510 occurs in a wireless manner. In other embodiments, the transmission occurs via a wired communication network. In still further embodiments, video analysis system 122 may be executed by a module within camera 102 itself.
- As a non-limiting example, an exemplary transmission from camera 102 to video analysis system 122 may comprise a plurality of computing files. For an example video clip “1495”, the following computing files may be transmitted:
    V_123456789_1_1495_1_92086ce4a84af783b1a2_2379.ts
    V_123456789_1_1495_2_92086ce4a84af783b1a2_2379.metadata
    V_123456789_1_1495_2_92086ce4a84af783b1a2_2379.ts
    V_123456789_1_1495_3_92086ce4a84af783b1a2_2379.last
    V_123456789_1_1495_3_92086ce4a84af783b1a2_2379.metadata
    V_123456789_1_1495_3_92086ce4a84af783b1a2_2379.ts
- In this example, “.ts” files are video recordings and “.metadata” files are the respective metadata files associated with the recorded “.ts” files.
- FIG. 6 is a process flow diagram showing a method 600 for automatic facial recognition via a camera system, within the environment described with reference to FIG. 1. In some embodiments, the operations may be combined, performed in parallel, or performed in a different order. The method 600 may also include additional or fewer operations than those illustrated. The method 600 may be performed by processing logic that may comprise hardware (e.g., decision making logic, dedicated logic, programmable logic, and microcode), a hardware accelerator, software (such as firmware or other software run on a special-purpose or general purpose computer system), or any combination of the above.
- Various operations of method 600 may be performed by video analysis system 122, by the system for facial recognition 124, or by a combination of both systems.
- The method 600 may commence at operation 602 with video analysis system 122 receiving recorded video and/or audio, along with corresponding metadata, from camera 102. As discussed herein, the recorded video may be a video clip of any duration. In preferred embodiments, the recorded video clip is between 5-30 seconds, preferably about 20 seconds in length. The recorded video and corresponding metadata may be received from camera 102 via any wireless or wired communication mechanism.
- At operation 604, the video analysis system 122 examines the metadata received to determine which video frames of the recorded video clip have detected human faces. The selected video frames are then extracted from the video clip for further analysis by the system for facial recognition 124.
- At operation 606, the system for facial recognition 124 processes the extracted video frames from the received video clip. The processing may optionally entail verifying that the face detected by the camera 102 is indeed a human face. Further, the processing may entail identifying the detected human face in the selected video frames. This process is also referred to herein as recognizing the human face.
- In various embodiments, the human face is compared to known human faces stored in a data structure, such as a database, to determine a match with a previously identified human face. The previously identified human faces may be identified by one or more machine learning algorithms and/or by human input. In various embodiments, a threshold confidence level may be preset for the facial recognition, such that false positives are preferable to false negatives, or vice versa. A minimal sketch of such a comparison follows.
- The system for facial recognition 124 may then generate a result with face recognition information 116. Face recognition information 116 may comprise at least one of a first name associated with the human face, a last name associated with the human face, a relationship of the recognized face to user 120, a location where the face was detected within physical space 108, or any other factor. In other embodiments, face recognition information 116 may comprise a likely identity of the human, and a request that the user 120 verify the likely identity.
- In other embodiments, face recognition information 116 may comprise a result that the detected human face is not recognized by the system for facial recognition 124. In such a scenario, the user may be asked if she recognizes the detected human face. If so, user 120 may input identity information for the detected face, which is stored by the system for facial recognition 124 in its database of known identified faces. In some embodiments, user 120 may not recognize the detected face, and the system for facial recognition 124 stores the detected face as an unknown person.
- The face recognition information 116 result is transmitted to user 120 at operation 608, and displayed on a user device 118 at operation 610. The face recognition information 116 may be partially or fully displayed on a user interface of user device 118 via a pop-up notification, text message, e-mail message, a web browser operating on user device 118, or a dedicated software application operating on user device 118.
- In this way, a human face can be automatically detected and recognized by a camera system quickly and efficiently, without needing significant user input.
- FIG. 7 depicts an exemplary screenshot 700 of a video frame from captured video 112. In an exemplary embodiment, at least face 702 may be detected by camera 102. The system for facial recognition 124 may compare face 702 to previously identified faces (familiar faces), and to previously unidentified faces (unfamiliar faces), as depicted in the exemplary screenshot 800 of FIG. 8. In exemplary screenshot 800, face 702 is categorized as an “unfamiliar face” by the system for facial recognition 124. This information may be transmitted and presented on a graphical user interface on user device 118.
- FIG. 9 depicts an exemplary screenshot 900 that may be further displayed on a graphical user interface on user device 118. In exemplary screenshot 900, a user has input that face 702, which was categorized as an “unfamiliar” or unrecognized face by the system for facial recognition 124, is actually a person known to the user, with the name “Kathleen”. The system for facial recognition 124 detects that a previously identified face with that name exists, and prompts the user as to whether face 702 should be combined with the previously recognized face under the identifier “Kathleen”. In this way, the system for facial recognition 124 has learned that these different views are of the same person, and will recognize face 702 accordingly in future video frames.
- FIG. 10 shows a diagrammatic representation of a computing device for a machine in the exemplary electronic form of a computer system 1000, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed. Computer system 1000 may be implemented within camera 102, video analysis system 122, and/or the system for facial recognition 124.
- In various exemplary embodiments, the machine operates as a standalone device or can be connected (e.g., networked) to other machines. In a networked deployment, the machine can operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine can be a PC, a tablet PC, a set-top box, a cellular telephone, a digital camera, a portable music player (e.g., a portable hard drive audio device, such as a Moving Picture Experts Group Audio Layer 3 player), a web appliance, a network router, a switch, a bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
- The example computer system 1000 includes a processor or multiple processors 1002, a hard disk drive 1004, a main memory 1006, and a static memory 1008, which communicate with each other via a bus 1010. The computer system 1000 may also include a network interface device 1012. The hard disk drive 1004 may include a computer-readable medium 1020, which stores one or more sets of instructions 1022 embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 1022 can also reside, completely or at least partially, within the main memory 1006 and/or within the processors 1002 during execution thereof by the computer system 1000. The main memory 1006 and the processors 1002 also constitute machine-readable media.
- While the computer-readable medium 1020 is shown in an exemplary embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions. The term “computer-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Such media can also include, without limitation, hard disks, floppy disks, NAND or NOR flash memory, digital video disks, Random Access Memory (RAM), Read-Only Memory (ROM), and the like.
- The exemplary embodiments described herein can be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware. The computer-executable instructions can be written in a computer programming language or can be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions can be executed on a variety of hardware platforms and for interfaces to a variety of operating systems.
- In some embodiments, the computer system 1000 may be implemented as a cloud-based computing environment, such as a virtual machine operating within a computing cloud. In other embodiments, the computer system 1000 may itself include a cloud-based computing environment, where the functionalities of the computer system 1000 are executed in a distributed fashion. Thus, the computer system 1000, when configured as a computing cloud, may include pluralities of computing devices in various forms, as will be described in greater detail below.
- In general, a cloud-based computing environment is a resource that typically combines the computational power of a large grouping of processors (such as within web servers) and/or that combines the storage capacity of a large grouping of computer memories or storage devices. Systems that provide cloud-based resources may be utilized exclusively by their owners, or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.
- The cloud may be formed, for example, by a network of web servers that comprise a plurality of computing devices, such as a client device, with each server (or at least a plurality thereof) providing processor and/or storage resources. These servers may manage workloads provided by multiple users (e.g., cloud resource consumers or other users). Typically, each user places workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depends on the type of business associated with the user.
- It is noteworthy that any hardware platform suitable for performing the processing described herein is suitable for use with the technology. The terms “computer-readable storage medium” and “computer-readable storage media” as used herein refer to any medium or media that participate in providing instructions to a CPU for execution. Such media can take many forms, including, but not limited to, non-volatile media, volatile media and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as a fixed disk. Volatile media include dynamic memory, such as system RAM. Transmission media include coaxial cables, copper wire, and fiber optics, among others, including the wires that comprise one embodiment of a bus. Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM disk, digital video disk, any other optical medium, any other physical medium with patterns of marks or holes, a RAM, a Programmable Read-Only Memory, an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory, a FlashEPROM, any other memory chip or data exchange adapter, a carrier wave, or any other medium from which a computer can read.
- Thus, computer-implemented methods and systems for automatically detecting and recognizing human faces via a camera system are described herein. As would be understood by persons of ordinary skill in the art, while the present disclosure describes the detection and recognition of human faces, the present disclosure may similarly be utilized for the detection and recognition of other objects. For example, the present disclosure may be utilized for the detection and recognition of other body parts of a human, for the detection and recognition of animals, and/or for the detection and recognition of other inanimate objects.
- Although embodiments have been described herein with reference to specific exemplary embodiments, it will be evident that various modifications and changes can be made to these exemplary embodiments without departing from the broader spirit and scope of the present application. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
Claims (20)
1. A system for facial detection and recognition via a camera system, the system comprising:
a camera comprising:
a lens;
at least one sensor;
a microphone;
a first processor and a first memory, configured to:
detect a triggering event in a physical space around the camera;
record video for a predetermined time period, in response to detecting the triggering event;
execute firmware on the camera to detect that at least one human face is present in at least one video frame of the recorded video;
update a metadata file associated with the recorded video with information regarding the at least one video frame that has the detected at least one human face; and
transmit the recorded video and the updated metadata file to a video analysis system in communication with the camera; and
the video analysis system comprising:
a second processor and second memory, the second processor configured to:
receive the transmitted recorded video and updated metadata file from the camera;
extract from the recorded video the at least one video frame with the detected at least one human face, based on information in the updated metadata file;
perform facial recognition analysis on the extracted at least one video frame to identify the detected at least one human face; and
transmit and present a result of the identified detected at least one human face to a user interface on a user computing device in communication with the video analysis system.
2. The camera system of claim 1 , wherein the recorded video further comprises audio.
3. The camera system of claim 1 , wherein the detected triggering event is a loud noise in the physical space around the camera that is detected by the microphone.
4. The camera system of claim 1 , wherein the detected triggering event is a movement in the physical space around the camera that is detected by the at least one sensor.
5. The camera system of claim 1 , wherein the predetermined time period for recording video in response to detecting the triggering event is less than one minute.
6. The camera system of claim 1 , wherein the detecting that at least one human face is present in at least one video frame of the recorded video occurs substantially simultaneously with recording the video in response to detecting the triggering event.
7. The camera system of claim 1 , wherein the first processor is a specialized video processor.
8. The camera system of claim 1 , wherein the video analysis system further comprises a database communicatively coupled to the second processor, the database storing previously identified human faces and associated identity information.
9. The camera system of claim 1 , wherein the result of the facial recognition analysis is a name of the detected at least one human face.
10. The camera system of claim 1 , wherein the result of the facial recognition analysis is an unknown identity of the detected at least one human face.
11. A method for facial detection and recognition via a camera system, the method comprising:
detecting, by a camera, a triggering event in a physical space around the camera;
recording video for a predetermined time period by the camera, in response to detecting the triggering event;
executing firmware on the camera to detect that at least one human face is present in at least one video frame of the recorded video;
updating, by the camera, a metadata file associated with the recorded video with information regarding the at least one video frame that has the detected at least one human face;
transmitting the recorded video and the updated metadata file, by the camera to a video analysis system in communication with the camera;
receiving the recorded video and updated metadata file by the video analysis system;
extracting the at least one video frame with the detected at least one human face from the recorded video, based on information in the updated metadata file;
performing facial recognition analysis on the extracted at least one video frame, to identify the detected at least one human face; and
transmitting and presenting a result of the identified detected at least one human face to a user interface on a user computing device in communication with the video analysis system.
12. The method of claim 11 , wherein the detected triggering event is a loud noise in the physical space around the camera that is detected by the microphone.
13. The method of claim 11 , wherein the detected triggering event is a movement in the physical space around the camera that is detected by the at least one sensor.
14. The method of claim 11 , wherein the predetermined time period for recording video in response to detecting the triggering event is less than one minute.
15. The method of claim 11 , wherein the detecting that at least one human face is present in at least one video frame of the recorded video occurs substantially simultaneously with recording the video in response to detecting the triggering event.
16. The method of claim 11 , wherein the first processor is a specialized video processor.
17. The method of claim 11 , wherein the video analysis system further comprises a database communicatively coupled to the second processor, the database storing previously identified human faces and associated identity information.
18. The method of claim 11 , wherein the result of the facial recognition analysis is a name of the detected at least one human face.
19. The method of claim 11 , wherein the result of the facial recognition analysis is an unknown identity of the detected at least one human face.
20. A system for object detection and recognition via a camera system, the system comprising:
a camera comprising:
a lens;
at least one sensor;
a microphone;
a first processor and a first memory, configured to:
detect a triggering event in a physical space around the camera;
record video for a predetermined time period, in response to detecting the triggering event;
execute firmware on the camera to detect that at least one known object is present in at least one video frame of the recorded video;
update a metadata file associated with the recorded video with information regarding the at least one video frame that has the detected at least one known object; and
transmit the recorded video and the updated metadata file to a video analysis system in communication with the camera; and
the video analysis system comprising:
a second processor and second memory, the second processor configured to:
receive the transmitted recorded video and updated metadata file from the camera;
extract from the recorded video the at least one video frame with the detected at least one known object, based on information in the updated metadata file;
perform further recognition analysis on the extracted at least one video frame to identify the detected at least one known object; and
transmit and present a result of the identified detected at least one known object to a user interface on a user computing device in communication with the video analysis system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/163,521 US20190138795A1 (en) | 2017-11-07 | 2018-10-17 | Automatic Object Detection and Recognition via a Camera System |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762582919P | 2017-11-07 | 2017-11-07 | |
US201762583875P | 2017-11-09 | 2017-11-09 | |
US201762585686P | 2017-11-14 | 2017-11-14 | |
US16/163,521 US20190138795A1 (en) | 2017-11-07 | 2018-10-17 | Automatic Object Detection and Recognition via a Camera System |
Publications (1)
Publication Number | Publication Date |
---|---|
US20190138795A1 true US20190138795A1 (en) | 2019-05-09 |
Family
ID=66327325
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/163,521 Abandoned US20190138795A1 (en) | 2017-11-07 | 2018-10-17 | Automatic Object Detection and Recognition via a Camera System |
US16/175,726 Active US10872231B2 (en) | 2017-11-07 | 2018-10-30 | Systems and methods of activity based recording for camera applications |
US16/182,483 Active US10929650B2 (en) | 2017-11-07 | 2018-11-06 | Activity based video recording |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/175,726 Active US10872231B2 (en) | 2017-11-07 | 2018-10-30 | Systems and methods of activity based recording for camera applications |
US16/182,483 Active US10929650B2 (en) | 2017-11-07 | 2018-11-06 | Activity based video recording |
Country Status (1)
Country | Link |
---|---|
US (3) | US20190138795A1 (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190272724A1 (en) * | 2018-03-05 | 2019-09-05 | Google Llc | Baby monitoring with intelligent audio cueing based on an analyzed video stream |
US20190278976A1 (en) * | 2018-03-11 | 2019-09-12 | Krishna Khadloya | Security system with face recognition |
US10579910B2 (en) * | 2016-05-06 | 2020-03-03 | Microsoft Technology Licensing, Llc | Dynamic classifier selection based on class skew |
US10872231B2 (en) | 2017-11-07 | 2020-12-22 | Ooma, Inc. | Systems and methods of activity based recording for camera applications |
CN112218024A (en) * | 2020-09-17 | 2021-01-12 | 浙江大华技术股份有限公司 | Courseware video generation and channel combination information determination method and device |
US20210133623A1 (en) * | 2019-11-04 | 2021-05-06 | International Business Machines Corporation | Self-supervised object detector training using raw and unlabeled videos |
US11138344B2 (en) | 2019-07-03 | 2021-10-05 | Ooma, Inc. | Securing access to user data stored in a cloud computing environment |
US11138845B2 (en) * | 2018-10-01 | 2021-10-05 | Digital Barriers Services Ltd | Video surveillance and object recognition |
WO2021230961A1 (en) * | 2020-05-14 | 2021-11-18 | Google Llc | Event length dependent cool-off for camera event based recordings |
WO2021257881A1 (en) * | 2020-06-17 | 2021-12-23 | Hewlett-Packard Development Company, L.P. | Image frames with unregistered users obfuscated |
EP3930317A1 (en) * | 2020-06-25 | 2021-12-29 | Yokogawa Electric Corporation | Apparatus, method, and program |
US11329891B2 (en) | 2019-08-08 | 2022-05-10 | Vonage Business Inc. | Methods and apparatus for managing telecommunication system devices |
CN114885117A (en) * | 2022-03-21 | 2022-08-09 | 安克创新科技股份有限公司 | Video composition method for image pickup apparatus, and storage medium |
US11450151B2 (en) * | 2019-07-18 | 2022-09-20 | Capital One Services, Llc | Detecting attempts to defeat facial recognition |
US11997376B2 (en) | 2021-03-17 | 2024-05-28 | Samsung Electronics Co., Ltd. | Image sensor and operating method of the image sensor |
Families Citing this family (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11356349B2 (en) | 2020-07-17 | 2022-06-07 | At&T Intellectual Property I, L.P. | Adaptive resource allocation to facilitate device mobility and management of uncertainty in communications |
US11232694B2 (en) * | 2018-03-14 | 2022-01-25 | Safely You Inc. | System and method for detecting, recording and communicating events in the care and treatment of cognitively impaired persons |
CN108566634B (en) * | 2018-03-30 | 2021-06-25 | 深圳市冠旭电子股份有限公司 | Method, device and bluetooth speaker for reducing continuous wake-up delay of bluetooth speaker |
US11527265B2 (en) | 2018-11-02 | 2022-12-13 | BriefCam Ltd. | Method and system for automatic object-aware video or audio redaction |
JPWO2020110659A1 (en) * | 2018-11-27 | 2021-10-14 | ソニーグループ株式会社 | Information processing equipment, information processing methods, and programs |
US10957186B2 (en) * | 2018-11-29 | 2021-03-23 | Omnivision Technologies, Inc. | Reducing false alarms in surveillance systems |
US10885342B1 (en) * | 2018-12-03 | 2021-01-05 | Ambarella International Lp | Intelligent monitoring camera using computer vision and intelligent personal audio assistant capabilities to maintain privacy |
CN110121060A (en) * | 2019-05-27 | 2019-08-13 | 中国电子科技网络信息安全有限公司 | A kind of intelligence enhancing device and method for IP Camera |
GB2585919B (en) * | 2019-07-24 | 2022-09-14 | Calipsa Ltd | Method and system for reviewing and analysing video alarms |
CN110881141B (en) * | 2019-11-19 | 2022-10-18 | 浙江大华技术股份有限公司 | Video display method and device, storage medium and electronic device |
CN111325139B (en) * | 2020-02-18 | 2023-08-04 | 浙江大华技术股份有限公司 | Lip language identification method and device |
US11368991B2 (en) | 2020-06-16 | 2022-06-21 | At&T Intellectual Property I, L.P. | Facilitation of prioritization of accessibility of media |
US11233979B2 (en) | 2020-06-18 | 2022-01-25 | At&T Intellectual Property I, L.P. | Facilitation of collaborative monitoring of an event |
US11411757B2 (en) | 2020-06-26 | 2022-08-09 | At&T Intellectual Property I, L.P. | Facilitation of predictive assisted access to content |
US11184517B1 (en) | 2020-06-26 | 2021-11-23 | At&T Intellectual Property I, L.P. | Facilitation of collaborative camera field of view mapping |
CN113938598A (en) | 2020-07-14 | 2022-01-14 | 浙江宇视科技有限公司 | Surveillance camera wake-up method, device, device and medium |
US11768082B2 (en) | 2020-07-20 | 2023-09-26 | At&T Intellectual Property I, L.P. | Facilitation of predictive simulation of planned environment |
US20220124287A1 (en) * | 2020-10-21 | 2022-04-21 | Michael Preston | Cloud-Based Vehicle Surveillance System |
CN112101311A (en) * | 2020-11-16 | 2020-12-18 | 深圳壹账通智能科技有限公司 | Double-recording quality inspection method and device based on artificial intelligence, computer equipment and medium |
US11455873B2 (en) * | 2021-01-14 | 2022-09-27 | Google Llc | Buffered video recording for video cameras |
WO2022164887A1 (en) * | 2021-01-28 | 2022-08-04 | Gyrus Acmi, Inc. D/B/A Olympus Surgical Technologies America | Environment capture management techniques |
US11856318B2 (en) * | 2021-04-27 | 2023-12-26 | Maiden Ai, Inc. | Methods and systems to automatically record relevant action in a gaming environment |
US20220385886A1 (en) * | 2021-05-27 | 2022-12-01 | Panasonic Autonmotive Systems Company of America, Divsion of Panasonic Corporation of North A merica | Methods and apparatus for monitoring wifi cameras |
WO2023049868A1 (en) | 2021-09-24 | 2023-03-30 | Maiden Ai, Inc. | Methods and systems to track a moving sports object trajectory in 3d using multiple cameras |
WO2025128641A1 (en) * | 2023-12-13 | 2025-06-19 | Ademco Inc. | Systems and methods for security camera control |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080319604A1 (en) * | 2007-06-22 | 2008-12-25 | Todd Follmer | System and Method for Naming, Filtering, and Recall of Remotely Monitored Event Data |
US20100287053A1 (en) * | 2007-12-31 | 2010-11-11 | Ray Ganong | Method, system, and computer program for identification and sharing of digital images with face signatures |
US20160253883A1 (en) * | 2015-02-27 | 2016-09-01 | Sensormatic Electronics, LLC | System and Method for Distributed Video Analysis |
US10372995B2 (en) * | 2016-01-12 | 2019-08-06 | Shanghai Xiaoyi Technology Co., Ltd. | System and method for previewing video |
Family Cites Families (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AUPQ217399A0 (en) * | 1999-08-12 | 1999-09-02 | Honeywell Limited | Realtime digital video server |
US6831556B1 (en) * | 2001-05-16 | 2004-12-14 | Digital Safety Technologies, Inc. | Composite mobile digital information system |
CN100459684C (en) | 2002-08-27 | 2009-02-04 | 索尼株式会社 | Data processing unit and method, and program |
WO2006090359A2 (en) * | 2005-02-24 | 2006-08-31 | Mteye Security Ltd. | Device, system, and method of reduced-power imaging |
US7843491B2 (en) * | 2005-04-05 | 2010-11-30 | 3Vr Security, Inc. | Monitoring and presenting video surveillance data |
US7129872B1 (en) | 2005-05-25 | 2006-10-31 | Audio Note Uk Ltd. | Audio signal analog-to-digital converter utilizing a transformed-based input circuit |
US7768548B2 (en) * | 2005-08-12 | 2010-08-03 | William Bradford Silvernail | Mobile digital video recording system |
US20070185989A1 (en) | 2006-02-07 | 2007-08-09 | Thomas Grant Corbett | Integrated video surveillance system and associated method of use |
US10847184B2 (en) * | 2007-03-07 | 2020-11-24 | Knapp Investment Company Limited | Method and apparatus for initiating a live video stream transmission |
US8749343B2 (en) | 2007-03-14 | 2014-06-10 | Seth Cirker | Selectively enabled threat based information system |
TW200847787A (en) | 2007-05-29 | 2008-12-01 | Appro Technology Inc | Application method and device by sensing infrared and sound |
US9386281B2 (en) * | 2009-10-02 | 2016-07-05 | Alarm.Com Incorporated | Image surveillance and reporting technology |
WO2014144628A2 (en) * | 2013-03-15 | 2014-09-18 | Master Lock Company | Cameras and networked security systems and methods |
WO2014168833A1 (en) * | 2013-04-08 | 2014-10-16 | Shafron Thomas | Camera assembly, system, and method for intelligent video capture and streaming |
ES2730404T3 (en) * | 2014-04-04 | 2019-11-11 | Red Com Llc | Camcorder with capture modes |
US9811748B2 (en) * | 2014-06-09 | 2017-11-07 | Verizon Patent And Licensing Inc. | Adaptive camera setting modification based on analytics data |
KR102150703B1 (en) * | 2014-08-14 | 2020-09-01 | 한화테크윈 주식회사 | Intelligent video analysing system and video analysing method therein |
EP3070695B1 (en) * | 2015-03-16 | 2017-06-14 | Axis AB | Method and system for generating an event video sequence, and camera comprising such system |
CN107925823A (en) | 2015-07-12 | 2018-04-17 | 怀斯迪斯匹有限公司 | Ultra low power ultra-low noise microphone |
US10115029B1 (en) * | 2015-10-13 | 2018-10-30 | Ambarella, Inc. | Automobile video camera for the detection of children, people or pets left in a vehicle |
US10134422B2 (en) | 2015-12-01 | 2018-11-20 | Qualcomm Incorporated | Determining audio event based on location information |
US9906722B1 (en) * | 2016-04-07 | 2018-02-27 | Ambarella, Inc. | Power-saving battery-operated camera |
US20170339343A1 (en) * | 2016-05-17 | 2017-11-23 | Tijee Corporation | Multi-functional camera |
CN106060391B (en) * | 2016-06-27 | 2020-02-21 | 联想(北京)有限公司 | A method and device for processing a working mode of a camera, and an electronic device |
US10475311B2 (en) | 2017-03-20 | 2019-11-12 | Amazon Technologies, Inc. | Dynamic assessment using an audio/video recording and communication device |
US20190138795A1 (en) | 2017-11-07 | 2019-05-09 | Ooma, Inc. | Automatic Object Detection and Recognition via a Camera System |
US20200388139A1 (en) | 2019-06-07 | 2020-12-10 | Ooma, Inc. | Continuous detection and recognition for threat determination via a camera system |
2018
- 2018-10-17 US US16/163,521 patent/US20190138795A1/en not_active Abandoned
- 2018-10-30 US US16/175,726 patent/US10872231B2/en active Active
- 2018-11-06 US US16/182,483 patent/US10929650B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080319604A1 (en) * | 2007-06-22 | 2008-12-25 | Todd Follmer | System and Method for Naming, Filtering, and Recall of Remotely Monitored Event Data |
US20100287053A1 (en) * | 2007-12-31 | 2010-11-11 | Ray Ganong | Method, system, and computer program for identification and sharing of digital images with face signatures |
US20160253883A1 (en) * | 2015-02-27 | 2016-09-01 | Sensormatic Electronics, LLC | System and Method for Distributed Video Analysis |
US10372995B2 (en) * | 2016-01-12 | 2019-08-06 | Shanghai Xiaoyi Technology Co., Ltd. | System and method for previewing video |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10579910B2 (en) * | 2016-05-06 | 2020-03-03 | Microsoft Technology Licensing, Llc | Dynamic classifier selection based on class skew |
US10872231B2 (en) | 2017-11-07 | 2020-12-22 | Ooma, Inc. | Systems and methods of activity based recording for camera applications |
US10929650B2 (en) | 2017-11-07 | 2021-02-23 | Ooma, Inc. | Activity based video recording |
US10593184B2 (en) * | 2018-03-05 | 2020-03-17 | Google Llc | Baby monitoring with intelligent audio cueing based on an analyzed video stream |
US20190272724A1 (en) * | 2018-03-05 | 2019-09-05 | Google Llc | Baby monitoring with intelligent audio cueing based on an analyzed video stream |
US11735018B2 (en) * | 2018-03-11 | 2023-08-22 | Intellivision Technologies Corp. | Security system with face recognition |
US20190278976A1 (en) * | 2018-03-11 | 2019-09-12 | Krishna Khadloya | Security system with face recognition |
US11138845B2 (en) * | 2018-10-01 | 2021-10-05 | Digital Barriers Services Ltd | Video surveillance and object recognition |
US11568723B2 (en) | 2018-10-01 | 2023-01-31 | Digital Barriers Services Ltd | Video surveillance and object recognition |
US11138344B2 (en) | 2019-07-03 | 2021-10-05 | Ooma, Inc. | Securing access to user data stored in a cloud computing environment |
US11450151B2 (en) * | 2019-07-18 | 2022-09-20 | Capital One Services, Llc | Detecting attempts to defeat facial recognition |
US11329891B2 (en) | 2019-08-08 | 2022-05-10 | Vonage Business Inc. | Methods and apparatus for managing telecommunication system devices |
US20210133623A1 (en) * | 2019-11-04 | 2021-05-06 | International Business Machines Corporation | Self-supervised object detector training using raw and unlabeled videos |
US11636385B2 (en) * | 2019-11-04 | 2023-04-25 | International Business Machines Corporation | Training an object detector using raw and unlabeled videos and extracted speech |
WO2021230961A1 (en) * | 2020-05-14 | 2021-11-18 | Google Llc | Event length dependent cool-off for camera event based recordings |
US12058437B2 (en) | 2020-05-14 | 2024-08-06 | Google Llc | Event length dependent cool-off for camera event based recordings |
US12356068B2 (en) | 2020-05-14 | 2025-07-08 | Google Llc | Event length dependent cool-off for camera event based recordings |
WO2021257881A1 (en) * | 2020-06-17 | 2021-12-23 | Hewlett-Packard Development Company, L.P. | Image frames with unregistered users obfuscated |
EP3930317A1 (en) * | 2020-06-25 | 2021-12-29 | Yokogawa Electric Corporation | Apparatus, method, and program |
US11721185B2 (en) | 2020-06-25 | 2023-08-08 | Yokogawa Electric Corporation | Apparatus, method and storage medium for detecting state change and capturing image |
CN112218024A (en) * | 2020-09-17 | 2021-01-12 | 浙江大华技术股份有限公司 | Method and device for generating courseware video and determining channel merging information |
US11997376B2 (en) | 2021-03-17 | 2024-05-28 | Samsung Electronics Co., Ltd. | Image sensor and operating method of the image sensor |
CN114885117A (en) * | 2022-03-21 | 2022-08-09 | 安克创新科技股份有限公司 | Video composition method for image pickup apparatus, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
US10929650B2 (en) | 2021-02-23 |
US20190141298A1 (en) | 2019-05-09 |
US10872231B2 (en) | 2020-12-22 |
US20190141297A1 (en) | 2019-05-09 |
Similar Documents
Publication | Title |
---|---|
US20190138795A1 (en) | Automatic Object Detection and Recognition via a Camera System |
US20200388139A1 (en) | Continuous detection and recognition for threat determination via a camera system |
US20230209017A1 (en) | Methods and Systems for Person Detection in a Video Feed |
US20220245396A1 (en) | Systems and Methods of Person Recognition in Video Streams |
US10555393B1 (en) | Face recognition systems with external stimulus |
US10599950B2 (en) | Systems and methods for person recognition data management |
US11138344B2 (en) | Securing access to user data stored in a cloud computing environment |
US10192415B2 (en) | Methods and systems for providing intelligent alerts for events |
US10957171B2 (en) | Methods and systems for providing event alerts |
US8171129B2 (en) | Smart endpoint and smart monitoring system having the same |
WO2017049612A1 (en) | Smart tracking video recorder |
KR102237086B1 (en) | Apparatus and method for controlling a lobby phone that enables video surveillance through a communication terminal that can use a 5G mobile communication network based on facial recognition technology |
KR102390405B1 (en) | Doorbell |
US10212778B1 (en) | Face recognition systems with external stimulus |
US20240265731A1 (en) | Systems and Methods for On-Device Person Recognition and Provision of Intelligent Alerts |
CN114244644B (en) | Control method and device for intelligent home, storage medium and electronic device |
US20210360201A1 (en) | Methods, systems, apparatuses, and devices for facilitating monitoring of an environment |
US20180167585A1 (en) | Networked Camera |
CN108597164B (en) | Anti-theft method, anti-theft device, anti-theft terminal and computer readable medium |
CN108401247B (en) | Method for controlling Bluetooth device, electronic device and storage medium |
CN115103157A (en) | Video analysis method and device based on edge cloud cooperation, electronic equipment and medium |
CN107131607A (en) | Monitoring method, device and system based on air conditioner and air conditioner |
CN112491669A (en) | Data processing method, device and system |
CN113158842B (en) | Identification method, system, device and medium |
CN206656471U (en) | Air conditioner and monitoring system based on air conditioner |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: OOMA, INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: VAIDYA, GOVIND; REEL/FRAME: 047237/0990; Effective date: 20181011 |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |