US20190138795A1 - Automatic Object Detection and Recognition via a Camera System - Google Patents
Info
- Publication number
- US20190138795A1 (application US16/163,521)
- Authority
- US
- United States
- Prior art keywords
- video
- camera
- detected
- human face
- triggering event
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G06K9/00288—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G06K9/00228—
-
- G06K9/00268—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B13/00—Burglar, theft or intruder alarms
- G08B13/18—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
- G08B13/189—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems
- G08B13/194—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
- G08B13/196—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
- G08B13/19695—Arrangements wherein non-video detectors start video recording or forwarding but do not generate an alarm themselves
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/036—Insert-editing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/65—Control of camera operation in relation to power supply
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/667—Camera operation mode switching, e.g. between still and video, sport and normal or high- and low-resolution modes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/765—Interface circuits between an apparatus for recording and another apparatus
- H04N5/77—Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/765—Interface circuits between an apparatus for recording and another apparatus
- H04N5/77—Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera
- H04N5/772—Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera the recording apparatus and the television camera being placed in the same enclosure
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/91—Television signal processing therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
- H04N7/183—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a single remote source
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
- H04N7/188—Capturing isolated or intermittent images triggered by the occurrence of a predetermined event, e.g. an object reaching a predetermined position
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/79—Processing of colour television signals in connection with recording
- H04N9/80—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
- H04N9/82—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only
- H04N9/8205—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only involving the multiplexing of an additional signal and the colour video signal
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/02—Casings; Cabinets ; Supports therefor; Mountings therein
- H04R1/04—Structural association of microphone with electric circuitry therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/326—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only for microphones
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B13/00—Burglar, theft or intruder alarms
- G08B13/18—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
- G08B13/189—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems
- G08B13/194—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
- G08B13/196—Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
- G08B13/19665—Details related to the storage of video surveillance data
- G08B13/19671—Addition of non-video data, i.e. metadata, to video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2499/00—Aspects covered by H04R or H04S not otherwise provided for in their subgroups
- H04R2499/10—General applications
- H04R2499/11—Transducers incorporated or for use in hand-held devices, e.g. mobile phones, PDA's, camera's
- FIG. 1 illustrates an environment 100 within which systems and methods for automatic facial detection and recognition via a camera system can be implemented, in accordance with some embodiments.
- the environment 100 may include a camera 102 containing a camera lens 104 , and camera sensor(s) 106 .
- the camera 102 may be deployed in a physical space 108 , such as a house. Though not explicitly shown in exemplary FIG. 1 , camera 102 also has, in various embodiments, one or more additional components that enable its operation for the purposes of the present disclosure.
- the captured video 112 from the camera 102 may be transmitted via a network 110 to a cloud video analysis system 122 , which may include a system for facial recognition 124 .
- the cloud video analysis system 122 may further utilize a database 114 and one or more computing processors and volatile and non-volatile memory.
- the system for facial recognition 124 may generate facial recognition information 116 , which is transmitted through network 110 to an application operating on a user device 118 , which in turn can be viewed by a user 120 .
- a camera 102 may be deployed in any physical space 108 to record audio and/or video around the physical space 108 . While physical space 108 is depicted in exemplary FIG. 1 as a house, a person of ordinary skill in the art will understand that camera 102 may be deployed in any physical space, such as an office building, or any other space. Further, while only one camera 102 is depicted in FIG. 1 for simplicity, there can be any number of cameras in physical space 108 . If multiple cameras are located in space 108 , one or more of the cameras may be in wireless communication with one another, in exemplary embodiments. Further, while camera 102 is depicted in FIG. 1 as a standalone device, in other embodiments, camera 102 may be incorporated as a part of other electronic devices. For example, camera 102 may be incorporated as part of a smartphone, tablet, intelligent personal assistant, or other smart electronic device.
- Camera 102 is described in further detail with respect to FIG. 2 .
- camera 102 is a consumer-friendly camera that can be utilized by a human user without needing to have any specialized camera expertise.
- the camera 102 may have one or more lens 104 , with which video is captured.
- lens 104 may be any type of lens typically found in consumer cameras, such as a standard prime lens, a zoom lens, or a wide-angle lens.
- Camera 102 further has one or more sensors 106 .
- Sensor(s) 106 may be any type of sensor to monitor conditions around the camera 102 .
- sensor 106 may comprise one or more of a PIR (passive infrared) sensor that can enable color night vision, a motion sensor, a temperature sensor, a humidity sensor, a GPS sensor, etc.
- other types of sensors can be utilized to preset other types of conditions or triggers for camera 102 as well.
- camera 102 has additional components that enable its operation.
- camera 102 may have power component(s) 206 .
- Power component(s) 206 may comprise an electrical connector interface for electrically coupling a power source to the camera 102 , or for otherwise providing power to the camera 102 .
- Electrical connector interface may comprise, for example, an electrical cable (the electrical cable can be any of a charging cable, a FireWire cable, a USB cable, a micro-USB cable, a lightning cable, a retractable cable, a waterproof cable, a cable that is coated/covered with a material that would prevent an animal from chewing through to the electrical wiring, and combinations thereof), electrical ports (such as a USB port, micro-USB port, microSD port, etc.), a connector for batteries (including rechargeable battery, non-rechargeable battery, battery packs, external chargers, portable power banks, etc.), and any other standard power source used to provide electricity/power to small electronic devices.
- power component(s) 206 comprises at least one battery provided within a housing unit.
- the battery may also have a wireless connection capability for wireless charging, or induction charging capabilities.
- Camera 102 also comprises audio component(s) 204 .
- audio component(s) 204 may comprise one or more microphones for receiving, recording, and transmitting audio.
- Camera 102 further has processing component(s) 208 to enable it to perform processing functions discussed herein.
- Processing component(s) 208 may comprise at least one processor, static or main memory, and software such as firmware that is stored on the memory and executed by a processor.
- Processing component(s) 208 may further comprise a timer that operates in conjunction with the functions disclosed herein.
- a specialized video processor is utilized with a hardware accelerator and specially programmed firmware to identify triggering events, begin recording audio and/or video (in either Standard Definition or High Definition), cease recording of audio and/or video, process the captured video frames and insert metadata information regarding the specific video frame(s) containing a human face, and transmit the recorded audio, video, and metadata to a video analysis system 122 operating via software in a cloud computing environment.
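- By way of illustration only, the firmware pipeline described above can be sketched as a simple event loop. The sketch below is a hypothetical outline, not the patent's actual firmware: the state names, the helper functions (detect_trigger, record_frame, detect_faces_in_frame, append_metadata, upload_clip), and the 20-second window are assumptions standing in for camera-specific calls.

    #include <stdbool.h>
    #include <time.h>

    /* Hypothetical camera states. */
    typedef enum { CAM_IDLE, CAM_RECORDING } cam_state_t;

    /* Assumed helpers standing in for camera firmware and DSP calls. */
    extern bool detect_trigger(void);            /* sensor and/or microphone activity */
    extern void record_frame(unsigned int ts);   /* store one video frame in memory */
    extern int  detect_faces_in_frame(void);     /* hardware-accelerated face detector */
    extern void append_metadata(unsigned int ts, int faces);
    extern void upload_clip(void);               /* send recording and metadata to system 122 */

    void camera_loop(void) {
        cam_state_t state = CAM_IDLE;
        unsigned int started = 0;
        const unsigned int capture_window = 20;  /* seconds; pre-configured */

        for (;;) {
            unsigned int now = (unsigned int)time(NULL);
            if (state == CAM_IDLE) {
                if (detect_trigger()) {          /* triggering event detected */
                    state = CAM_RECORDING;
                    started = now;
                }
            } else {
                record_frame(now);
                int faces = detect_faces_in_frame();  /* substantially simultaneous */
                if (faces > 0)
                    append_metadata(now, faces);      /* mark this frame */
                if (now - started >= capture_window) {
                    upload_clip();
                    state = CAM_IDLE;                 /* recording ceases */
                }
            }
        }
    }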
- Camera 102 also comprises networking component(s) 202 , to enable camera 102 to connect to network 110 in a wired or wireless manner, similar to networking capabilities utilized by persons of ordinary skill in the art. Further, networking component(s) 202 may also allow for remote control of camera 102 .
- the networking communication capability of camera 102 can be achieved via an antenna attached to any portion of camera 102 , and/or via a network card.
- Camera 102 may communicate with network 110 via wired or wireless communication capabilities, such as radio frequency, Bluetooth, ZigBee, Wi-Fi, electromagnetic wave, RFID (radio frequency identification), etc.
- a human user 120 may further interact with, and control certain operations of the camera 102 via a graphical user interface displayed on a user device 118 .
- the graphical user interface can be accessed by a human user 120 via a web browser on the user device 118 (such as a desktop or laptop computer, netbook, smartphone, tablet, etc.).
- a human user may further interact with, and control certain operations of the camera 102 via a dedicated software application on a smartphone, tablet, smartwatch, laptop or desktop computer, or any other computing device with a processor that is capable of wireless communication.
- a human user 120 can interact with, and control certain operations of the camera 102 via a software application utilized by the user 120 for controlling and monitoring other aspects of a residential or commercial building, such as a security system, home monitoring system for Internet-enabled appliances, voice assistant such as Amazon Echo, Google Home, etc.
- camera 102 captures video as discussed herein.
- the captured video 112 is then transmitted to video analysis system 122 via network 110 .
- the network 110 may include the Internet or any other network capable of communicating data between devices. Suitable networks may include or interface with any one or more of, for instance, a local intranet, a Personal Area Network, a Local Area Network, a Wide Area Network, a Metropolitan Area Network, a virtual private network, a storage area network, a frame relay connection, an Advanced Intelligent Network connection, a synchronous optical network connection, a digital T1, T3, E1 or E3 line, Digital Data Service connection, Digital Subscriber Line connection, an Ethernet connection, an Integrated Services Digital Network line, a dial-up port such as a V.90, V.34 or V.34bis analog modem connection, a cable modem, an Asynchronous Transfer Mode connection, or a Fiber Distributed Data Interface or Copper Distributed Data Interface connection.
- communications may also include links to any of a variety of wireless networks, including Wireless Application Protocol, General Packet Radio Service, Global System for Mobile Communication, Code Division Multiple Access or Time Division Multiple Access, cellular phone networks, Global Positioning System, cellular digital packet data, Research in Motion, Limited duplex paging network, Bluetooth radio, or an IEEE 802.11-based radio frequency network.
- the network can further include or interface with any one or more of an RS-232 serial connection, an IEEE-1394 (FireWire) connection, a Fiber Channel connection, an IrDA (infrared) port, a SCSI (Small Computer Systems Interface) connection, a Universal Serial Bus (USB) connection or other wired or wireless, digital or analog interface or connection, mesh or Digi® networking.
- the network 110 may be a network of data processing nodes that are interconnected for the purpose of data communication.
- the network 110 may include any suitable number and type of devices (e.g., routers and switches) for forwarding commands, content, requests, and/or responses between each user device 118 , each camera 102 , and the video analysis system 122 .
- the video analysis system 122 may include a server-based distributed software application; thus, the system 122 may include a central component residing on a server and one or more client applications residing on one or more user devices and communicating with the central component via the network 110 .
- the user 120 may communicate with the system 122 via a client application available through the user device 118 .
- Video analysis system 122 may comprise software application(s) for processing captured video 112 , as well as other capabilities. Video analysis system 122 is further in communication with one or more data structures, such as database 114 . In exemplary embodiments, at least some components of video analysis system 122 operate on one or more cloud computing devices or servers.
- Video analysis system 122 further comprises a system for facial recognition 124 .
- the system for facial recognition 124 analyzes the specific video frames noted in metadata associated with captured video 112 . Through the analysis, which consists of one or more software algorithms executed by at least one processor, the system for facial recognition 124 analyzes the video frames from captured video 112 that have been noted as containing a human face. The human face detected in each video frame is then “recognized”, i.e., associated with a likely name of the person whose face is detected.
- Face recognition information 116 , which may comprise a name of one or more people recognized in captured video 112 , is then transmitted by system for facial recognition 124 , through network 110 , to a user device 118 , at which point it can be viewed by a user.
- additional information may be transmitted with face recognition information 116 , such as a copy of the face image from the captured video 112 , and/or other information regarding the facial recognition.
- Face recognition information 116 is displayed via a user interface on a screen of user device 118 , in the format of a pop-up alert, text message, e-mail message, or any other means of communicating with user 120 .
- the user device 118 may include a Graphical User Interface for displaying the user interface associated with the system 122 .
- the user device 118 may include a mobile telephone, a desktop personal computer (PC), a laptop computer, a smartphone, a tablet, a smartwatch, intelligent personal assistant device, smart appliance, and so forth.
- FIG. 3 is a block diagram showing various modules of a video analysis system 122 for processing captured video 112 , in accordance with certain embodiments.
- the system 122 may include a processor 310 and a database 320 .
- the database 320 may include computer-readable instructions for execution by the processor 310 .
- the processor 310 may include a programmable processor, such as a microcontroller, central processing unit (CPU), and so forth.
- the processor 310 may include an application-specific integrated circuit or programmable logic array, such as a field programmable gate array, designed to implement the functions performed by the system 122 .
- the system 122 may be installed on a user device or may be provided as a cloud service residing in a cloud storage. The operations performed by the processor 310 and the database 320 are described in further detail herein.
- FIG. 4 is a block diagram showing various modules of a system for facial recognition 124 , for identifying (recognizing) detected human faces in select frames of captured video 112 , in accordance with certain embodiments.
- the system 124 may include a processor 410 and a database 420 .
- the processor 410 of system for facial recognition 124 may be the same, or different from processor 310 of the video analysis system 122 .
- database 420 of system for facial recognition 124 may be the same or different than database 320 of video analysis system 122 .
- Database 420 may include computer-readable instructions for execution by the processor 410 .
- the processor 410 may include a programmable processor, such as a microcontroller, central processing unit (CPU), and so forth.
- the processor 410 may include an application-specific integrated circuit or programmable logic array, such as a field programmable gate array, designed to implement the functions performed by the system for facial recognition 124 .
- the system for facial recognition 124 may be installed on a user device or may be provided as a cloud service residing in a cloud storage. The operations performed by the processor 410 and the database 420 are described in further detail herein.
- FIG. 5 is a process flow diagram showing a method 500 for automatic facial detection via a camera system, within the environment described with reference to FIG. 1 .
- the operations may be combined, performed in parallel, or performed in a different order.
- the method 500 may also include additional or fewer operations than those illustrated.
- the method 500 may be performed by processing logic that may comprise hardware (e.g., decision making logic, dedicated logic, programmable logic, and microcode), hardware accelerator, software (such as firmware on camera 102 or other software run on a special-purpose computer system), or any combination of the above.
- the method 500 may commence with camera 102 detecting a triggering event at operation 502 .
- camera 102 may be located in a physical space 108 and powered on, but not be actively recording video and/or audio.
- a triggering event may cause camera 102 to begin recording.
- the triggering event can be any preset condition or trigger.
- the triggering event is a noise detected by a microphone on camera 102 above a certain decibel threshold.
- a triggering event is a noise detected by a microphone on camera 102 within a certain time period.
- a triggering event may be the detection of motion, temperature, smoke, humidity, gaseous substance, or any other environmental condition above a preset threshold, or occurring within a preset time period.
- the preset threshold may be configured by a manufacturer of camera 102 , or configured by a user 120 .
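- As one concrete illustration of the sound-based trigger, the following sketch compares the level of a block of microphone samples against a configurable decibel threshold. It is a minimal example under assumptions not stated in the patent: 16-bit PCM samples and a threshold expressed in dBFS (decibels relative to full scale).

    #include <math.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Returns true if the RMS level of a block of 16-bit PCM samples
     * exceeds threshold_db, expressed in dBFS. */
    bool sound_trigger(const short *samples, size_t n, double threshold_db) {
        if (n == 0)
            return false;
        double sum_sq = 0.0;
        for (size_t i = 0; i < n; i++) {
            double s = samples[i] / 32768.0;          /* normalize to [-1, 1) */
            sum_sq += s * s;
        }
        double rms = sqrt(sum_sq / (double)n);
        double level_db = 20.0 * log10(rms + 1e-12);  /* guard against log(0) */
        return level_db > threshold_db;
    }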
- Upon detection of the triggering event, camera 102 enters video and/or audio capture mode at operation 504 for a certain predetermined time period.
- capture mode may be enabled on camera 102 for an amount of time that is pre-configured by a manufacturer of camera 102 , or pre-configured by a user 120 .
- the predetermined time period that capture mode is enabled on camera 102 may be variable based on the type of triggering event, time of day, or any other criterion. In an exemplary embodiment, capture mode is enabled on camera 102 for 5-30 seconds.
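- A variable capture window of the kind described above could be chosen with a simple lookup. The trigger categories and durations below are illustrative assumptions within the 5-30 second range given in the text, not values disclosed in the patent.

    /* Hypothetical trigger categories. */
    typedef enum { TRIGGER_MOTION, TRIGGER_SOUND, TRIGGER_OTHER } trigger_t;

    /* Returns the recording window in seconds, varied by trigger type
     * and time of day (hour in 0-23). */
    unsigned int capture_window_seconds(trigger_t type, int hour) {
        int night = (hour >= 22 || hour < 6);
        switch (type) {
            case TRIGGER_MOTION: return night ? 30u : 20u;
            case TRIGGER_SOUND:  return night ? 20u : 10u;
            default:             return 5u;
        }
    }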
- video is recorded by camera 102 onto memory within camera 102 hardware. Further, substantially simultaneously, recorded video is processed by firmware on a specialized video processor hardware and/or hardware accelerator within camera 102 .
- the firmware processes recorded video and detects select video frames within the recorded video that contain a human face or are likely to contain a human face.
- a threshold confidence level may be preset for the facial detection, such that false positives are preferable to false negatives.
- the facial detection may occur substantially instantaneously (within 1 second) of the video capture by camera 102 .
- exemplary metadata may comprise:
    typedef struct {
        unsigned int  timestamp;      /* timestamp when face was detected */
        unsigned char face_detected;  /* face has been detected or not - YES/NO */
        unsigned char no_of_faces;    /* number of faces detected at the particular timestamp */
        unsigned char type;           /* human/pet - HUMAN/PET */
        unsigned char reserved;
    } METADATA;
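- One plausible way for the firmware to use this structure is to append one record to the clip's metadata file each time the detector fires. The packed-record file layout below is an assumption for illustration; the patent does not specify the on-disk format.

    #include <stdio.h>

    /* Append one detection record, using the METADATA structure above.
     * Returns 0 on success, -1 on a write error. */
    int write_metadata_record(FILE *f, unsigned int timestamp,
                              unsigned char faces, unsigned char type) {
        METADATA m = {0};
        m.timestamp = timestamp;
        m.face_detected = (faces > 0);   /* YES/NO */
        m.no_of_faces = faces;
        m.type = type;                   /* HUMAN or PET */
        return fwrite(&m, sizeof m, 1, f) == 1 ? 0 : -1;
    }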
- the recorded video and updated metadata file are transmitted to video analysis system 122 for further analysis, at operation 510 .
- camera 102 is in wireless communication with video analysis system 122 and operation 510 occurs in a wireless manner.
- the transmission occurs via a wired communication network.
- video analysis system 122 may be executed by a module within camera 102 itself.
- an exemplary transmission from camera 102 to video analysis system 122 may comprise a plurality of computing files.
- the following computing files may be transmitted: “.ts” files, which contain the recorded video, and “.met” files, which contain the corresponding metadata.
- FIG. 6 is a process flow diagram showing a method 600 for automatic facial recognition via a camera system, within the environment described with reference to FIG. 1 .
- the operations may be combined, performed in parallel, or performed in a different order.
- the method 600 may also include additional or fewer operations than those illustrated.
- the method 600 may be performed by processing logic that may comprise hardware (e.g., decision making logic, dedicated logic, programmable logic, and microcode), hardware accelerator, software (such as firmware or other software run on a special-purpose computer system or general purpose computer system), or any combination of the above.
- Various operations of method 600 may be performed by video analysis system 122 , system for facial recognition 124 , or a combination of both systems.
- the method 600 may commence at operation 602 with video analysis system 122 receiving recorded video and/or audio, along with corresponding metadata from camera 102 .
- the recorded video may be a video clip of any duration. In preferred embodiments, the recorded video clip is between 5 and 30 seconds, preferably about 20 seconds, in length.
- the recorded video and corresponding metadata may be received from camera 102 via any wireless or wired communication mechanism.
- the video analysis system 122 examines the received metadata to determine which video frames of the recorded video clip contain detected human faces. The select video frames are then extracted from the video clip for further analysis by the system for facial recognition 124 .
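- Continuing the packed-record assumption from the writer sketch above, the analysis side might read the metadata file back and keep only the timestamps of frames the camera flagged, so that just those frames are decoded from the clip rather than the whole video.

    #include <stdio.h>

    /* Collect timestamps of frames marked as containing faces.
     * Returns how many marked frames were found (at most max). */
    size_t marked_frame_timestamps(FILE *f, unsigned int *out, size_t max) {
        METADATA m;
        size_t count = 0;
        while (count < max && fread(&m, sizeof m, 1, f) == 1) {
            if (m.face_detected)         /* frame flagged by the camera */
                out[count++] = m.timestamp;
        }
        return count;
    }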
- system for facial recognition 124 processes the extracted video frames from the received video clip.
- the processing may optionally entail verifying whether the face detected by the camera 102 is indeed a human face. Further, the processing may entail identifying the detected human face in the select video frames. This process is also referred to herein as recognizing the human face.
- the human face is compared to known human faces stored in a data structure such as a database to determine a match with a previously identified human face.
- the previously identified human faces may be identified by one or more machine learning algorithms, and/or by human input.
- a threshold confidence level may be preset for the facial recognition, such that false positives are preferable to false negatives, or vice versa.
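- A common way to realize such a comparison, offered here only as a hedged sketch, is nearest-neighbor matching on face embeddings with a distance threshold: raising the threshold favors false positives (more matches), lowering it favors false negatives. The embedding size and the known_face_t type are assumptions, not structures disclosed in the patent.

    #include <float.h>
    #include <math.h>
    #include <stddef.h>

    #define EMB_DIM 128              /* assumed embedding size */

    typedef struct {
        const char *name;            /* identity, e.g., a first name */
        float emb[EMB_DIM];          /* enrolled face embedding */
    } known_face_t;

    /* Returns the name of the closest enrolled face if its Euclidean
     * distance is within max_dist; NULL means "unfamiliar face". */
    const char *recognize(const float *probe, const known_face_t *db,
                          size_t n, float max_dist) {
        const char *best = NULL;
        float best_d = FLT_MAX;
        for (size_t i = 0; i < n; i++) {
            float d2 = 0.0f;
            for (size_t k = 0; k < EMB_DIM; k++) {
                float diff = probe[k] - db[i].emb[k];
                d2 += diff * diff;
            }
            float d = sqrtf(d2);
            if (d < best_d) {
                best_d = d;
                best = db[i].name;
            }
        }
        return (best_d <= max_dist) ? best : NULL;
    }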
- the system for facial recognition 124 may then generate a result with face recognition information 116 .
- Face recognition information 116 may comprise at least one of a first name associated with the human face, a last name associated with the human face, a relationship of the face recognized with user 120 , a location where the face was detected within physical space 108 , or any other factor.
- face recognition information 116 may comprise a likely identity of the human, and a request that the user 120 verify the likely identity.
- face recognition information 116 may comprise a result that the human face detected is not recognized by system for facial recognition 124 .
- the user may be asked if she recognizes the detected human face. If so, user 120 may input identity information for the detected face, which is stored by the system for facial recognition 124 in its database of known identified faces. In some embodiments, user 120 may not recognize the detected face, and system for facial recognition 124 stores the detected face as an unknown person.
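- When the user supplies a name for an unfamiliar face, the system could simply add that face to the enrolled set used by the matcher sketched above; this again assumes the hypothetical known_face_t type rather than any structure from the patent.

    #include <string.h>

    /* Add a user-labeled face to the enrolled set of capacity cap.
     * The caller keeps the name string alive (e.g., "Kathleen").
     * Returns 0 on success, -1 if the set is full. */
    int enroll_face(known_face_t *db, size_t *n, size_t cap,
                    const char *name, const float *emb) {
        if (*n >= cap)
            return -1;
        db[*n].name = name;
        memcpy(db[*n].emb, emb, sizeof db[*n].emb);
        (*n)++;
        return 0;
    }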
- the face recognition information 116 result is transmitted to user 120 at operation 608 , and displayed on a user device 118 at operation 610 .
- the face recognition information 116 may be partially or fully displayed on a user interface of user device 118 via a pop-up notification, text message, e-mail message, via a web browser operating on user device 118 , or via a dedicated software application operating on user device 118 .
- a human face can be automatically detected and recognized by a camera system quickly and efficiently, without needing significant user input.
- FIG. 7 depicts an exemplary screenshot 700 of a video frame from captured video 112 .
- at least face 702 may be detected by camera 102 .
- System for facial recognition 124 may compare face 702 to previously identified faces (familiar faces), and to previously unidentified faces (unfamiliar faces), as depicted in the exemplary screenshot 800 of FIG. 8 .
- face 702 is categorized as an “unfamiliar face” by system for facial recognition 124 . This information may be transmitted and presented on a graphical user interface on user device 118 .
- FIG. 9 depicts an exemplary screenshot 900 that may be further displayed on a graphical user interface on user device 118 .
- a user has input that face 702 , which was categorized as an “unfamiliar” or unrecognized face by the system for facial recognition 124 , is actually a known person to the user with the name “Kathleen”.
- System for facial recognition 124 detects that a previously identified face with that name exists, and prompts the user as to whether face 702 should be combined with the previously recognized face under the identifier “Kathleen”. In this way, system for facial recognition 124 has learned that these different views are of the same person, and will recognize face 702 accordingly in future video frames.
- FIG. 10 shows a diagrammatic representation of a computing device for a machine in the exemplary electronic form of a computer system 1000 , within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed.
- Computer system 1000 may be implemented within camera 102 , video analysis system 122 , and/or system for facial recognition 124 .
- the machine operates as a standalone device or can be connected (e.g., networked) to other machines.
- the machine can operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
- the machine can be a PC, a tablet PC, a set-top box, a cellular telephone, a digital camera, a portable music player (e.g., a portable hard drive audio device, such as a Moving Picture Experts Group Audio Layer 3 (MP3) player), a web appliance, a network router, a switch, a bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
- the example computer system 1000 includes a processor or multiple processors 1002 , a hard disk drive 1004 , a main memory 1006 , and a static memory 1008 , which communicate with each other via a bus 1010 .
- the computer system 1000 may also include a network interface device 1012 .
- the hard disk drive 1004 may include a computer-readable medium 1020 , which stores one or more sets of instructions 1022 embodying or utilized by any one or more of the methodologies or functions described herein.
- the instructions 1022 can also reside, completely or at least partially, within the main memory 1006 and/or within the processors 1002 during execution thereof by the computer system 1000 .
- the main memory 1006 and the processors 1002 also constitute machine-readable media.
- While the computer-readable medium 1020 is shown in an exemplary embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions.
- the term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions.
- the term “computer-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media. Such media can also include, without limitation, hard disks, floppy disks, NAND or NOR flash memory, digital video disks, Random Access Memory (RAM), Read-Only Memory (ROM), and the like.
- the exemplary embodiments described herein can be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware.
- the computer-executable instructions can be written in a computer programming language or can be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions can be executed on a variety of hardware platforms and for interfaces to a variety of operating systems.
- the computer system 1000 may be implemented as a cloud-based computing environment, such as a virtual machine operating within a computing cloud.
- the computer system 1000 may itself include a cloud-based computing environment, where the functionalities of the computer system 1000 are executed in a distributed fashion.
- the computer system 1000 when configured as a computing cloud, may include pluralities of computing devices in various forms, as will be described in greater detail below.
- a cloud-based computing environment is a resource that typically combines the computational power of a large grouping of processors (such as within web servers) and/or that combines the storage capacity of a large grouping of computer memories or storage devices.
- Systems that provide cloud-based resources may be utilized exclusively by their owners, or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.
- the cloud may be formed, for example, by a network of web servers that comprise a plurality of computing devices, such as a client device, with each server (or at least a plurality thereof) providing processor and/or storage resources.
- These servers may manage workloads provided by multiple users (e.g., cloud resource consumers or other users).
- each user places workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depends on the type of business associated with the user.
- Non-volatile media include, for example, optical or magnetic disks, such as a fixed disk.
- Volatile media include dynamic memory, such as system RAM.
- Transmission media include coaxial cables, copper wire, and fiber optics, among others, including the wires that comprise one embodiment of a bus.
- Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications.
- Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM disk, digital video disk, any other optical medium, any other physical medium with patterns of marks or holes, a RAM, a Programmable Read-Only Memory, an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory, a FlashEPROM, any other memory chip or data exchange adapter, a carrier wave, or any other medium from which a computer can read.
Abstract
Systems and methods for automatically detecting and recognizing human faces by a smart camera system are described herein. A camera located in a physical space may detect a triggering event and begin recording video and/or audio for a predetermined time period. The camera also processes the video to determine if any human faces are present. If so, the camera updates the metadata file associated with the recorded video with information identifying specific video frames that contain a human face. The recorded video and metadata file are transmitted to video analysis software, which performs further facial analysis on the selected video frames, in an attempt to identify the person(s) detected in the video. Results of the facial recognition process are presented on a user computing device.
Description
- The present utility patent application is related to, and claims the priority benefit under 35 U.S.C. 119(e) of: U.S. Provisional Application No. 62/585,686 filed on Nov. 14, 2017 and entitled “Unique Method to Detect Faces in Videos and Process Selective Frames in Recorded Videos to Recognize and Analyze Faces for Camera Applications”; U.S. Provisional Application No. 62/582,919 filed on Nov. 7, 2017 and entitled “Activity Based Recording (ABR) for Camera Applications”, and U.S. Provisional Application No. 62/583,875 filed on Nov. 9, 2017 and entitled “Sound Detection Sensing Logic for Camera Applications”. The disclosure of all of the above-referenced applications is incorporated herein by reference for all purposes to the extent that such subject matter is not inconsistent herewith or limiting hereof.
- The present disclosure relates to object detection and recognition via camera hardware and software.
- Security cameras have been widely deployed in physical spaces for many years. Typically, cameras record video of a space the entire time they are on, twenty-four hours a day, seven days a week. This results in a large amount of video data to store. Additionally, a person needs to view many hours of video in which nothing of significance is happening, simply to determine whether anything significant occurred in the space at all.
- Activity based recording cameras, which record video only if a specific activity or event is detected in the physical space around the camera, are gradually being introduced into the market. For consumer use, such cameras are typically sold for $100-$300. However, these cameras still require a human to review recorded video after the fact to determine whether an unusual event happened that the owner of the space should be concerned about, or whether the activity recorded by the camera is routine and not concerning. Thus, these cameras can still be burdensome for their users.
- To overcome these challenges, a camera system is needed that is easy for the average consumer to use, and that can automatically detect and recognize an object present in the physical space around the camera, without the need for burdensome user intervention.
- This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
- Provided are processor-implemented systems and methods for automatically detecting and recognizing objects, such as human faces, via a smart camera system. In exemplary embodiments, when a camera located in a physical space detects the occurrence of a triggering event (such as motion or sound activity), the camera begins recording video for a predetermined time period. Substantially simultaneously, the camera firmware processes video frames to detect whether a human face is present in any of the video frames of the recorded video.
- If at least one human face is detected in at least one video frame of the recorded video, a metadata file associated with the recorded video is updated with information regarding the specific video frames in the recorded video where human faces are present.
- Further facial analysis is then conducted on the selective video frames to identify the human faces detected as being familiar faces or unfamiliar faces. Results of the further facial analysis are presented to a user via a user computing device.
- Other example embodiments of the disclosure and aspects will become apparent from the following description taken in conjunction with the following drawings.
- Embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements.
- FIG. 1 illustrates an environment within which systems and methods for automatic facial detection and recognition via a camera system can be implemented, according to an example embodiment.
- FIG. 2 illustrates an exemplary camera that can be used for automatic facial detection and recognition via the disclosed camera system.
- FIG. 3 is a block diagram showing various modules of a video analysis system for processing captured video, in accordance with certain embodiments.
- FIG. 4 is a block diagram showing various modules of a system for facial recognition, in accordance with certain embodiments.
- FIG. 5 is a process flow diagram showing a method for automatic facial detection via a camera system, according to an example embodiment.
- FIG. 6 is a process flow diagram showing a method for automatic facial recognition via a camera system, according to an example embodiment.
- FIG. 7 depicts an exemplary screenshot of a video frame from recorded video, according to an example embodiment.
- FIG. 8 and FIG. 9 depict exemplary screenshots of a user interface provided on a user computing device.
- FIG. 10 shows a diagrammatic representation of a computing device for a machine in the exemplary electronic form of a computer system, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed.
- The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show illustrations in accordance with exemplary embodiments. These exemplary embodiments, which are also referred to herein as “examples,” are described in enough detail to enable those skilled in the art to practice the present subject matter. The embodiments can be combined, other embodiments can be utilized, or structural, logical, and electrical changes can be made without departing from the scope of what is claimed. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents.
- The disclosure relates to a system and methods for facilitating automatic facial detection and recognition in camera systems. More specifically, the system allows a consumer-friendly camera to be deployed in a physical space, such as a house, an office building, or another room. The camera, through use of specially designed hardware and specially programmed software, can record short video clips when a triggering event is detected by one or more sensors on the camera and/or by a microphone on the camera. Also referred to as activity based recording, this method provides a smart, automated way to enable and disable camera recording, so that only potentially significant events are detected and recorded, obviating the need for many hours of recorded video of an empty room in which no activity occurs.
- Further, pre-processing software operating on the camera itself can mark the specific video frames that contain a human face. Each recorded video is stored with additional information identifying the frame(s) containing human faces, so that further analysis of these frames can be conducted. That is, while the camera records the video, individual frames are processed simultaneously in camera firmware, and a metadata file is generated and updated with information regarding the specific video frame(s) in which a human face was detected. The recorded video, along with the metadata information, is transmitted to facial recognition software for further analysis. The facial recognition software may operate via a processor separate from the camera itself, such as on a separate connected server in a cloud or elsewhere.
- A typical facial recognition method processes an entire video clip to detect and recognize faces, which is time consuming and significantly increases the compute power required. In embodiments of the present disclosure, video analysis software processes the recorded video, extracts the frames that have been previously marked by the camera as containing a human face, and applies facial recognition algorithm(s) only to those selected video frames. That is, instead of performing facial recognition analysis on the entire video clip, the video analysis software processes the metadata file from the camera and carries out further facial analysis on the selected frames identified in the metadata file. This method significantly reduces the compute time and storage resources required. The results of the further facial analysis may constitute facial recognition information, which may then be transmitted to a user of the camera system via a user interface on a software application operating on a user device (also referred to herein sometimes as a user computing device).
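To make the selection step concrete, the sketch below walks an array of per-frame metadata records and keeps only the timestamps flagged as containing a face; the record layout mirrors the METADATA structure shown later in this description, while the function name and buffers are hypothetical.

    #include <stddef.h>

    /* Mirrors the METADATA record format shown later in this description. */
    typedef struct {
        unsigned int  timestamp;     /* timestamp when face was detected */
        unsigned char face_detected; /* face present at this timestamp - YES/NO */
        unsigned char no_of_faces;   /* number of faces at this timestamp */
        unsigned char type;          /* human/pet - HUMAN/PET */
        unsigned char reserved;
    } METADATA;

    /* Collects the timestamps of frames marked as containing a face, so that
     * recognition runs on only these frames instead of the whole clip. */
    size_t select_face_frames(const METADATA *records, size_t count,
                              unsigned int *out_timestamps, size_t max_out)
    {
        size_t selected = 0;
        for (size_t i = 0; i < count && selected < max_out; i++) {
            if (records[i].face_detected)
                out_timestamps[selected++] = records[i].timestamp;
        }
        return selected; /* typically far fewer than the clip's frame count */
    }

For instance, a 20-second clip at 30 frames per second holds roughly 600 frames; if faces were marked at only a handful of timestamps, the recognizer decodes and analyzes just those frames.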
- By bifurcating the facial detection and facial recognition processes, the camera itself can be manufactured at a consumer-friendly price point, and deployed by a consumer quickly and easily in a physical space. Further, by pre-processing video clips in the camera firmware itself, only selected video frames need to be analyzed by the software algorithm for recognizing the detected faces. This significantly reduces the computing burden and time for the facial recognition process, as well as significantly reducing the computing storage resources necessary. Further, this allows the facial recognition analysis to occur quickly, in substantially real-time. Thus, if an unrecognized face is detected in the video clip, the user can be alerted quickly and take appropriate action in a timely manner.
- In other embodiments, the camera can identify specific video frames that contain a known object other than a human. For example, a camera can identify video frames containing another body part of a human, a body part of an animal, and/or an inanimate object.
- FIG. 1 illustrates an environment 100 within which systems and methods for automatic facial detection and recognition via a camera system can be implemented, in accordance with some embodiments. The environment 100 may include a camera 102 containing a camera lens 104 and camera sensor(s) 106. The camera 102 may be deployed in a physical space 108, such as a house. Though not explicitly shown in exemplary FIG. 1, camera 102 also has, in various embodiments, one or more additional components that enable its operation for the purposes of the present disclosure.
- The captured video 112 from the camera 102 may be transmitted via a network 110 to a cloud video analysis system 122, which may include a system for facial recognition 124. The cloud video analysis system 122 may further utilize a database 114 and one or more computing processors and volatile and non-volatile memory.
- After processing captured video 112, the system for facial recognition 124 may generate facial recognition information 116, which is transmitted through network 110 to an application operating on a user device 118, where it can be viewed by a user 120. Each of these components is discussed in further detail below.
- A camera 102 may be deployed in any physical space 108 to record audio and/or video around the physical space 108. While physical space 108 is depicted in exemplary FIG. 1 as a house, a person of ordinary skill in the art will understand that camera 102 may be deployed in any physical space, such as an office building, or any other space. Further, while only one camera 102 is depicted in FIG. 1 for simplicity, there can be any number of cameras in physical space 108. If multiple cameras are located in space 108, one or more of the cameras may be in wireless communication with one another, in exemplary embodiments. Further, while camera 102 is depicted in FIG. 1 as a standalone device, in other embodiments, camera 102 may be incorporated as a part of other electronic devices. For example, camera 102 may be incorporated as part of a smartphone, tablet, intelligent personal assistant, or other smart electronic device.
- Camera 102 is described in further detail with respect to FIG. 2. In various embodiments, camera 102 is a consumer-friendly camera that can be utilized by a human user without any specialized camera expertise. The camera 102 may have one or more lenses 104 with which video is captured. In exemplary embodiments, lens 104 may be any type of lens typically found in consumer cameras, such as a standard prime lens, zoom lens, or wide angle lens.
- Camera 102 further has one or more sensors 106. Sensor(s) 106 may be any type of sensor to monitor conditions around the camera 102. By way of non-limiting example, sensor 106 may comprise one or more of a PIR (passive infrared) sensor that can enable color night vision, a motion sensor, a temperature sensor, a humidity sensor, a GPS, etc. As would be understood by persons of ordinary skill in the art, other types of sensors can be utilized to preset other types of conditions or triggers for camera 102 as well.
- Referring to FIG. 2, camera 102 has additional components that enable its operation. For example, camera 102 may have power component(s) 206. Power component(s) 206 may comprise an electrical connector interface for electronically coupling a power source to, or for providing power to, the camera 102. The electrical connector interface may comprise, for example, an electrical cable (the electrical cable can be any of a charging cable, a FireWire cable, a USB cable, a micro-USB cable, a lightning cable, a retractable cable, a waterproof cable, a cable coated or covered with a material that would prevent an animal from chewing through to the electrical wiring, and combinations thereof), electrical ports (such as a USB port, micro-USB port, microSD port, etc.), a connector for batteries (including rechargeable batteries, non-rechargeable batteries, battery packs, external chargers, portable power banks, etc.), and any other standard power source used to provide electricity/power to small electronic devices.
- In an exemplary embodiment, power component(s) 206 comprises at least one battery provided within a housing unit. The battery may also have wireless charging or induction charging capabilities.
- Camera 102 also comprises audio component(s) 204. In various embodiments, audio component(s) 204 may comprise one or more microphones for receiving, recording, and transmitting audio.
- Camera 102 further has processing component(s) 208 to enable it to perform the processing functions discussed herein. Processing component(s) 208 may comprise at least one processor, static or main memory, and software such as firmware that is stored on the memory and executed by a processor. Processing component(s) 208 may further comprise a timer that operates in conjunction with the functions disclosed herein.
- In various embodiments, a specialized video processor is utilized with a hardware accelerator and specially programmed firmware to identify triggering events, begin recording audio and/or video (in either Standard Definition or High Definition), cease recording of audio and/or video, process the captured video frames and insert metadata information regarding the specific video frame(s) containing a human face, and transmit the recorded audio, video, and metadata to a video analysis system 122 operating via software in a cloud computing environment.
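The firmware behavior described above can be viewed as a small control loop. The following is a minimal sketch of that loop under stated assumptions: the platform hooks are hypothetical stubs standing in for the real camera firmware, and the 20-second period is only the exemplary value used elsewhere in this description.

    #include <stdio.h>

    #define RECORD_PERIOD_MS 20000u  /* exemplary 20-second clip */

    /* Platform hooks -- hypothetical stand-ins for the real camera firmware. */
    static int  triggering_event_detected(void) { /* e.g., PIR or microphone */ return 1; }
    static unsigned int now_ms(void)             { static unsigned int t; return t += 500; }
    static void start_recording(void)            { puts("recording started"); }
    static void capture_and_mark_frame(void)     { /* detect faces, update metadata */ }
    static void stop_recording(void)             { puts("recording stopped"); }
    static void transmit_clip_and_metadata(void) { puts("clip + metadata uploaded"); }

    typedef enum { STATE_IDLE, STATE_RECORDING, STATE_UPLOADING } CameraState;

    int main(void)
    {
        CameraState state = STATE_IDLE;
        unsigned int record_until = 0;
        int iterations = 100; /* bounded loop for the sketch */

        while (iterations--) {
            switch (state) {
            case STATE_IDLE:       /* powered on, not recording */
                if (triggering_event_detected()) {
                    record_until = now_ms() + RECORD_PERIOD_MS;
                    start_recording();
                    state = STATE_RECORDING;
                }
                break;
            case STATE_RECORDING:  /* record and face-detect simultaneously */
                capture_and_mark_frame();
                if (now_ms() >= record_until) {
                    stop_recording();
                    state = STATE_UPLOADING;
                }
                break;
            case STATE_UPLOADING:  /* hand off to the video analysis system */
                transmit_clip_and_metadata();
                state = STATE_IDLE;
                break;
            }
        }
        return 0;
    }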
- Camera 102 also comprises networking component(s) 202 to enable camera 102 to connect to network 110 in a wired or wireless manner, similar to networking capabilities utilized by persons of ordinary skill in the art. Further, networking component(s) 202 may also allow for remote control of camera 102.
- In various embodiments, the networking communication capability of camera 102 can be achieved via an antenna attached to any portion of camera 102, and/or via a network card. Camera 102 may communicate with network 110 via wired or wireless communication capabilities, such as radio frequency, Bluetooth, ZigBee, Wi-Fi, electromagnetic wave, RFID (radio frequency identification), etc.
- A human user 120 may further interact with, and control certain operations of, the camera 102 via a graphical user interface displayed on a user device 118. The graphical user interface can be accessed by a human user 120 via a web browser on the user device 118 (such as a desktop or laptop computer, netbook, smartphone, tablet, etc.). A human user may further interact with, and control certain operations of, the camera 102 via a dedicated software application on a smartphone, tablet, smartwatch, laptop or desktop computer, or any other computing device with a processor that is capable of wireless communication. In other embodiments, a human user 120 can interact with, and control certain operations of, the camera 102 via a software application utilized by the user 120 for controlling and monitoring other aspects of a residential or commercial building, such as a security system, a home monitoring system for Internet-enabled appliances, or a voice assistant such as Amazon Echo or Google Home.
- Returning to FIG. 1, camera 102 captures video as discussed herein. The captured video 112 is then transmitted to video analysis system 122 via network 110.
- The network 110 may include the Internet or any other network capable of communicating data between devices. Suitable networks may include or interface with any one or more of, for instance, a local intranet, a Personal Area Network, a Local Area Network, a Wide Area Network, a Metropolitan Area Network, a virtual private network, a storage area network, a frame relay connection, an Advanced Intelligent Network connection, a synchronous optical network connection, a digital T1, T3, E1 or E3 line, a Digital Data Service connection, a Digital Subscriber Line connection, an Ethernet connection, an Integrated Services Digital Network line, a dial-up port such as a V.90, V.34 or V.34bis analog modem connection, a cable modem, an Asynchronous Transfer Mode connection, or a Fiber Distributed Data Interface or Copper Distributed Data Interface connection.
- Furthermore, communications may also include links to any of a variety of wireless networks, including Wireless Application Protocol, General Packet Radio Service, Global System for Mobile Communication, Code Division Multiple Access or Time Division Multiple Access, cellular phone networks, Global Positioning System, cellular digital packet data, Research in Motion, Limited duplex paging network, Bluetooth radio, or an IEEE 802.11-based radio frequency network. The network can further include or interface with any one or more of an RS-232 serial connection, an IEEE-1394 (FireWire) connection, a Fiber Channel connection, an IrDA (infrared) port, a SCSI (Small Computer Systems Interface) connection, a Universal Serial Bus (USB) connection or other wired or wireless, digital or analog interface or connection, mesh, or Digi® networking.
- The network 110 may be a network of data processing nodes that are interconnected for the purpose of data communication. The network 110 may include any suitable number and type of devices (e.g., routers and switches) for forwarding commands, content, requests, and/or responses between each user device 118, each camera 102, and the video analysis system 122.
- The video analysis system 122 may include a server-based distributed software application; thus the system 122 may include a central component residing on a server and one or more client applications residing on one or more user devices and communicating with the central component via the network 110. The user 120 may communicate with the system 122 via a client application available through the user device 118.
- Video analysis system 122 may comprise software application(s) for processing captured video 112, as well as other capabilities. Video analysis system 122 is further in communication with one or more data structures, such as database 114. In exemplary embodiments, at least some components of video analysis system 122 operate on one or more cloud computing devices or servers.
- Video analysis system 122 further comprises a system for facial recognition 124. The system for facial recognition 124 analyzes the specific video frames noted in the metadata associated with captured video 112. Through this analysis, which consists of one or more software algorithms executed by at least one processor, the system for facial recognition 124 analyzes the video frames from captured video 112 that have been noted as containing a human face. The human face detected in each video frame is then “recognized,” i.e., associated with a likely name of the person whose face is detected.
- Face recognition information 116, which may comprise a name of one or more people recognized in captured video 112, is then transmitted by the system for facial recognition 124, through network 110, to a user device 118, at which point it can be viewed by a user. In some embodiments, additional information may be transmitted with face recognition information 116, such as a copy of the face image from the captured video 112 and/or other information regarding the facial recognition.
- Face recognition information 116 is displayed via a user interface on a screen of user device 118, in the format of a pop-up alert, text message, e-mail message, or any other means of communicating with user 120. A hypothetical sketch of what such a payload could carry follows.
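The following C struct is an illustrative assumption drawn from the items listed in this description — the field set and sizes are not a definition from the actual system.

    /* Hypothetical payload for face recognition information 116 (illustrative). */
    typedef struct {
        char         name[64];             /* likely name of the recognized person */
        char         relationship[32];     /* relationship of the face to user 120 */
        unsigned int frame_timestamp;      /* where in the clip the face appears */
        int          is_familiar;          /* 1 = familiar face, 0 = unfamiliar face */
        char         face_image_path[128]; /* optional copy of the face image */
    } FaceRecognitionInfo;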
- The user device 118, in some example embodiments, may include a graphical user interface for displaying the user interface associated with the system 122. The user device 118 may include a mobile telephone, a desktop personal computer (PC), a laptop computer, a smartphone, a tablet, a smartwatch, an intelligent personal assistant device, a smart appliance, and so forth.
- FIG. 3 is a block diagram showing various modules of a video analysis system 122 for processing captured video 112, in accordance with certain embodiments. The system 122 may include a processor 310 and a database 320. The database 320 may include computer-readable instructions for execution by the processor 310. The processor 310 may include a programmable processor, such as a microcontroller, central processing unit (CPU), and so forth. In other embodiments, the processor 310 may include an application-specific integrated circuit or programmable logic array, such as a field programmable gate array, designed to implement the functions performed by the system 122. In various embodiments, the system 122 may be installed on a user device or may be provided as a cloud service residing in a cloud storage. The operations performed by the processor 310 and the database 320 are described in further detail herein.
- FIG. 4 is a block diagram showing various modules of a system for facial recognition 124, for identifying (recognizing) detected human faces in select frames of captured video 112, in accordance with certain embodiments. The system 124 may include a processor 410 and a database 420. The processor 410 of the system for facial recognition 124 may be the same as, or different from, processor 310 of the video analysis system 122. Further, database 420 of the system for facial recognition 124 may be the same as, or different from, database 320 of video analysis system 122.
- Database 420 may include computer-readable instructions for execution by the processor 410. The processor 410 may include a programmable processor, such as a microcontroller, central processing unit (CPU), and so forth. In other embodiments, the processor 410 may include an application-specific integrated circuit or programmable logic array, such as a field programmable gate array, designed to implement the functions performed by the system for facial recognition 124. In various embodiments, the system for facial recognition 124 may be installed on a user device or may be provided as a cloud service residing in a cloud storage. The operations performed by the processor 410 and the database 420 are described in further detail herein.
- FIG. 5 is a process flow diagram showing a method 500 for automatic facial detection via a camera system, within the environment described with reference to FIG. 1. In some embodiments, the operations may be combined, performed in parallel, or performed in a different order. The method 500 may also include additional or fewer operations than those illustrated. The method 500 may be performed by processing logic that may comprise hardware (e.g., decision making logic, dedicated logic, programmable logic, and microcode), a hardware accelerator, software (such as firmware on camera 102 or other software run on a special-purpose computer system), or any combination of the above.
- The method 500 may commence with camera 102 detecting a triggering event at operation 502. As discussed herein, camera 102 may be located in a physical space 108 and powered on, but not actively recording video and/or audio. A triggering event may cause camera 102 to begin recording. The triggering event can be any preset condition or trigger.
- In an example embodiment, the triggering event is a noise detected by a microphone on camera 102 above a certain decibel threshold. In another example embodiment, a triggering event is a noise detected by a microphone on camera 102 within a certain time period. In other example embodiments, a triggering event may be the detection of motion, temperature, smoke, humidity, a gaseous substance, or any other environmental condition above a preset threshold, or occurring within a preset time period. The preset threshold may be configured by a manufacturer of camera 102, or configured by a user 120. A simple form of such a check is sketched below.
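In the sketch that follows, the threshold values and sensor reads are hypothetical stubs standing in for the real microphone and motion-sensor drivers; the logic only illustrates the threshold comparison described above.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical preset thresholds (configurable by manufacturer or user 120). */
    static const double NOISE_THRESHOLD_DB = 60.0; /* decibel threshold for sound */
    static const double MOTION_THRESHOLD   = 0.25; /* normalized motion activity */

    /* Stub sensor reads standing in for the real microphone/PIR drivers. */
    static double microphone_level_db(void) { return 72.5; }
    static double motion_sensor_level(void) { return 0.05; }

    /* True when any preset condition for a triggering event is met. */
    static bool triggering_event_detected(void)
    {
        return microphone_level_db() > NOISE_THRESHOLD_DB  /* loud noise */
            || motion_sensor_level() > MOTION_THRESHOLD;   /* movement */
    }

    int main(void)
    {
        if (triggering_event_detected())
            puts("triggering event: begin recording for the predetermined period");
        else
            puts("no trigger: remain idle");
        return 0;
    }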
- Upon detection of the triggering event, camera 102 enters capture video and/or audio mode at operation 504 for a certain predetermined time period. In some embodiments, capture mode may be enabled on camera 102 for an amount of time that is pre-configured by a manufacturer of camera 102, or pre-configured by a user 120. Further, the predetermined time period that capture mode is enabled on camera 102 may be variable based on the type of triggering event, time of day, or any other criterion. In an exemplary embodiment, capture mode is enabled on camera 102 for 5-30 seconds.
- At operation 506, video is recorded by camera 102 onto memory within camera 102 hardware. Further, substantially simultaneously, the recorded video is processed by firmware on specialized video processor hardware and/or a hardware accelerator within camera 102. The firmware processes the recorded video and detects select video frames within the recorded video that contain a human face or are likely to contain a human face. In various embodiments, a threshold confidence level may be preset for the facial detection, such that false positives are preferable to false negatives. The facial detection may occur substantially instantaneously (within 1 second) of the video capture by camera 102.
- Information regarding which specific frames contain a face, such as the time at which those frames occur in the recorded video clip, is added to metadata associated with the recorded video, at operation 508. As a non-limiting example, exemplary metadata may comprise:
    typedef struct {
        unsigned int timestamp;      /* timestamp when face was detected */
        unsigned char face_detected; /* Face has been detected or not - YES/NO */
        unsigned char no_of_faces;   /* No of faces detected at the particular timestamp */
        unsigned char type;          /* human/pet - HUMAN/PET */
        unsigned char reserved;
    } METADATA;
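To make the firmware side of operation 508 concrete, here is a minimal sketch of how a record in the format above might be appended for each processed frame; the function, its file handling, and the YES/NO and HUMAN flag values are assumptions for the example, not the actual firmware API.

    #include <stdio.h>

    enum { NO = 0, YES = 1 };      /* assumed values for face_detected */
    enum { HUMAN = 0, PET = 1 };   /* assumed values for type */

    /* Appends one METADATA record for a processed frame (illustrative only). */
    static void record_frame_metadata(FILE *meta_file, unsigned int timestamp,
                                      unsigned char faces_found)
    {
        METADATA m = {0};
        m.timestamp     = timestamp;
        m.face_detected = (faces_found > 0) ? YES : NO;
        m.no_of_faces   = faces_found;
        m.type          = HUMAN;             /* the struct also allows PET */
        fwrite(&m, sizeof m, 1, meta_file);  /* one fixed-size record */
    }

A firmware could equally choose to write records only for timestamps where faces_found is nonzero, keeping the metadata file small relative to the clip.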
- Subsequently, the recorded video and updated metadata file are transmitted to video analysis system 122 for further analysis, at operation 510. In various embodiments, camera 102 is in wireless communication with video analysis system 122 and operation 510 occurs in a wireless manner. In other embodiments, the transmission occurs via a wired communication network. In still further embodiments, video analysis system 122 may be executed by a module within camera 102 itself.
- As a non-limiting example, an exemplary transmission from camera 102 to video analysis system 122 may comprise a plurality of computing files. For an example video clip “1495”, the following computing files may be transmitted:
    V_123456789_1_1495_1_92086ce4a84af783b1a2_2379.ts
    V_123456789_1_1495_2_92086ce4a84af783b1a2_2379.metadata
    V_123456789_1_1495_2_92086ce4a84af783b1a2_2379.ts
    V_123456789_1_1495_3_92086ce4a84af783b1a2_2379.last
    V_123456789_1_1495_3_92086ce4a84af783b1a2_2379.metadata
    V_123456789_1_1495_3_92086ce4a84af783b1a2_2379.ts
- In this example, “.ts” files are video recordings and “.metadata” files are the respective metadata files associated with the recorded “.ts” files.
- FIG. 6 is a process flow diagram showing a method 600 for automatic facial recognition via a camera system, within the environment described with reference to FIG. 1. In some embodiments, the operations may be combined, performed in parallel, or performed in a different order. The method 600 may also include additional or fewer operations than those illustrated. The method 600 may be performed by processing logic that may comprise hardware (e.g., decision making logic, dedicated logic, programmable logic, and microcode), a hardware accelerator, software (such as firmware or other software run on a special-purpose or general purpose computer system), or any combination of the above.
- Various operations of method 600 may be performed by video analysis system 122, by the system for facial recognition 124, or by a combination of both systems.
- The method 600 may commence at operation 602 with video analysis system 122 receiving recorded video and/or audio, along with corresponding metadata, from camera 102. As discussed herein, the recorded video may be a video clip of any duration. In preferred embodiments, the recorded video clip is between 5-30 seconds, preferably about 20 seconds in length. The recorded video and corresponding metadata may be received from camera 102 via any wireless or wired communication mechanism.
- At operation 604, the video analysis system 122 examines the metadata received to determine which video frames of the recorded video clip have detected human faces. The selected video frames are then extracted from the video clip for further analysis by the system for facial recognition 124.
- At operation 606, the system for facial recognition 124 processes the extracted video frames from the received video clip. The processing may optionally entail verifying that the face detected by the camera 102 is indeed a human face. Further, the processing may entail identifying the detected human face in the selected video frames. This process is also referred to herein as recognizing the human face.
- In various embodiments, the human face is compared to known human faces stored in a data structure, such as a database, to determine a match with a previously identified human face. The previously identified human faces may be identified by one or more machine learning algorithms and/or by human input. In various embodiments, a threshold confidence level may be preset for the facial recognition, such that false positives are preferable to false negatives, or vice versa. A minimal sketch of such a comparison follows.
- The system for facial recognition 124 may then generate a result with face recognition information 116. Face recognition information 116 may comprise at least one of a first name associated with the human face, a last name associated with the human face, a relationship of the recognized face to user 120, a location where the face was detected within physical space 108, or any other factor. In other embodiments, face recognition information 116 may comprise a likely identity of the human, and a request that the user 120 verify the likely identity.
- In other embodiments, face recognition information 116 may comprise a result that the detected human face is not recognized by the system for facial recognition 124. In such a scenario, the user may be asked if she recognizes the detected human face. If so, user 120 may input identity information for the detected face, which is stored by the system for facial recognition 124 in its database of known identified faces. In some embodiments, user 120 may not recognize the detected face, and the system for facial recognition 124 stores the detected face as an unknown person.
- The face recognition information 116 result is transmitted to user 120 at operation 608, and displayed on a user device 118 at operation 610. The face recognition information 116 may be partially or fully displayed on a user interface of user device 118 via a pop-up notification, text message, e-mail message, a web browser operating on user device 118, or a dedicated software application operating on user device 118.
- In this way, a human face can be automatically detected and recognized by a camera system quickly and efficiently, without needing significant user input.
- FIG. 7 depicts an exemplary screenshot 700 of a video frame from captured video 112. In an exemplary embodiment, at least face 702 may be detected by camera 102. The system for facial recognition 124 may compare face 702 to previously identified faces (familiar faces), and to previously unidentified faces (unfamiliar faces), as depicted in the exemplary screenshot 800 of FIG. 8. In exemplary screenshot 800, face 702 is categorized as an “unfamiliar face” by the system for facial recognition 124. This information may be transmitted and presented on a graphical user interface on user device 118.
- FIG. 9 depicts an exemplary screenshot 900 that may be further displayed on a graphical user interface on user device 118. In exemplary screenshot 900, a user has input that face 702, which was categorized as an “unfamiliar” or unrecognized face by the system for facial recognition 124, is actually a person known to the user, with the name “Kathleen”. The system for facial recognition 124 detects that a previously identified face with that name exists, and prompts the user as to whether face 702 should be combined with the previously recognized face under the identifier “Kathleen”. In this way, the system for facial recognition 124 has learned that these different views are of the same person, and will recognize face 702 accordingly in future video frames.
- FIG. 10 shows a diagrammatic representation of a computing device for a machine in the exemplary electronic form of a computer system 1000, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed. Computer system 1000 may be implemented within camera 102, video analysis system 122, and/or the system for facial recognition 124.
- In various exemplary embodiments, the machine operates as a standalone device or can be connected (e.g., networked) to other machines. In a networked deployment, the machine can operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine can be a PC, a tablet PC, a set-top box, a cellular telephone, a digital camera, a portable music player (e.g., a portable hard drive audio device, such as a Moving Picture Experts Group Audio Layer 3 player), a web appliance, a network router, a switch, a bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
- The example computer system 1000 includes a processor or multiple processors 1002, a hard disk drive 1004, a main memory 1006, and a static memory 1008, which communicate with each other via a bus 1010. The computer system 1000 may also include a network interface device 1012. The hard disk drive 1004 may include a computer-readable medium 1020, which stores one or more sets of instructions 1022 embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 1022 can also reside, completely or at least partially, within the main memory 1006 and/or within the processors 1002 during execution thereof by the computer system 1000. The main memory 1006 and the processors 1002 also constitute machine-readable media.
- While the computer-readable medium 1020 is shown in an exemplary embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions. The term “computer-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Such media can also include, without limitation, hard disks, floppy disks, NAND or NOR flash memory, digital video disks, Random Access Memory (RAM), Read-Only Memory (ROM), and the like.
- The exemplary embodiments described herein can be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware. The computer-executable instructions can be written in a computer programming language or can be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions can be executed on a variety of hardware platforms and for interfaces to a variety of operating systems.
- In some embodiments, the computer system 1000 may be implemented as a cloud-based computing environment, such as a virtual machine operating within a computing cloud. In other embodiments, the computer system 1000 may itself include a cloud-based computing environment, where the functionalities of the computer system 1000 are executed in a distributed fashion. Thus, the computer system 1000, when configured as a computing cloud, may include pluralities of computing devices in various forms, as will be described in greater detail below.
- In general, a cloud-based computing environment is a resource that typically combines the computational power of a large grouping of processors (such as within web servers) and/or that combines the storage capacity of a large grouping of computer memories or storage devices. Systems that provide cloud-based resources may be utilized exclusively by their owners, or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.
- The cloud may be formed, for example, by a network of web servers that comprise a plurality of computing devices, such as a client device, with each server (or at least a plurality thereof) providing processor and/or storage resources. These servers may manage workloads provided by multiple users (e.g., cloud resource consumers or other users). Typically, each user places workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depends on the type of business associated with the user.
- It is noteworthy that any hardware platform suitable for performing the processing described herein is suitable for use with the technology. The terms “computer-readable storage medium” and “computer-readable storage media” as used herein refer to any medium or media that participate in providing instructions to a CPU for execution. Such media can take many forms, including, but not limited to, non-volatile media, volatile media and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as a fixed disk. Volatile media include dynamic memory, such as system RAM. Transmission media include coaxial cables, copper wire, and fiber optics, among others, including the wires that comprise one embodiment of a bus. Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM disk, digital video disk, any other optical medium, any other physical medium with patterns of marks or holes, a RAM, a Programmable Read-Only Memory, an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory, a FlashEPROM, any other memory chip or data exchange adapter, a carrier wave, or any other medium from which a computer can read.
- Thus, computer-implemented methods and systems for automatically detecting and recognizing human faces via a camera system are described herein. As would be understood by persons of ordinary skill in the art, while the present disclosure describes the detection and recognition of human faces, the present disclosure may similarly be utilized for the detection and recognition of other objects. For example, the present disclosure may be utilized for the detection and recognition of other body parts of a human, for the detection and recognition of animals, and/or for the detection and recognition of other inanimate objects.
- Although embodiments have been described herein with reference to specific exemplary embodiments, it will be evident that various modifications and changes can be made to these exemplary embodiments without departing from the broader spirit and scope of the present application. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
Claims (20)
1. A system for facial detection and recognition via a camera system, the system comprising:
a camera comprising:
a lens;
at least one sensor;
a microphone;
a first processor and a first memory, configured to:
detect a triggering event in a physical space around the camera;
record video for a predetermined time period, in response to detecting the triggering event;
execute firmware on the camera to detect that at least one human face is present in at least one video frame of the recorded video;
update a metadata file associated with the recorded video with information regarding the at least one video frame that has the detected at least one human face; and
transmit the recorded video and the updated metadata file to a video analysis system in communication with the camera; and
the video analysis system comprising:
a second processor and second memory, the second processor configured to:
receive the transmitted recorded video and updated metadata file from the camera;
extract from the recorded video the at least one video frame with the detected at least one human face, based on information in the updated metadata file;
perform facial recognition analysis on the extracted at least one video frame to identify the detected at least one human face; and
transmit and present a result of the identified detected at least one human face to a user interface on a user computing device in communication with the video analysis system.
2. The camera system of claim 1 , wherein the recorded video further comprises audio.
3. The camera system of claim 1 , wherein the detected triggering event is a loud noise in the physical space around the camera that is detected by the microphone.
4. The camera system of claim 1 , wherein the detected triggering event is a movement in the physical space around the camera that is detected by the at least one sensor.
5. The camera system of claim 1 , wherein the predetermined time period for recording video in response to detecting the triggering event is less than one minute.
6. The camera system of claim 1 , wherein the detecting that at least one human face is present in at least one video frame of the recorded video occurs substantially simultaneously with recording the video in response to detecting the triggering event.
7. The camera system of claim 1 , wherein the first processor is a specialized video processor.
8. The camera system of claim 1 , wherein the video analysis system further comprises a database communicatively coupled to the second processor, the database storing previously identified human faces and associated identity information.
9. The camera system of claim 1 , wherein the result of the facial recognition analysis is a name of the detected at least one human face.
10. The camera system of claim 1 , wherein the result of the facial recognition analysis is an unknown identity of the detected at least one human face.
11. A method for facial detection and recognition via a camera system, the method comprising:
detecting, by a camera, a triggering event in a physical space around the camera;
recording video for a predetermined time period by the camera, in response to detecting the triggering event;
executing firmware on the camera to detect that at least one human face is present in at least one video frame of the recorded video;
updating, by the camera, a metadata file associated with the recorded video with information regarding the at least one video frame that has the detected at least one human face;
transmitting the recorded video and the updated metadata file, by the camera to a video analysis system in communication with the camera;
receiving the recorded video and updated metadata file by the video analysis system;
extracting the at least one video frame with the detected at least one human face from the recorded video, based on information in the updated metadata file;
performing facial recognition analysis on the extracted at least one video frame, to identify the detected at least one human face; and
transmitting and presenting a result of the identified detected at least one human face to a user interface on a user computing device in communication with the video analysis system.
12. The method of claim 11 , wherein the detected triggering event is a loud noise in the physical space around the camera that is detected by the microphone.
13. The method of claim 11 , wherein the detected triggering event is a movement in the physical space around the camera that is detected by the at least one sensor.
14. The method of claim 11 , wherein the predetermined time period for recording video in response to detecting the triggering event is less than one minute.
15. The method of claim 11 , wherein the detecting that at least one human face is present in at least one video frame of the recorded video occurs substantially simultaneously with recording the video in response to detecting the triggering event.
16. The method of claim 11 , wherein the first processor is a specialized video processor.
17. The method of claim 11 , wherein the video analysis system further comprises a database communicatively coupled to the second processor, the database storing previously identified human faces and associated identity information.
18. The method of claim 11 , wherein the result of the facial recognition analysis is a name of the detected at least one human face.
19. The method of claim 11 , wherein the result of the facial recognition analysis is an unknown identity of the detected at least one human face.
20. A system for object detection and recognition via a camera system, the system comprising:
a camera comprising:
a lens;
at least one sensor;
a microphone;
a first processor and a first memory, configured to:
detect a triggering event in a physical space around the camera;
record video for a predetermined time period, in response to detecting the triggering event;
execute firmware on the camera to detect that at least one known object is present in at least one video frame of the recorded video;
update a metadata file associated with the recorded video with information regarding the at least one video frame that has the detected at least one known object; and
transmit the recorded video and the updated metadata file to a video analysis system in communication with the camera; and
the video analysis system comprising:
a second processor and second memory, the second processor configured to:
receive the transmitted recorded video and updated metadata file from the camera;
extract from the recorded video the at least one video frame with the detected at least one known object, based on information in the updated metadata file;
perform further recognition analysis on the extracted at least one video frame to identify the detected at least one known object; and
transmit and present a result of the identified detected at least one known object to a user interface on a user computing device in communication with the video analysis system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/163,521 US20190138795A1 (en) | 2017-11-07 | 2018-10-17 | Automatic Object Detection and Recognition via a Camera System |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762582919P | 2017-11-07 | 2017-11-07 | |
US201762583875P | 2017-11-09 | 2017-11-09 | |
US201762585686P | 2017-11-14 | 2017-11-14 | |
US16/163,521 US20190138795A1 (en) | 2017-11-07 | 2018-10-17 | Automatic Object Detection and Recognition via a Camera System |
Publications (1)
Publication Number | Publication Date |
---|---|
US20190138795A1 true US20190138795A1 (en) | 2019-05-09 |
Family
ID=66327325
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/163,521 Abandoned US20190138795A1 (en) | 2017-11-07 | 2018-10-17 | Automatic Object Detection and Recognition via a Camera System |
US16/175,726 Active US10872231B2 (en) | 2017-11-07 | 2018-10-30 | Systems and methods of activity based recording for camera applications |
US16/182,483 Active US10929650B2 (en) | 2017-11-07 | 2018-11-06 | Activity based video recording |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/175,726 Active US10872231B2 (en) | 2017-11-07 | 2018-10-30 | Systems and methods of activity based recording for camera applications |
US16/182,483 Active US10929650B2 (en) | 2017-11-07 | 2018-11-06 | Activity based video recording |
Country Status (1)
Country | Link |
---|---|
US (3) | US20190138795A1 (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190272724A1 (en) * | 2018-03-05 | 2019-09-05 | Google Llc | Baby monitoring with intelligent audio cueing based on an analyzed video stream |
US20190278976A1 (en) * | 2018-03-11 | 2019-09-12 | Krishna Khadloya | Security system with face recognition |
US10579910B2 (en) * | 2016-05-06 | 2020-03-03 | Microsoft Technology Licensing, Llc | Dynamic classifier selection based on class skew |
US10872231B2 (en) | 2017-11-07 | 2020-12-22 | Ooma, Inc. | Systems and methods of activity based recording for camera applications |
CN112218024A (en) * | 2020-09-17 | 2021-01-12 | 浙江大华技术股份有限公司 | Courseware video generation and channel combination information determination method and device |
US20210133623A1 (en) * | 2019-11-04 | 2021-05-06 | International Business Machines Corporation | Self-supervised object detector training using raw and unlabeled videos |
US11138344B2 (en) | 2019-07-03 | 2021-10-05 | Ooma, Inc. | Securing access to user data stored in a cloud computing environment |
US11138845B2 (en) * | 2018-10-01 | 2021-10-05 | Digital Barriers Services Ltd | Video surveillance and object recognition |
WO2021230961A1 (en) * | 2020-05-14 | 2021-11-18 | Google Llc | Event length dependent cool-off for camera event based recordings |
WO2021257881A1 (en) * | 2020-06-17 | 2021-12-23 | Hewlett-Packard Development Company, L.P. | Image frames with unregistered users obfuscated |
EP3930317A1 (en) * | 2020-06-25 | 2021-12-29 | Yokogawa Electric Corporation | Apparatus, method, and program |
US11329891B2 (en) | 2019-08-08 | 2022-05-10 | Vonage Business Inc. | Methods and apparatus for managing telecommunication system devices |
CN114885117A (en) * | 2022-03-21 | 2022-08-09 | 安克创新科技股份有限公司 | Video composition method for image pickup apparatus, and storage medium |
US11450151B2 (en) * | 2019-07-18 | 2022-09-20 | Capital One Services, Llc | Detecting attempts to defeat facial recognition |
US11997376B2 (en) | 2021-03-17 | 2024-05-28 | Samsung Electronics Co., Ltd. | Image sensor and operating method of the image sensor |
Families Citing this family (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11356349B2 (en) | 2020-07-17 | 2022-06-07 | At&T Intellectual Property I, L.P. | Adaptive resource allocation to facilitate device mobility and management of uncertainty in communications |
US11232694B2 (en) * | 2018-03-14 | 2022-01-25 | Safely You Inc. | System and method for detecting, recording and communicating events in the care and treatment of cognitively impaired persons |
CN108566634B (en) * | 2018-03-30 | 2021-06-25 | 深圳市冠旭电子股份有限公司 | Method, device and bluetooth speaker for reducing continuous wake-up delay of bluetooth speaker |
US11527265B2 (en) | 2018-11-02 | 2022-12-13 | BriefCam Ltd. | Method and system for automatic object-aware video or audio redaction |
JPWO2020110659A1 (en) * | 2018-11-27 | 2021-10-14 | ソニーグループ株式会社 | Information processing equipment, information processing methods, and programs |
US10957186B2 (en) * | 2018-11-29 | 2021-03-23 | Omnivision Technologies, Inc. | Reducing false alarms in surveillance systems |
US10885342B1 (en) * | 2018-12-03 | 2021-01-05 | Ambarella International Lp | Intelligent monitoring camera using computer vision and intelligent personal audio assistant capabilities to maintain privacy |
CN110121060A (en) * | 2019-05-27 | 2019-08-13 | 中国电子科技网络信息安全有限公司 | A kind of intelligence enhancing device and method for IP Camera |
GB2585919B (en) * | 2019-07-24 | 2022-09-14 | Calipsa Ltd | Method and system for reviewing and analysing video alarms |
CN110881141B (en) * | 2019-11-19 | 2022-10-18 | 浙江大华技术股份有限公司 | Video display method and device, storage medium and electronic device |
CN111325139B (en) * | 2020-02-18 | 2023-08-04 | 浙江大华技术股份有限公司 | Lip language identification method and device |
US11368991B2 (en) | 2020-06-16 | 2022-06-21 | At&T Intellectual Property I, L.P. | Facilitation of prioritization of accessibility of media |
US11233979B2 (en) | 2020-06-18 | 2022-01-25 | At&T Intellectual Property I, L.P. | Facilitation of collaborative monitoring of an event |
US11411757B2 (en) | 2020-06-26 | 2022-08-09 | At&T Intellectual Property I, L.P. | Facilitation of predictive assisted access to content |
US11184517B1 (en) | 2020-06-26 | 2021-11-23 | At&T Intellectual Property I, L.P. | Facilitation of collaborative camera field of view mapping |
CN113938598A (en) | 2020-07-14 | 2022-01-14 | 浙江宇视科技有限公司 | Surveillance camera wake-up method, device, device and medium |
US11768082B2 (en) | 2020-07-20 | 2023-09-26 | At&T Intellectual Property I, L.P. | Facilitation of predictive simulation of planned environment |
US20220124287A1 (en) * | 2020-10-21 | 2022-04-21 | Michael Preston | Cloud-Based Vehicle Surveillance System |
CN112101311A (en) * | 2020-11-16 | 2020-12-18 | 深圳壹账通智能科技有限公司 | Double-recording quality inspection method and device based on artificial intelligence, computer equipment and medium |
US11455873B2 (en) * | 2021-01-14 | 2022-09-27 | Google Llc | Buffered video recording for video cameras |
WO2022164887A1 (en) * | 2021-01-28 | 2022-08-04 | Gyrus Acmi, Inc. D/B/A Olympus Surgical Technologies America | Environment capture management techniques |
US11856318B2 (en) * | 2021-04-27 | 2023-12-26 | Maiden Ai, Inc. | Methods and systems to automatically record relevant action in a gaming environment |
US20220385886A1 (en) * | 2021-05-27 | 2022-12-01 | Panasonic Autonmotive Systems Company of America, Divsion of Panasonic Corporation of North A merica | Methods and apparatus for monitoring wifi cameras |
WO2023049868A1 (en) | 2021-09-24 | 2023-03-30 | Maiden Ai, Inc. | Methods and systems to track a moving sports object trajectory in 3d using multiple cameras |
WO2025128641A1 (en) * | 2023-12-13 | 2025-06-19 | Ademco Inc. | Systems and methods for security camera control |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080319604A1 (en) * | 2007-06-22 | 2008-12-25 | Todd Follmer | System and Method for Naming, Filtering, and Recall of Remotely Monitored Event Data |
US20100287053A1 (en) * | 2007-12-31 | 2010-11-11 | Ray Ganong | Method, system, and computer program for identification and sharing of digital images with face signatures |
US20160253883A1 (en) * | 2015-02-27 | 2016-09-01 | Sensormatic Electronics, LLC | System and Method for Distributed Video Analysis |
US10372995B2 (en) * | 2016-01-12 | 2019-08-06 | Shanghai Xiaoyi Technology Co., Ltd. | System and method for previewing video |
Family Cites Families (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AUPQ217399A0 (en) * | 1999-08-12 | 1999-09-02 | Honeywell Limited | Realtime digital video server |
US6831556B1 (en) * | 2001-05-16 | 2004-12-14 | Digital Safety Technologies, Inc. | Composite mobile digital information system |
CN100459684C (en) | 2002-08-27 | 2009-02-04 | 索尼株式会社 | Data processing unit and method, and program |
WO2006090359A2 (en) * | 2005-02-24 | 2006-08-31 | Mteye Security Ltd. | Device, system, and method of reduced-power imaging |
US7843491B2 (en) * | 2005-04-05 | 2010-11-30 | 3Vr Security, Inc. | Monitoring and presenting video surveillance data |
US7129872B1 (en) | 2005-05-25 | 2006-10-31 | Audio Note Uk Ltd. | Audio signal analog-to-digital converter utilizing a transformed-based input circuit |
US7768548B2 (en) * | 2005-08-12 | 2010-08-03 | William Bradford Silvernail | Mobile digital video recording system |
US20070185989A1 (en) | 2006-02-07 | 2007-08-09 | Thomas Grant Corbett | Integrated video surveillance system and associated method of use |
US10847184B2 (en) * | 2007-03-07 | 2020-11-24 | Knapp Investment Company Limited | Method and apparatus for initiating a live video stream transmission |
US8749343B2 (en) | 2007-03-14 | 2014-06-10 | Seth Cirker | Selectively enabled threat based information system |
TW200847787A (en) | 2007-05-29 | 2008-12-01 | Appro Technology Inc | Application method and device by sensing infrared and sound |
US9386281B2 (en) * | 2009-10-02 | 2016-07-05 | Alarm.Com Incorporated | Image surveillance and reporting technology |
WO2014144628A2 (en) * | 2013-03-15 | 2014-09-18 | Master Lock Company | Cameras and networked security systems and methods |
WO2014168833A1 (en) * | 2013-04-08 | 2014-10-16 | Shafron Thomas | Camera assembly, system, and method for intelligent video capture and streaming |
ES2730404T3 (en) * | 2014-04-04 | 2019-11-11 | Red Com Llc | Camcorder with capture modes |
US9811748B2 (en) * | 2014-06-09 | 2017-11-07 | Verizon Patent And Licensing Inc. | Adaptive camera setting modification based on analytics data |
KR102150703B1 (en) * | 2014-08-14 | 2020-09-01 | 한화테크윈 주식회사 | Intelligent video analysing system and video analysing method therein |
EP3070695B1 (en) * | 2015-03-16 | 2017-06-14 | Axis AB | Method and system for generating an event video sequence, and camera comprising such system |
CN107925823A (en) | 2015-07-12 | 2018-04-17 | 怀斯迪斯匹有限公司 | Ultra low power ultra-low noise microphone |
US10115029B1 (en) * | 2015-10-13 | 2018-10-30 | Ambarella, Inc. | Automobile video camera for the detection of children, people or pets left in a vehicle |
US10134422B2 (en) | 2015-12-01 | 2018-11-20 | Qualcomm Incorporated | Determining audio event based on location information |
US9906722B1 (en) * | 2016-04-07 | 2018-02-27 | Ambarella, Inc. | Power-saving battery-operated camera |
US20170339343A1 (en) * | 2016-05-17 | 2017-11-23 | Tijee Corporation | Multi-functional camera |
CN106060391B (en) * | 2016-06-27 | 2020-02-21 | 联想(北京)有限公司 | A method and device for processing a working mode of a camera, and an electronic device |
US10475311B2 (en) | 2017-03-20 | 2019-11-12 | Amazon Technologies, Inc. | Dynamic assessment using an audio/video recording and communication device |
US20190138795A1 (en) | 2017-11-07 | 2019-05-09 | Ooma, Inc. | Automatic Object Detection and Recognition via a Camera System |
US20200388139A1 (en) | 2019-06-07 | 2020-12-10 | Ooma, Inc. | Continuous detection and recognition for threat determination via a camera system |
2018
- 2018-10-17 US US16/163,521 patent/US20190138795A1/en not_active Abandoned
- 2018-10-30 US US16/175,726 patent/US10872231B2/en active Active
- 2018-11-06 US US16/182,483 patent/US10929650B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080319604A1 (en) * | 2007-06-22 | 2008-12-25 | Todd Follmer | System and Method for Naming, Filtering, and Recall of Remotely Monitored Event Data |
US20100287053A1 (en) * | 2007-12-31 | 2010-11-11 | Ray Ganong | Method, system, and computer program for identification and sharing of digital images with face signatures |
US20160253883A1 (en) * | 2015-02-27 | 2016-09-01 | Sensormatic Electronics, LLC | System and Method for Distributed Video Analysis |
US10372995B2 (en) * | 2016-01-12 | 2019-08-06 | Shanghai Xiaoyi Technology Co., Ltd. | System and method for previewing video |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10579910B2 (en) * | 2016-05-06 | 2020-03-03 | Microsoft Technology Licensing, Llc | Dynamic classifier selection based on class skew |
US10872231B2 (en) | 2017-11-07 | 2020-12-22 | Ooma, Inc. | Systems and methods of activity based recording for camera applications |
US10929650B2 (en) | 2017-11-07 | 2021-02-23 | Ooma, Inc. | Activity based video recording |
US10593184B2 (en) * | 2018-03-05 | 2020-03-17 | Google Llc | Baby monitoring with intelligent audio cueing based on an analyzed video stream |
US20190272724A1 (en) * | 2018-03-05 | 2019-09-05 | Google Llc | Baby monitoring with intelligent audio cueing based on an analyzed video stream |
US11735018B2 (en) * | 2018-03-11 | 2023-08-22 | Intellivision Technologies Corp. | Security system with face recognition |
US20190278976A1 (en) * | 2018-03-11 | 2019-09-12 | Krishna Khadloya | Security system with face recognition |
US11138845B2 (en) * | 2018-10-01 | 2021-10-05 | Digital Barriers Services Ltd | Video surveillance and object recognition |
US11568723B2 (en) | 2018-10-01 | 2023-01-31 | Digital Barriers Services Ltd | Video surveillance and object recognition |
US11138344B2 (en) | 2019-07-03 | 2021-10-05 | Ooma, Inc. | Securing access to user data stored in a cloud computing environment |
US11450151B2 (en) * | 2019-07-18 | 2022-09-20 | Capital One Services, Llc | Detecting attempts to defeat facial recognition |
US11329891B2 (en) | 2019-08-08 | 2022-05-10 | Vonage Business Inc. | Methods and apparatus for managing telecommunication system devices |
US20210133623A1 (en) * | 2019-11-04 | 2021-05-06 | International Business Machines Corporation | Self-supervised object detector training using raw and unlabeled videos |
US11636385B2 (en) * | 2019-11-04 | 2023-04-25 | International Business Machines Corporation | Training an object detector using raw and unlabeled videos and extracted speech |
WO2021230961A1 (en) * | 2020-05-14 | 2021-11-18 | Google Llc | Event length dependent cool-off for camera event based recordings |
US12058437B2 (en) | 2020-05-14 | 2024-08-06 | Google Llc | Event length dependent cool-off for camera event based recordings |
US12356068B2 (en) | 2020-05-14 | 2025-07-08 | Google Llc | Event length dependent cool-off for camera event based recordings |
WO2021257881A1 (en) * | 2020-06-17 | 2021-12-23 | Hewlett-Packard Development Company, L.P. | Image frames with unregistered users obfuscated |
EP3930317A1 (en) * | 2020-06-25 | 2021-12-29 | Yokogawa Electric Corporation | Apparatus, method, and program |
US11721185B2 (en) | 2020-06-25 | 2023-08-08 | Yokogawa Electric Corporation | Apparatus, method and storage medium for detecting state change and capturing image |
CN112218024A (en) * | 2020-09-17 | 2021-01-12 | 浙江大华技术股份有限公司 | Method and device for generating courseware video and determining channel merging information |
US11997376B2 (en) | 2021-03-17 | 2024-05-28 | Samsung Electronics Co., Ltd. | Image sensor and operating method of the image sensor |
CN114885117A (en) * | 2022-03-21 | 2022-08-09 | 安克创新科技股份有限公司 | Video composition method for image pickup apparatus, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
US10929650B2 (en) | 2021-02-23 |
US20190141298A1 (en) | 2019-05-09 |
US10872231B2 (en) | 2020-12-22 |
US20190141297A1 (en) | 2019-05-09 |
Similar Documents
Publication | Title |
---|---|
US20190138795A1 (en) | Automatic Object Detection and Recognition via a Camera System |
US20200388139A1 (en) | Continuous detection and recognition for threat determination via a camera system |
US20230209017A1 (en) | Methods and Systems for Person Detection in a Video Feed |
US20220245396A1 (en) | Systems and Methods of Person Recognition in Video Streams |
US10555393B1 (en) | Face recognition systems with external stimulus |
US10599950B2 (en) | Systems and methods for person recognition data management |
US11138344B2 (en) | Securing access to user data stored in a cloud computing environment |
US10192415B2 (en) | Methods and systems for providing intelligent alerts for events |
US10957171B2 (en) | Methods and systems for providing event alerts |
US8171129B2 (en) | Smart endpoint and smart monitoring system having the same |
WO2017049612A1 (en) | Smart tracking video recorder |
KR102237086B1 (en) | Apparatus and method for controlling a lobby phone that enables video surveillance through a communication terminal that can use a 5G mobile communication network based on facial recognition technology |
KR102390405B1 (en) | Doorbell |
US10212778B1 (en) | Face recognition systems with external stimulus |
US20240265731A1 (en) | Systems and Methods for On-Device Person Recognition and Provision of Intelligent Alerts |
CN114244644B (en) | Control method and device for intelligent home, storage medium and electronic device |
US20210360201A1 (en) | Methods, systems, apparatuses, and devices for facilitating monitoring of an environment |
US20180167585A1 (en) | Networked Camera |
CN108597164B (en) | Anti-theft method, anti-theft device, anti-theft terminal and computer readable medium |
CN108401247B (en) | Method for controlling Bluetooth device, electronic device and storage medium |
CN115103157A (en) | Video analysis method and device based on edge cloud cooperation, electronic equipment and medium |
CN107131607A (en) | Monitoring method, device and system based on air conditioner and air conditioner |
CN112491669A (en) | Data processing method, device and system |
CN113158842B (en) | Identification method, system, device and medium |
CN206656471U (en) | Air conditioner and monitoring system based on air conditioner |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: OOMA, INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: VAIDYA, GOVIND; REEL/FRAME: 047237/0990; Effective date: 20181011 |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |