
WO2021050369A1 - Autonomous comfort systems - Google Patents

Autonomous comfort systems

Info

Publication number
WO2021050369A1
Authority
WO
WIPO (PCT)
Prior art keywords
processor
image
code
cause
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2020/049324
Other languages
French (fr)
Inventor
Ali Ghahramani
Ronnen Michael LEVINSON
Syung Kee MIN
Kaifei Chen
Andrew Yi WANG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of California Berkeley
University of California San Diego UCSD
Original Assignee
University of California Berkeley
University of California San Diego UCSD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of California Berkeley, University of California San Diego UCSD filed Critical University of California Berkeley
Publication of WO2021050369A1 publication Critical patent/WO2021050369A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/37Determination of transform parameters for the alignment of images, i.e. image registration using transform domain methods
    • FMECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24HEATING; RANGES; VENTILATING
    • F24FAIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F11/00Control or safety arrangements
    • F24F11/70Control systems characterised by their outputs; Constructional details thereof
    • F24F11/72Control systems characterised by their outputs; Constructional details thereof for controlling the supply of treated air, e.g. its pressure
    • F24F11/74Control systems characterised by their outputs; Constructional details thereof for controlling the supply of treated air, e.g. its pressure for controlling air flow rate or air velocity
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D23/00Control of temperature
    • G05D23/19Control of temperature characterised by the use of electric means
    • G05D23/1917Control of temperature characterised by the use of electric means using digital means
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • FMECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24HEATING; RANGES; VENTILATING
    • F24FAIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F2120/00Control inputs relating to users or occupants
    • F24F2120/10Occupancy
    • F24F2120/12Position of occupants
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face


Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Automation & Control Theory (AREA)
  • Fluid Mechanics (AREA)
  • Chemical & Material Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Mechanical Engineering (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Image Processing (AREA)

Abstract

An autonomous comfort system has a visible camera to produce a visible image, an infrared camera to produce a temperature map, at least one processor to perform image registration between the visible image and the temperature map to produce a registered image having facial features with temperature points, analyze the registered image to determine values of temperatures at the temperature points, determine adjustments to operation of a comfort device based upon the values of the temperatures, and adjust operation of the comfort device.

Description

AUTONOMOUS COMFORT SYSTEMS
RELATED APPLICATION
[0001] This application claims priority to and the benefit of US Provisional Application No. 62/898,262 filed September 11, 2019, which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
[0002] This disclosure relates to environmental comfort systems, more particularly to autonomous comfort systems based upon personal comfort data.
BACKGROUND
[0003] Traditionally, Heating, Ventilation, and Air Conditioning (HVAC) systems are responsible for providing an acceptable thermal environment in buildings and are operated based upon set points derived from thermal comfort standards, such as the ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers) Standard 55, “Thermal Environmental Conditions for Human Occupancy,” 2004. However, these set points are most often not representative of actual occupant comfort needs due to different static factors, such as gender, and dynamic factors, such as acclimation or transient thermal conditions. This results in low comfort ratings in buildings and HVAC system energy waste of 10-32%. Energy waste may vary based on the building size, type, construction materials, and climate.
[0004] In addition, HVAC system set points are adjusted to accommodate occupant comfort requirements for the entire space at the zone/room level, which is energy intensive and can also be slow to respond to an occupant’s transient needs, such as entering a space from a hot outdoor environment. To eliminate the need for the HVAC system to condition the entire zone/room, researchers have investigated the applicability of using robots for providing comfort at a personal level by conditioning only the microclimate around the person. Robots having fans, heating, and/or cooling equipment provide the opportunity to address comfort needs quickly with low energy demands and no spatial limitation. However, these robots rely on surveys asking individuals about their comfort state through a web interface or app. Therefore, they require frequent occupant feedback and have become impractical due to survey fatigue.
[0005] To address this challenge, researchers have focused on methods that use occupant feedback as training labels and build personal comfort models based on environmental measurements, such as air temperature. Even though these models reduce the need for frequent occupant feedback, they are difficult to generalize since they may be incapable of addressing short-term comfort needs. To eliminate the need for frequent user feedback, some approaches have studied physiological responses such as skin temperature, heart rate, and core temperature as a proxy for personal comfort. However, existing physiological-based methods still require user feedback to train the comfort model in a supervised learning fashion. This requirement is a significant drawback for these systems and often prevents them from large-scale deployment.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] Figure 1 shows an embodiment of an autonomous comfort system.
[0007] Figures 2 and 3 show results of experiments using a detection system.
[0008] Figures 4-7 show photographs of different detection results.
[0009] Figure 8 shows a graph of wind speed versus distance.
[00010] Figure 9 shows another embodiment of an autonomous comfort system.
[00011] Figure 10 shows various face positions.
[00012] Figures 11-14 show different views of an embodiment of a simplified comfort system device.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[00013] The embodiments here use a computer vision-based data acquisition system that provides a non-intrusive way of collecting personal comfort data. The system blends two comfort-learning methods and integrates them into the control of a robotic fan that provides air at a desired temperature and speed to the occupant.
[00014] In one embodiment, a fish-eye camera detects occupants in its field of view and the fan nozzle is efficiently moved and pointed towards the person. A visible light camera is located on the nozzle and used to detect human facial features, such as eyes, nose, and lips. Images from a thermal camera, also located on the nozzle, are then registered onto the visible light image, and temperatures of different face components are captured and used to infer the comfort state. The system may also use pose estimation to filter images and improve the accuracy of comfort prediction. Accordingly, the fan/heater system blows air with a specific velocity and temperature toward the occupant via closed-loop feedback control. Since the system can track a person in an environment, it addresses issues with prior data collection systems that needed occupants to be positioned in a specific location.
[00015] Results demonstrate the applicability of using this system for autonomously providing comfort to building occupants. Different components of this system can be leveraged to build autonomous personal comfort devices such as desk fans, heated and ventilated seats, and central air conditioning systems such as variable air volume (VAV) systems. In addition, providing a weak air movement around the breathing zone significantly reduces the CO2 concentration in the breathing zone of an occupant, improving well-being and productivity.
[00016] Figure 1 illustrates a 3D rendering and the block diagram of one embodiment of a robotic system setup 10. The setup includes a fish-eye camera 14, two stepper motors 32 and 34, a visible light camera and an IR thermal camera shown later, two microcontrollers, and a fan, not shown here. The fisheye camera provides a wide view of the entire room to be used for one embodiment of the system that uses human detection. The two stepper motors provide motion in two degrees of freedom, pitch and yaw, to direct the fan nozzle towards the person of interest. The visible light camera is used to generate images such as 18 and to detect locations of facial features of the nozzle’s target, at which the temperatures are measured using the temperature map array created by the IR camera, such as 20. These two cameras are aligned as closely to each other as possible to minimize the image registration required. A microcontroller is used to transmit fish-eye images such as 22 from the client 12 to a cloud server 16 for detection of occupants such as 24, and receives coordinates at 30 to orient the nozzle in the desired direction by determining the target angle at 26 and controlling the stepper motor rotation at 28. The second microcontroller is used to transmit the visible light camera image and IR temperature map to the server 16, which performs image registration, temperature querying, and comfort calculation. The microcontroller then controls the fan strength based on the calculated comfort values.
[00017] One should note that a fan is used here as the comfort device, but other types of comfort devices may also be controlled by this system. No intention of limiting the comfort device to a fan is intended, nor should any be implied.
[00018] In an indoor office, a fish-eye camera mounted on the ceiling, or at another high point in a region of interest, allows for a wide field of view that is mostly unobstructed by cubicle walls and furniture. However, the camera and its position relative to the occupants pose two problems that are not present in conventional vision systems. First, the fisheye camera’s field of view causes a strong lens distortion around the edge of its view. Second, the camera’s position above users puts the image at a skewed perspective.
[00019] In order to address these challenges, the embodiments use an algorithm to detect and track multiple people simultaneously. In this algorithm, a K-nearest-neighbors background subtractor is used to identify and isolate any non-background pixels. Then a series of pixel erosions and dilations is applied to filter any noise in the subtracted image. Pixels that are considered part of a shadow are then removed via thresholding. The resulting image contains large groups of pixels that are target candidates. From the candidates, the system selects targets whose sizes and shapes are within a specific range. For each valid target, the system creates a bounding rectangle. Since the fish-eye camera creates a spherical image where taller elements are on the outer areas of the image, the system approximates the head of the person to be on the midpoint of the bounding box’s edge that is furthest from the center of the image.
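The detection pipeline of paragraph [00019] can be pictured with the following minimal sketch, offered only as an illustration: it assumes OpenCV, and the kernel sizes, contour-area limits, and shadow threshold are placeholder values rather than those of the disclosed system.

```python
import cv2
import numpy as np

# Minimal sketch of the multi-person detection pipeline described in [00019].
# OpenCV 4 is assumed; kernel sizes, area limits, and the shadow threshold are
# illustrative placeholders, not values from the disclosed system.
subtractor = cv2.createBackgroundSubtractorKNN(detectShadows=True)

def detect_heads(frame, min_area=800, max_area=20000):
    fg = subtractor.apply(frame)                                   # isolate non-background pixels
    fg = cv2.erode(fg, np.ones((3, 3), np.uint8), iterations=2)    # erosions and
    fg = cv2.dilate(fg, np.ones((5, 5), np.uint8), iterations=2)   # dilations filter noise
    _, fg = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)         # KNN marks shadows ~127; drop them

    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cy, cx = frame.shape[0] / 2.0, frame.shape[1] / 2.0
    heads = []
    for c in contours:                                             # target candidates
        if not (min_area < cv2.contourArea(c) < max_area):         # size/shape gate
            continue
        x, y, w, h = cv2.boundingRect(c)                           # bounding rectangle
        edge_midpoints = [(x + w / 2, y), (x + w / 2, y + h),
                          (x, y + h / 2), (x + w, y + h / 2)]
        # Taller features appear toward the image edge in a ceiling fisheye view,
        # so the head is approximated by the edge midpoint farthest from center.
        heads.append(max(edge_midpoints,
                         key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2))
    return heads
```

Because the KNN subtractor labels suspected shadow pixels with an intermediate gray value, the simple binary threshold above stands in for the shadow-removal step.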
[00020] Using the coordinate of the target’s head in the image frame (pixel coordinates), the embodiments calculate the pitch and yaw in the camera frame, assuming the fish-eye camera to be at the center of the image. (The corresponding equations are rendered as images in the published application and are not reproduced here.) Using these, the process determines the 3D position of the target in the camera frame. The embodiments here used the average human height to determine the z position. The embodiments treat the fan frame to be linearly offset from the camera frame in the XY plane to get the 3D coordinates of the person’s head in the fan frame. The process uses these coordinates to calculate target pitch and yaw angles to orient the fan towards the user’s location.
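Because the published equations appear only as images, the following sketch reconstructs the described chain of transformations under stated assumptions: an equidistant fisheye projection, an assumed camera mounting height, average head height, and fan offset. The constants and function names are illustrative, not the original formulas.

```python
import numpy as np

# Illustrative reconstruction of the image-frame -> camera-frame -> fan-frame chain
# in [00020].  All constants below are assumptions made for the sketch.
FISHEYE_FOV = np.radians(180.0)       # assumed full field of view of the lens
AVG_HEAD_HEIGHT = 1.7                 # m, assumed average standing head height
CAMERA_HEIGHT = 2.7                   # m, assumed ceiling mounting height
FAN_OFFSET_XY = np.array([0.5, 0.3])  # m, assumed fan position relative to the camera

def camera_frame_angles(u, v, img_w, img_h):
    """Pitch and yaw of the head from its pixel coordinate, camera at image center."""
    du, dv = u - img_w / 2.0, v - img_h / 2.0
    yaw = np.arctan2(dv, du)                            # bearing around the optical axis
    r = np.hypot(du, dv) / (min(img_w, img_h) / 2.0)    # normalized radial distance
    pitch = r * (FISHEYE_FOV / 2.0)                     # equidistant projection assumption
    return pitch, yaw

def fan_target_angles(u, v, img_w, img_h):
    pitch, yaw = camera_frame_angles(u, v, img_w, img_h)
    z = CAMERA_HEIGHT - AVG_HEAD_HEIGHT                 # vertical drop to the head (average height)
    rho = z * np.tan(pitch)                             # horizontal range from the camera
    head_cam = np.array([rho * np.cos(yaw), rho * np.sin(yaw), -z])   # 3D position, camera frame
    head_fan = head_cam - np.array([*FAN_OFFSET_XY, 0.0])             # linear XY offset to fan frame
    fan_yaw = np.arctan2(head_fan[1], head_fan[0])
    fan_pitch = np.arctan2(head_fan[2], np.hypot(head_fan[0], head_fan[1]))
    return fan_pitch, fan_yaw                           # angles used to orient the fan nozzle
```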
[00021] Figure 1 illustrates the relationships of the image, camera, and fan frames, and the target coordinates and pitch and yaw angles at the server 16 and the determination made at 30.
[00022] In order to validate the performance of the embodiments of the occupant tracking system, two experiments were conducted. A first experiment tested the fish-eye occupant tracking system in different scenarios of 1, 2, and 3 people as occupants walking into the room. In the experiment, the subjects were asked to walk randomly around the room for 1 minute. The experiment also mixed the subjects during the 2-people scenarios to introduce more diversity. The experiments were repeated 5 times at different times of day over several days to reduce the effects of lighting or different clothing. Approximately 370 images, ±10, were collected in each of the 5 experiments. Figures 2 and 3 demonstrate the results of the first test. The experiment tracked two metrics: recall and precision. Recall means the percentage of successful detections of a person in the fisheye camera’s field of view:
recall = (# of people detected in image group) / (# of people present in image group)
and precision is defined as the rate of successful approximation of the head position on a detected person, as seen in Figure 4:
precision = (# of correct approximations of the head) / (# of people detected)
In Figure 4, the system detected 3 people at 36, 38 and 42, and correctly detected the head position 40 of person 38. The experiments also ran the entire HVAC system and collected air movement around the room using a hot-wire anemometer.
[00023] The first experiment found two main sources of unsuccessful detections. First, a person 44 by a window in Figure 5 went undetected because the bloom from overexposure due to the window, together with the inherent blur of the camera, caused the contours of the person to be hard to detect. Second, when multiple people such as 46 and 48 stand close to each other, as in Figure 6, the system detected the pair as one person. Another source of detection error resulted from a person having an extended limb, as shown in Figure 7 at 50, causing a shift in the approximation of the person’s head position at 52. However, the system processes these images continuously at 4 Hz, so the system performance appears to be very smooth.
[00024] Figure 8 shows a graph of average wind speed from a fan versus distance.
[00025] As shown in the embodiment 60 of Figure 9, thermal camera 66 and visible light camera 62 are positioned and aimed directly at the occupant. In order to capture the temperature at facial points in the thermal image 68, the system uses the visible image 64 to detect facial components such as eyes, nose, forehead, etc. An elastic image registration then aligns the visible and thermal images at 68, resulting in combined image 72, and reads the temperatures from the coordinates of the facial components. A heuristic-based algorithm then uses pose estimation and some predefined rules, an embodiment of which is shown at 74, to filter the measurements and get reliable facial temperature measurements.
[00026] In the embodiment shown at 74, the probability of the most likely path between the different conditions is determined using recursive processing that includes past observations. This results in the fan power control 76 and pulse width modulation (PWM) 78 to operate the fan 80. In other applications, the system may just capture facial temperature without following up with the fan power control.
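One way to picture the recursive most-likely-path computation of paragraph [00026] is a Viterbi-style update over a small state set, as in the sketch below; the three comfort states, the transition matrix, and the PWM mapping are assumptions made purely for illustration, not values from the disclosure.

```python
import numpy as np

# Viterbi-style sketch of the recursive "most likely path" filtering in [00026].
# States, probabilities, and the PWM mapping are illustrative assumptions.
STATES = ["too_cold", "neutral", "too_hot"]
TRANS = np.array([[0.90, 0.09, 0.01],        # assumed P(state_t | state_t-1)
                  [0.05, 0.90, 0.05],
                  [0.01, 0.09, 0.90]])
PWM_DUTY = {"too_cold": 0.0, "neutral": 0.3, "too_hot": 1.0}   # assumed fan duty cycles

def most_likely_state(emissions):
    """emissions: sequence of length-3 arrays of P(observation | state), one per time step."""
    log_delta = np.log(np.full(3, 1.0 / 3.0)) + np.log(emissions[0])
    for emis in emissions[1:]:
        # recursive update using past observations: best previous state for each current state
        log_delta = np.max(log_delta[:, None] + np.log(TRANS), axis=0) + np.log(emis)
    return STATES[int(np.argmax(log_delta))]

def fan_pwm(emissions):
    return PWM_DUTY[most_likely_state(emissions)]    # e.g. drives PWM block 78 to operate fan 80
```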
[00027] Several options exist for face component detection, meaning face alignment, and the assessment of each implementation was based on several considerations. First, it must detect and identify facial features at different orientations and locations. It would be unrealistic to assume that occupants will remain in the same location with their heads facing directly towards the visible light camera for an extended period. Second, it must label facial features with enough landmarks to allow inferring the locations of the cheeks and forehead. The measured temperatures from these regions are reliable indicators of the occupant’s comfort level. Third, its computational complexity should not detrimentally impact the performance of the personal comfort system. The system should respond to the occupant’s pose with little to no system delay.
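To make the second consideration concrete, the sketch below shows one way 68-point landmarks (iBUG ordering, as produced by the candidates discussed next) could be mapped to cheek and forehead query points; the extrapolation factor and jawline indices are illustrative assumptions, not the method of the disclosure.

```python
import numpy as np

# Sketch: map 68 facial landmarks (iBUG ordering assumed) to comfort query points.
# The region heuristics below are illustrative assumptions.
def comfort_query_points(lm):
    """lm: (68, 2) array of landmark pixel coordinates."""
    left_eye = lm[36:42].mean(axis=0)            # points 36-41: left eye
    right_eye = lm[42:48].mean(axis=0)           # points 42-47: right eye
    brows = lm[17:27].mean(axis=0)               # points 17-26: both eyebrows
    eye_line = (left_eye + right_eye) / 2.0
    forehead = brows + 1.5 * (brows - eye_line)          # extrapolate upward past the brows
    left_cheek = (left_eye + lm[4]) / 2.0                # between eye and a left jawline point
    right_cheek = (right_eye + lm[12]) / 2.0             # between eye and a right jawline point
    return {"forehead": forehead, "left_cheek": left_cheek,
            "right_cheek": right_cheek, "nose": lm[30]}  # point 30: nose tip
```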
[00028] Based on these considerations, three different embodiments would seem appropriate for performing face alignment. The first candidate was developed by Bulat et al. and is capable of generating 68 unique facial landmark coordinates for the eyes, eyebrows, nose, mouth, and jawline. This is achieved through a convolutional neural network based on a stacked hourglass network and hierarchical, parallel, and multi-scale (HPM) residual blocks for landmark localization. A stacked hourglass network enables inferencing by first processing features down to low resolutions before the network begins upsampling and combining features across scales to produce a set of predictions.
[00029] The second candidate was developed by Xiaochang Liu, who expanded on work done by InsightFace. Although it also employs a stacked hourglass network and can produce 68 unique coordinates, it uses channel aggregation residual blocks rather than HPM residual blocks. Although they share similar structures, the difference lies in the number of weight layers used for computing the output.
[00030] The third candidate was an implementation of face alignment developed by Ivan de Paz Centeno (i.e., FaceNet) using a multitask cascaded convolutional neural network first explored by Zhang et al. Rather than generating 68 unique coordinates, it indicates only five unique facial landmarks, each with a single point. The first two algorithms are state of the art and were developed very recently. The third algorithm is adapted from FaceNet, a facial recognition software developed by Google researchers in 2015.
[00031] In order to select the best algorithm, visible images were collected of ten individuals at 11 unique facial orientations and two different lighting settings and the algorithms were compared in terms of their precision, recall, and computation time. Recall here means the fraction of all images where detection has occurred, regardless of whether or not they are correct. Precision here means the fraction of detected images that have correctly identified facial features on an individual’s face. Lastly, computation time is the average amount of time required by the central processing unit to process each image containing a certain facial orientation.
[00032] Each individual was asked to sit on a stool, with the visible camera located approximately a meter above and away from the stool at an angle of 30 degrees. They were also asked to move only their heads, not their bodies, when prompted to look in a particular direction. This was done under two lighting conditions, where the room lights were either on or off. The precision, recall, and computation time for all three face alignment candidates are shown in Figure 10, with the accompanying table showing the precision and recall for Bulat, Liu, and Centeno. The average computation times are: Bulat, 2.32 seconds (s); Liu, 4.29 s; and Centeno, 0.22 s.
(The per-orientation results table and the table of average precision and recall under dark and light conditions are rendered as images in the published application and are not reproduced here.)
[00033] Both Bulat and Liu showed comparable precision and recall for upward and lateral head orientations. Results were more mixed for downward head orientations, with the two methods having the better precision and recall, respectively. Additionally, both showed better performance when the lights were turned on rather than off. The embodiments here use the Bulat et al. method for face alignment because of its average computation time, which was approximately half that required by the Liu method to process a single image. However, any of the above embodiments, as well as many others, could be used. No limitation to any particular face alignment technique is intended, nor should any be implied.
[00034] In operation, the system of Figure 9 works on a server-client architecture, allowing extensive computational work to be spread between the server and client. In one embodiment, a Raspberry Pi 3, an inexpensive single-board computer, serves as the client, and a laptop containing a dedicated NVIDIA graphics processing unit serves as the server. In this embodiment, the client is directly interfaced with the visible camera 62 and infrared camera 66 for gathering a visible image 64 and IR image 68, respectively, of the environment within their field of vision in real time. Once the image and temperature have been packaged by a messaging library such as zeromq, they are wirelessly sent from the client and received by the server through TCP/IP. The server acknowledges that it has received the images and temperature before it proceeds to facial detection, where it will attempt to identify facial landmarks in the image. In one embodiment, only when the system identifies facial landmarks in the image will the server initiate image registration. This prevents unnecessary computations from being performed in the absence of an occupant in the visible camera’s field of vision.
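A minimal sketch of this exchange, assuming pyzmq on both ends, follows; the endpoint address, port number, message layout, and the handle() stand-in are assumptions for illustration rather than the disclosed implementation.

```python
import zmq
import numpy as np

# Sketch of the client/server exchange in [00034] using pyzmq over TCP/IP.
# Endpoint, port, and message layout are assumed for illustration.
def client_send(visible_img: np.ndarray, ir_temps: np.ndarray,
                endpoint: str = "tcp://server.local:5555") -> str:
    sock = zmq.Context.instance().socket(zmq.REQ)
    sock.connect(endpoint)
    sock.send_pyobj({"visible": visible_img, "ir": ir_temps})   # package image and temperatures
    return sock.recv_string()                                   # server acknowledgement

def server_loop(endpoint: str = "tcp://*:5555"):
    sock = zmq.Context.instance().socket(zmq.REP)
    sock.bind(endpoint)
    while True:
        msg = sock.recv_pyobj()
        sock.send_string("ack")              # acknowledge receipt before heavy processing
        handle(msg["visible"], msg["ir"])    # hypothetical hook: facial detection, registration, ...
```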
[00035] Due to the focal differences between the IR camera and the visible light camera, image registration allows the system to query temperature at the correct points of the IR temperature array, captured as an image. First, the IR camera is placed as close as possible to the visible light camera to maximize the common view area. Then the images from the visible camera are manually scaled and registered to the images from the IR camera as closely as possible, empirically. Canny edges are then extracted from both the visible image and the IR image captured at the same time. On the extracted Canny edges, phase correlation is performed to identify the pixel shift between the edges of the IR image and the edges of the visible image. The calculated pixel shift is then used to shift the IR image to register it on the visible image. For visualization, the visible image’s edges are overlaid on the registered IR image at 72.
[00036] The above division of tasks between the client and server is merely one example of such a division. The tasks could be allocated in a completely different manner, could be located in a non-client-server architecture, or could involve just one processor. However, at least one processor will perform the tasks.
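The registration steps of paragraph [00035] can be sketched as follows, assuming OpenCV and same-sized, pre-scaled grayscale inputs; the Canny thresholds and the use of warpAffine for the final translation are illustrative choices, and the sign of the estimated shift may need checking against the OpenCV convention in use.

```python
import cv2
import numpy as np

# Sketch of the edge-based registration in [00035]: Canny edges from both images,
# phase correlation for the pixel shift, then translate the IR image onto the
# visible image.  Thresholds and the warpAffine translation are assumptions.
def register_ir_to_visible(visible_gray, ir_gray):
    """Both inputs: uint8 grayscale images, pre-scaled to the same resolution."""
    vis_edges = cv2.Canny(visible_gray, 50, 150).astype(np.float32)
    ir_edges = cv2.Canny(ir_gray, 50, 150).astype(np.float32)
    (dx, dy), _response = cv2.phaseCorrelate(ir_edges, vis_edges)   # shift between the edge maps
    shift = np.float32([[1, 0, dx], [0, 1, dy]])                    # 2x3 translation matrix
    h, w = visible_gray.shape
    return cv2.warpAffine(ir_gray, shift, (w, h))                   # IR registered to the visible frame
```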
[00037] Previous work used systems that do not require user input for training comfort models based on skin temperature measurements at several points on the human face. The embodiments here focus on the human face since the density of veins is considerably higher in that region. Additionally, the face is often visible for thermal imaging. The methods are based on the principle that the human thermoregulation system adjusts heat exchange with the environment to achieve thermal homeostasis by modifying the blood flow to the skin through cutaneous veins. By monitoring the thermoregulation performance, the system identifies the thermoneutral zone, which is a prerequisite to achieving personal thermal comfort. One method uses the relative differences between face components for identifying extreme conditions (too hot or too cold), and the other methods use the time series of temperature measurements and a hidden Markov model-based learning method to track occupant comfort over time. However, since the earlier method used infrared sensors installed on glasses for data acquisition, and occupants were required to wear the glasses, it was not a practical data collection system.
[00038] Since the system can track a person in an environment, it addresses issues with prior data collection systems that needed occupants to be positioned in a specific location. Results demonstrate the applicability of using this system for autonomously providing comfort to the building occupants.
[00039] Different components of this system can be leveraged to build autonomous personal comfort devices such as desk fans, heated and ventilated seats and central air conditioning systems such as variable air volume (VAV) systems. In addition, providing a weak air movement around the breathing zone significantly reduces the CO2 concentration in the breathing zone of an occupant, improving the well-being and productivity.
[00040] The embodiments have two main components: a system or systems of a tracking robot with a fan, and an infrared-fused, vision-driven, human thermal comfort sensor. The robotic system has both the fish-eye camera of Figure 1 combined with the system of Figure 9. However, these can be used separately. The tracking fan and the fish-eye camera could be one system, and the infrared/visible system could be used separately, such as at a desk or in a car.
[00041] The comfort sensor could be ceiling or wall mounted, have a portable desk version, be built into personal computers, be integrated into drones, take the form of a mobile personal comfort system, or be integrated into human-operated industrial equipment and exercise equipment like treadmills. The robotic device, with or without the comfort sensor, could be wall or ceiling mounted, have a portable desk version, or be integrated into drones, human-operated industrial equipment, and exercise equipment. Either could be in one of many locations, including buildings, vehicles, airplanes, public transportation, hospital rooms, hazardous sites, data centers, bedrooms, fitness centers, enclosed public spaces, and kitchens, just as examples. They could also be employed in different contexts, including athletics, monitoring thermal stress in workers in extreme conditions, monitoring heat cramps and exhaustion in children and patients, physical activities, monitoring the effectiveness of exercise, stress management, and several of the medical fields.
[00042] In addition, the components of the system can be mixed and matched to provide different versions of the system. For example, Figures 11-14 show different views of an embodiment 90 of a simpler system with a visible camera 96, and an IR camera 98, integrated into a housing 92. The stepper motors 99 and 100 are under control of the controller board 102 to control the fan 94 in response to the image processing, which could be done local to the device or the device could be network enabled.
[00043] It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the embodiments.

Claims

WHAT IS CLAIMED IS:
1. An autonomous system, comprising: a visible camera to produce a visible image; an infrared camera to produce an infrared image as a temperature map; at least one processor to execute code to cause the processor to: perform image registration between the visible image and the temperature map to produce a registered image having facial features with temperature points; analyze the registered image to determine values of temperatures at the temperature points; determine adjustments to operation of a comfort device based upon the values of the temperatures; and adjust operation of the comfort device.
2. The system as claimed in claim 1, wherein the infrared camera and the visible camera are positioned adjacent to each other to maximize a common view area.
3. The system as claimed in claim 1, wherein the at least one processor executes code to further cause the processor to identify facial landmarks and will only cause the processor to perform image registration when facial landmarks have been identified.
4. The system as claimed in claim 1, wherein the processor to perform image registration comprises scaling images from the visible camera to match images from the IR camera.
5. The system as claimed in claim 1, wherein the processor to perform image registration comprises extracting Canny edges from the visible image and the infrared image, and phase correlating the Canny edges to identify pixel shift between the edges of the infrared image and the edges of the visible image.
6. The system as claimed in claim 1, wherein the comfort device comprises at least one of a fan, a heated seat, a ventilated seat, a central air conditioning system, and a variable air volume system.
7. An autonomous system, comprising: a fish-eye camera to produce an image of a field of view; a positionable comfort device; and at least one processor to execute code to cause the processor to: process pixels in the data of the image to produce groups of pixels that are target candidates; identify targets with sizes and shapes in specific ranges and create a boundary around the targets; approximate a location of a head of a person that is a target; and determine changes to be made to a position of the comfort device based upon the location of the head of the person.
8. The system as claimed in claim 7, wherein the fish-eye camera is positioned at a height of a region of interest.
9. The system as claimed in claim 7, wherein the code that causes the processor to process pixels further comprises code to cause the processor to identify and subtract any non-background pixels.
10. The system as claimed in claim 7, wherein the code that causes the processor to process pixels further comprises code to cause the processor to filter noise from pixels in the image.
11. The system as claimed in claim 7, wherein the code that causes the processor to process pixels further comprises code to cause the processor to compare pixels against a threshold to identify pixels that are part of a shadow, and remove those pixels from the image data.
12. The system as claimed in claim 7, wherein the code that causes the processor to approximate the location of the head further comprises code to cause the processor to approximate the head to be on a midpoint of an edge of the boundary furthest from a center of the image.
13. The system as claimed in claim 7, wherein the code that causes the processor to determine changes to be made to a position of the comfort device comprises code to cause the processor to determine a three-dimensional position of the target in the camera frame and calculate target pitch and yaw angles to orient the comfort device.
14. The system as claimed in claim 7, wherein the comfort device comprises at least one of a fan, a heated seat, a ventilated seat, a central air conditioning system, and a variable air volume system.
15. An autonomous system, comprising: a visible camera to produce a visible image of a face of an occupant; an infrared camera to produce an infrared image; and at least one processor to execute code to cause the processor to: detect facial components in the visible image; perform elastic image registration between the visible image and the infrared image to produce an aligned image having facial features with temperature points; analyze the registered image to collect measurements of temperatures at the facial components; and filter the measurements to produce facial temperature measurements.
16. The system as claimed in claim 15, wherein the code to cause the processor to filter the measurements comprises code to cause the processor to use pose estimation to filter the measurements.
17. The system as claimed in claim 15, wherein the code to cause the processor to filter the measurements comprises code to cause the processor to recursively operate on past observations.
18. The system as claimed in claim 15, wherein the code to cause the processor to detect facial components in the visible image comprises code to cause the processor to detect and identify facial features at different orientations and locations, and then label the facial features with landmarks.
PCT/US2020/049324 2019-09-10 2020-09-04 Autonomous comfort systems Ceased WO2021050369A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962898262P 2019-09-10 2019-09-10
US62/898,262 2019-09-10

Publications (1)

Publication Number Publication Date
WO2021050369A1 true WO2021050369A1 (en) 2021-03-18

Family

ID=74866673

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/049324 Ceased WO2021050369A1 (en) 2019-09-10 2020-09-04 Autonomous comfort systems

Country Status (1)

Country Link
WO (1) WO2021050369A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230237676A1 (en) * 2020-07-15 2023-07-27 Omron Corporation Information processing device and information processing method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6263113B1 (en) * 1998-12-11 2001-07-17 Philips Electronics North America Corp. Method for detecting a face in a digital image
US20090196475A1 (en) * 2008-02-01 2009-08-06 Canfield Scientific, Incorporated Automatic mask design and registration and feature detection for computer-aided skin analysis
US20170330044A1 (en) * 2016-05-10 2017-11-16 GM Global Technology Operations LLC Thermal monitoring in autonomous-driving vehicles

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6263113B1 (en) * 1998-12-11 2001-07-17 Philips Electronics North America Corp. Method for detecting a face in a digital image
US20090196475A1 (en) * 2008-02-01 2009-08-06 Canfield Scientific, Incorporated Automatic mask design and registration and feature detection for computer-aided skin analysis
US20170330044A1 (en) * 2016-05-10 2017-11-16 GM Global Technology Operations LLC Thermal monitoring in autonomous-driving vehicles

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI D. ET AL.: "Robust non-intrusive interpretation of occupant thermal comfort in built environments with low-cost networked thermal cameras", APPLIED ENERGY, vol. 251, 2019528, XP085784347 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230237676A1 (en) * 2020-07-15 2023-07-27 Omron Corporation Information processing device and information processing method
US12272078B2 (en) * 2020-07-15 2025-04-08 Omron Corporation Information processing device and information processing method

Similar Documents

Publication Publication Date Title
JP5876142B2 (en) Presence sensor
WO2020215961A1 (en) Personnel information detection method and system for indoor climate control
Cheng et al. Multi-occupant dynamic thermal comfort monitoring robot system
US20160092733A1 (en) System and method for seat occupancy detection from ceiling mounted camera using robust adaptive threshold criteria
Chauvin et al. Contact-free respiration rate monitoring using a pan–tilt thermal camera for stationary bike telerehabilitation sessions
Huang et al. Development of CNN-based visual recognition air conditioner for smart buildings.
JP2016048128A (en) Air conditioner
Varshney et al. RETRACTED ARTICLE: Rule-based multi-view human activity recognition system in real time using skeleton data from RGB-D sensor
Tran et al. Pyramidal Lucas—Kanade-based noncontact breath motion detection
Chen et al. Camera networks for healthcare, teleimmersion, and surveillance
WO2021050369A1 (en) Autonomous comfort systems
KR20150004575A (en) Blowing system of position tracking and customize air blowing method using blowing system of position tracking
JP2016017707A (en) Air conditioning system
Bhattacharya et al. Arrays of single pixel time-of-flight sensors for privacy preserving tracking and coarse pose estimation
WO2009016624A2 (en) System and method employing thermal imaging for object detection
US11256910B2 (en) Method and system for locating an occupant
Bhakar et al. Detection time of machine by posit algorithm through augmented reality
US11281899B2 (en) Method and system for determining occupancy from images
Ghosh et al. Interactive communication robot handling crowd management and content delivery in museums employing crowd counting
CN118918292A (en) Spray simulation method and device for augmented reality and social distance keeping method
Arai et al. Autonomous control of eye based electric wheel chair with obstacle avoidance and shortest path findings based on Dijkstra algorithm
Borodinecs et al. Review of modern demand control solutions and technologies for HVAC operation
Fuchs et al. SmartLobby: Using a 24/7 remote head-eye-tracking for content personalization
Katabira et al. Real-time monitoring of people flows and indoor temperature distribution for advanced air-conditioning control
Takač et al. Ambient sensor system for freezing of gait detection by spatial context analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20862854

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20862854

Country of ref document: EP

Kind code of ref document: A1