
HK1181134B - User interface control based on head orientation - Google Patents

User interface control based on head orientation

Info

Publication number: HK1181134B
Authority: HK (Hong Kong)
Prior art keywords: user interface, distance, interface elements, reference point, head direction
Application number: HK13108180.8A
Other languages: Chinese (zh)
Other versions: HK1181134A1 (en)
Inventors: D.J.桑布拉诺, C.皮科洛, J.W.哈汀, S.M.卢卡斯
Original assignee: Microsoft Technology Licensing, LLC (微软技术许可有限责任公司)
Priority claimed from: US13/309,574 (US8803800B2)
Application filed by: Microsoft Technology Licensing, LLC
Publication of HK1181134A1
Publication of HK1181134B

Description

Head direction based user interface control
Technical Field
The invention relates to user interface control.
Background
Some existing systems allow a user to access and operate a computer using various input methods. For example, some existing systems include cameras and face tracking algorithms. These algorithms identify facial features such as the user's eyes, nose, and mouth. For example, an eye may be identified by flashing infrared light at the user to locate the retina, and the tip of the nose may be identified by calculating the curvature of the nose. The orientation of the head may be determined by calculating a mathematical normal of the face, or by using two cameras to generate a three-dimensional model of the face.
However, such existing systems either require expensive and dedicated hardware or require intensive computations that are not practical for real-time or near real-time use.
Disclosure of Invention
Embodiments of the present invention differentiate between multiple user interface elements based on head orientation. A computing device receives coordinates representing a set of at least three reference points of a subject gazing at a plurality of user interface elements. The set includes at least a first reference point and a second reference point located on opposite sides of a third reference point. The computing device determines a first distance between the first reference point and the third reference point, and a second distance between the second reference point and the third reference point. The determined first distance is compared to the determined second distance to calculate a head direction value for the subject. At least one of the plurality of user interface elements is selected based on the calculated head direction value.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Drawings
FIG. 1 is an exemplary block diagram illustrating a head direction module interfacing with a capture device and a user interface.
FIG. 2 is an exemplary block diagram illustrating a computing device for implementing the head direction module.
FIG. 3 is an exemplary flow chart illustrating the operation of the head direction module comparing distances between reference points to calculate a head direction value.
FIG. 4 is an exemplary flow chart illustrating the operation of the head direction module applying a head direction value, calculated using two eyes and a nose, to a user interface to identify a user interface element.
FIG. 5 is an exemplary face showing the distances between the two eyes and the nose.
FIG. 6 is an exemplary screenshot of a user interface of a game console illustrating selection of a user interface element based on a calculated head direction value of a user.
FIG. 7 is an exemplary user interface showing a coordinate system.
Corresponding reference characters indicate corresponding parts throughout the drawings.
Detailed Description
Referring to the figures, embodiments of the present invention allow the user 102 to control or navigate the user interface 104, without hand movement, using a set of fiducial points 212 derived from images of the user 102. In some embodiments, fiducial points on the face of the user 102 are compared to identify a head direction value for the user 102. The head direction value is mapped to, or aligned with, the user interface 104 to identify at least one user interface element 214 displayed thereon. Aspects of the invention operate in real-time or near real-time (e.g., 25 frames per second) and may be integrated with other systems (e.g., face tracking algorithms) to enhance the user experience.
Although some aspects of the present invention are described and illustrated herein with reference to a gaming environment, other embodiments may be used in other operating environments, such as on laptop computers, in video conferencing scenarios, in remote surveillance operations, and so forth.
Referring next to FIG. 1, an exemplary block diagram illustrates the head direction module 108 interfacing with a capture device 106 and the user interface 104. In the example of FIG. 1, the user 102 is viewing the user interface 104. The capture device 106 includes any means for capturing images of the user 102. The capture device 106 may include one or more components such as a motion sensor, a camera, a low-light or night-vision lens, a light beam projector and/or detector, a radio frequency (RF) beam projector and/or detector, and the like. The images collectively represent the motion of the user 102. An exemplary capture device 106 includes a camera that may have a computing system associated therewith for processing the images captured by the camera. The computing system may be built into or separate from the capture device 106. The images are processed to perform one or more controls or actions within one or more applications (e.g., application 210) associated with the user interface 104. In some embodiments, the capture device 106 is a camera associated with a game console. In other embodiments, the capture device 106 is a camera associated with a computing device 202 of the user 102, such as a laptop computer.
The head direction module 108 receives one or more of the images captured by the capture device 106. The images are sent to, or made accessible to, the head direction module 108 in real-time or near real-time (e.g., as the images are captured and/or processed) to allow responsive control of the user interface 104 by the user 102 based on the captured images. The head direction module 108 represents any logic (e.g., implemented as software executed by the computing device 202, as hardware, or as both) for processing the captured images. The head direction module 108 and the capture device 106 may be housed in the same chassis and communicate via a bus or other internal communication means, or may be implemented on the same semiconductor chip. In other embodiments, the head direction module 108 is local to the capture device 106, but not within the same chassis or on the same chip. In such embodiments, the capture device 106 and the head direction module 108 exchange data via any communication protocol or bus (e.g., a universal serial bus). In still other embodiments, the head direction module 108 is implemented as a cloud service communicating with the capture device 106 and the user interface 104 via a network 110, such as the Internet.
The head direction module 108 operates as described herein to process images from the capture device 106 to control the user interface 104. In some embodiments, the head direction module 108 produces head direction values that may be mapped to a portion of the user interface 104. In other embodiments, the head direction module 108 also maps or applies the head direction values to the user interface 104 to determine the control or action to perform. The head direction module 108 may perform the determined control or action, or identify the determined control or action to another module (e.g., the capture device 106 or a computing system).
In some embodiments, the user interface 104 includes a graphics card for displaying data to the user 102. The user interface 104 may also include computer-executable instructions (e.g., a driver) for operating the graphics card. Further, the user interface 104 may represent a display (e.g., a television, a laptop display, or a touch screen display) and/or computer-executable instructions (e.g., a driver) for operating the display.
In embodiments in which the capture device 106 is a camera, the capture device 106, the head direction module 108, and the user interface 104 may be part of a mobile computing device, laptop, or other user computing device. For example, the head direction module 108 may be implemented as software executing on the user computing device. In such embodiments, the user computing device also includes one or more of the following to provide data to the user 102 or receive data from the user 102: a speaker, a sound card, a microphone, a vibration motor, one or more accelerometers, a Bluetooth communication module, Global Positioning System (GPS) hardware, and a photosensitive light sensor.
Referring next to FIG. 2, an exemplary block diagram illustrates a computing device 202 for implementing the head direction module 108. In some embodiments, the computing device 202 represents a system that differentiates between multiple user interface elements 214 based on head orientation. The computing device 202 represents any device executing instructions (e.g., application programs, operating system functionality, or both) to implement the operations and functionality associated with the computing device 202. The computing device 202 may include a game console or other multimedia device. In some embodiments, the computing device 202 includes a mobile phone, laptop, tablet, computing pad, netbook, portable media player, desktop personal computer, kiosk, and/or tabletop device. Additionally, the computing device 202 may represent a group of processing units or other computing devices.
The computing device 202 has at least one processor 204 and a memory area 208. In some embodiments, the computing device 202 may also include the user interface 104. The processor 204 includes any quantity of processing units and is programmed to execute computer-executable instructions for implementing aspects of the invention. The instructions may be performed by the processor 204, by multiple processors executing within the computing device 202, or by a processor external to the computing device 202. In some embodiments, the processor 204 is programmed to execute instructions such as those illustrated in the figures (e.g., FIGS. 3 and 4).
Computing device 202 also has one or more computer-readable media, such as a memory area 208. The memory area 208 includes any number of media associated with the computing device 202 or accessible to the computing device 202. The memory area 208 may be internal to the computing device 202 (as shown in FIG. 2), external to the computing device 202 (not shown), or both (not shown).
The memory area 208 stores one or more applications 210, and the like. The application 210, when executed by the processor 204, operates to perform functions on the computing device 202. Exemplary applications 210 include gaming applications and non-gaming applications. Non-gaming applications include, for example, mail applications, web browsers, calendar applications, address book applications, messaging programs, media applications, location-based services, search programs, and the like. The applications 210 may communicate with corresponding applications or services, such as web services accessible via, for example, the network 110. For example, the application 210 may represent a downloaded client-side application corresponding to a server-side service executing in the cloud.
The memory area 208 also stores one or more sets of reference points 212, such as reference point set #1 through reference point set #N. Each set 212 may include one or more fiducial points. In some embodiments, each fiducial point comprises coordinates of a point in an image captured by the capture device 106. Exemplary coordinates are one-dimensional, two-dimensional, or three-dimensional. In some embodiments, the image includes an object, and the fiducial points include a first fiducial point and a second fiducial point located on the object on opposite sides of a third fiducial point. In examples where the object is a face, the set of fiducial points 212 represents a set of facial fiducial points 401 comprising at least two eyes (e.g., the center of an eye or any corner of an eye) and a nose. In other examples, the facial fiducial points 401 correspond to other facial features, such as the mouth or its two corners, the ears, the eyebrows, or the chin. In some embodiments, each set of facial fiducial points 401 is derived from a single frame of video captured by the capture device 106.
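For illustration only, the following Python sketches one possible in-memory shape for a set of facial fiducial points 401. The class and field names are hypothetical; the description above specifies only the data itself (coordinates for two eyes and a nose, one set per captured video frame), not a concrete structure.

from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (x, y) coordinates of a point in a captured image

@dataclass
class FacialFiducials:
    left_eye: Point   # e.g., the center of the eye or a corner of the eye
    right_eye: Point
    nose: Point       # e.g., the tip of the nose
    frame_index: int  # single video frame from which the set was derived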
The memory area 208 also stores one or more user interface elements 214. User interface elements 214 include any media elements for consumption by user 102, including, for example, menus, menu items, sound clips, video clips, and images. A user interface element 214 or a representation thereof or a counterpart thereof is displayed on the user interface 104.
The memory area 208 also stores one or more computer-executable components. Exemplary components include a communication interface component 216, a detection component 218, a score component 220, and an interaction component 222. The operation of the computer-executable components is described below with reference to FIG. 3.
Referring next to FIG. 3, an exemplary flow chart illustrates the operation of the head direction module 108 comparing the distances between reference points to calculate a head direction value. At 302, the computing device 202 implementing the head direction module 108 receives or obtains coordinates representing a set 212 of at least three reference points of a subject gazing at the plurality of user interface elements 214. Exemplary coordinates take the form (X, Y). Although some embodiments describe the subject as a human, aspects of the invention may be used with any subject viewing the user interface elements 214 (e.g., a robot). The computing device 202 receives the set of reference points 212, for example, from the capture device 106 or from a computing system associated with the capture device 106. In the example of FIG. 3, the set of reference points 212 includes at least a first reference point and a second reference point located on opposite sides of a third reference point.
In some embodiments, the computing device 202 receives the coordinates of the first and second reference points, but not the coordinates of the third reference point. In the example of a face, the third reference point corresponds to the nose. In such embodiments, the computing device 202 may instead receive the height of the subject and the distance of the subject from the capture device 106, and calculate the third reference point based on this information.
The coordinates of the received reference points may reflect a calibration of the capture device 106 or the head direction module 108 by the user 102. An exemplary calibration process centers the subject's field of view by establishing the center of the user interface as the point at which the subject gazes when looking straight ahead, the top as the point at which the subject gazes when looking upward, and the bottom as the point at which the subject gazes when looking downward.
At 304, the computing device 202 determines a first distance between the first reference point and the third reference point and, at 306, determines a second distance between the second reference point and the third reference point. In some embodiments, determining the distances includes determining differences between the reference points along a horizontal axis and/or a vertical axis. The difference along the horizontal axis indicates whether the subject is gazing to the left or to the right, while the difference along the vertical axis indicates whether the subject is gazing up or down. Combinations of the above are also contemplated.
In some embodiments, the first distance and the second distance along the horizontal axis are determined by the expressions in equations (1) and (2) below.

X(third reference point) - X(first reference point) = first distance    (1)

X(third reference point) - X(second reference point) = second distance    (2)
In an embodiment in which the first, second, and third reference points correspond to the left eye, the right eye, and the nose, respectively, the first and second distances are determined by equations (3) and (4) below.

X(nose) - X(left eye) = first distance    (3)

X(nose) - X(right eye) = second distance    (4)
As an example, using the exemplary coordinate system shown in FIG. 7, if the first reference point has coordinates (-0.5, 0.1) and the third reference point has coordinates (0, -0.1), the first distance along the horizontal axis is 0 - (-0.5) = 0.5. If the second reference point has coordinates (0.8, 0.2), the second distance along the horizontal axis is 0 - 0.8 = -0.8.
At 308, the computing device 202 compares the determined first distance to the determined second distance to calculate a head direction value for the subject. The head direction value may be calculated in various ways. In some embodiments, the first distance and the second distance are added or subtracted. In other embodiments, the computing device 202 generates a weighted combination of the first distance and the second distance. In such embodiments, the first or second reference point may be given greater weight based on, for example, the height of the subject, the distance of the subject from the capture device 106, or the features of the subject to which the reference points correspond (e.g., eyes, ears, or mouth). The head direction value may also represent a combination (e.g., a mean, median, or weighted combination) of head direction values computed for multiple captured images.
In some embodiments, the head direction value is calculated using equation (5) below.

first distance + second distance = head direction value    (5)
Continuing the example above, the computing device 202 calculates the head direction value along the horizontal axis as 0.5 + (-0.8) = -0.3. If coordinates (0, 0) correspond to the center of the user interface 104, if the camera faces the subject, and if the camera does not flip the captured image about the vertical axis, the subject is therefore determined to be gazing to the right of center on the user interface 104. Alternatively, some capture devices 106 operable with the invention flip the captured image about the vertical axis. In such alternative embodiments, the subject is determined to be gazing to the right of the user interface 104 if the calculated head direction value is positive, and to the left of the user interface 104 if the calculated head direction value is negative.
Alternatively or additionally, the computing device 202 calculates a head direction value along the vertical axis, as shown in equation (6) below.

Y(nose) - ((Y(left eye) + Y(right eye)) / 2)    (6)
In equation (6), the vertical axis coordinates of both eyes are averaged and compared with the vertical axis coordinate of the nose. If the calculated head direction value is positive, the subject is determined to be gazing toward an upper portion of the user interface 104, and if the calculated head direction value is negative, the subject is determined to be gazing toward a lower portion of the user interface 104.
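The computations in equations (1) through (6) are compact enough to sketch directly. The following Python is a minimal illustration under the FIG. 7 coordinate system, reusing the worked example above; it is not code from the patent, and the function name is hypothetical.

def head_direction(left_eye, right_eye, nose):
    """Return (horizontal, vertical) head direction values from (x, y) points."""
    first_distance = nose[0] - left_eye[0]    # equation (3)
    second_distance = nose[0] - right_eye[0]  # equation (4)
    horizontal = first_distance + second_distance          # equation (5)
    vertical = nose[1] - (left_eye[1] + right_eye[1]) / 2  # equation (6)
    return horizontal, vertical

# Worked example: first reference point (-0.5, 0.1), second reference point
# (0.8, 0.2), third reference point (0, -0.1). The horizontal head direction
# value is 0.5 + (-0.8) = -0.3, i.e., gazing right of center for an unflipped
# camera; the vertical value is -0.1 - (0.1 + 0.2) / 2 = -0.25, i.e., gazing
# toward the lower portion of the user interface.
print(head_direction((-0.5, 0.1), (0.8, 0.2), (0.0, -0.1)))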
At 310, the computing device 202 selects at least one of the user interface elements 214 displayed in the user interface 104 based on the calculated head direction value. The head direction value is mapped or overlaid onto the user interface 104 to identify at least one of the user interface elements 214 displayed in the user interface 104. For example, the coordinate system used to define the reference points is applied to the user interface 104 to correlate the head direction value with at least one of the user interface elements 214. In the example above, the head direction value of -0.3 maps to a point or region on a menu, icon, text, avatar, or other displayed user interface element 214. In some embodiments, the mapped point is indicated by a circle whose boundary acts as a progress indicator for the current focus. When the progress indicator completes its sweep around the circle without a change in focus, an action is taken on the user interface element 214 at the focus.
In some embodiments, the position of the circle or ball on the user interface is determined using equations (7) and (8) below.

BallPoint.X = CenterPoint.X + XEasingFactor * (HeadDirectionValue.X - OriginalCalibrationDirection.X)    (7)

BallPoint.Y = CenterPoint.Y + YEasingFactor * (HeadDirectionValue.Y - OriginalCalibrationDirection.Y)    (8)
In the equations above, the "BallPoint" variable represents the circle to be drawn, the "CenterPoint" variable represents the coordinates of the center of the user interface, the "XEasingFactor" and "YEasingFactor" variables represent factors that accelerate or decelerate movement of the circle or ball along the X and Y axes, respectively, and the "OriginalCalibrationDirection" variable represents the calibration coordinate values.
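As a sketch, equations (7) and (8) reduce to two lines of Python. The 0.5 easing factors and the zero calibration used below are illustrative assumptions, not values prescribed by the patent.

def ball_point(center, head_value, calibration, x_easing=0.5, y_easing=0.5):
    """Return the (x, y) position at which to draw the circle or ball."""
    x = center[0] + x_easing * (head_value[0] - calibration[0])  # equation (7)
    y = center[1] + y_easing * (head_value[1] - calibration[1])  # equation (8)
    return x, y

# With the user interface center at (0, 0), a calibration of (0, 0), and the
# head direction value (-0.3, -0.25) from the earlier example:
print(ball_point((0.0, 0.0), (-0.3, -0.25), (0.0, 0.0)))  # (-0.15, -0.125)

An easing factor below one damps movement of the circle relative to head movement, while a factor above one amplifies it.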
The selected user interface element 214 is distinguished from other unselected user interface elements 214. For example, the selected user interface element 214 may be highlighted, enlarged, outlined, animated, or otherwise altered (e.g., undergo a color change). Further, selecting a user interface element 214 may also activate the selected user interface element 214 or otherwise cause an action associated with the selected user interface element 214 to be performed. For example, the computing device 202 may execute one of the applications 210 that corresponds to the selected user interface element 214 or is represented by the selected user interface element 214.
In some embodiments, the computer-executable components illustrated in FIG. 2 may be executed to implement the operations illustrated in FIG. 3. For example, the communication interface component 216, when executed by the processor 204, causes the processor 204 to receive coordinates representing a first set 212 of at least three reference points from a first video frame of a subject gazing at the plurality of user interface elements 214. The set 212 includes a first reference point and a second reference point located on opposite sides of a third reference point. In some embodiments, the communication interface component 216 includes a network interface card and/or computer-executable instructions (e.g., a driver) for operating the network interface card.
The detection component 218, when executed by the processor 204, causes the processor 204 to determine a first distance between the first reference point and the third reference point, and to determine a second distance between the second reference point and the third reference point. The score component 220, when executed by the processor 204, causes the processor 204 to compare the first distance determined by the detection component 218 with the second distance determined by the detection component 218 to calculate a first head direction value for the subject.
The detection component 218 and the score component 220 then operate on the second set 212 of at least three fiducial points from the second video frame to calculate a second head direction value for the subject. The interaction component 222, when executed by the processor 204, causes the processor 204 to select at least one of the plurality of user interface elements 214 based on a comparison between the first head direction value calculated by the score component 220 and the second head direction value calculated by the score component 220. For example, if user interface elements 214 represent a menu, interaction component 222 selects one of user interface elements 214 to navigate the menu.
In some embodiments, the communication interface component 216 receives additional input from the subject, such as a predefined gesture or voice command. In such embodiments, the interaction component 222 selects the user interface element 214 based on the comparison and on the received predefined gesture. For example, the predefined gesture may include an arm movement, an eye blink, or another predefined gesture confirming selection of the user interface element 214 highlighted based on the subject's current focus.
In another example, the score component 220 may detect a pause by the subject while the subject focuses on a particular user interface element 214. A pause may be detected by calculating and comparing head direction values across multiple video frames of the subject. For example, the computing device 202 may receive a plurality of streaming video frames of the subject viewing the user interface 104. The score component 220 detects a pause if the differences between the head direction values over a period of time (e.g., a defined quantity or subset of the video frames) satisfy a threshold. The threshold may, for example, correspond to an error margin for subject movement (e.g., determined from calibration of the subject with the computing device 202, or set by the user 102 or the computing device 202). For example, if the differences between the head direction values stay within the threshold, the score component 220 concludes that a pause occurred.
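A minimal sketch of this pause detection follows, assuming one horizontal head direction value per frame; the window size and threshold are illustrative assumptions, not values prescribed by the patent.

def detect_pause(head_values, threshold=0.05, window=30):
    """Return True if the last `window` head direction values vary by less
    than `threshold`, indicating the subject is dwelling on one element."""
    if len(head_values) < window:
        return False
    recent = head_values[-window:]
    return max(recent) - min(recent) < threshold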
If the score component 220 detects a pause, the action associated with the user interface element 214 selected by the interaction component 222 is performed. For example, one of the applications 210 associated with the selected user interface element 214 is executed.
Referring next to FIG. 4, an exemplary flow chart illustrates the operation of the head direction module 108 applying a head direction value, calculated using two eyes and a nose, to the user interface 104 to identify one of the displayed user interface elements 214. At 402, the head direction module 108 accesses a set of facial fiducial points 401. At 404, the head direction module 108 determines a first distance between a first of the two eyes and the nose, and a second distance between a second of the two eyes and the nose. At 406, the determined first distance is compared to the determined second distance to calculate a head direction value. At 408, the head direction module 108 applies the calculated head direction value to the user interface 104 to identify at least one of the user interface elements 214 displayed by the user interface 104.
The head direction module 108 may also distinguish the identified user interface element 214 from one or more other user interface elements 214 on the user interface 104, and may further perform an action associated with the identified user interface element 214, as described herein.
Referring next to FIG. 5, an exemplary face 502 shows the distances between the two eyes and the nose. The face 502 in FIG. 5 represents, for example, the user 102 viewing the user interface 104. The capture device 106 may be associated with a game console or with a user computing device, such as a laptop with a camera. In some embodiments, the face 502 is isolated from a larger image captured by the capture device 106 (e.g., the face 502 is selected for analysis from among a plurality of faces in the larger image). Image processing may be performed by the head direction module 108 or by a component separate from, but in communication with, the head direction module 108.
For example, the capture device 106 may capture images of several users 102 in a room. Software and/or hardware image processing identifies the users 102 by creating skeletal representations of the users 102, and one of the skeletal representations is selected for tracking. For example, one of the users 102 raises a hand during a calibration process, and that user 102 may be designated as the user 102 to be tracked.
After removing the non-designated users 102 from the captured image, the face 502 of the designated user 102 may be isolated, cropped, and analyzed to identify the set of fiducial points 212. In this example, the set of fiducial points 212 includes two eyes and a nose, as shown in fig. 5. The coordinates of the two eyes and nose are provided to the head direction module 108 for processing. Alternatively, the head direction module 108 may identify the set of fiducial points 212.
As described with reference to FIGS. 3 and 4, the head direction module 108 calculates the first distance and the second distance. In the example of FIG. 5, the first and second distances are shown as D1 and D2, respectively. The distances D1 and D2 are measured on the face along the horizontal axis. In other embodiments (not shown), the distances D1 and D2 may be measured as "line of sight" distances between each eye and the nose.
Referring next to FIG. 6, an exemplary screenshot of a user interface 602 of a game console illustrates selection of at least one of the user interface elements based on a calculated head direction value of the user 102. The user interface 602 shown in FIG. 6 includes several menu options (e.g., user interface elements), including "My Account", "Store", "Games", and "Community". The "Games" menu option includes several game options for the user 102 to select. In the example of FIG. 6, the user 102 is gazing at a user interface element 604 corresponding to Game #3.
The sensed focus of the user 102, calculated from the head direction values, remains fixed on the user interface element 604 long enough for the computing device 202 to zoom in on and highlight the user interface element 604. If the user 102 pauses longer, or provides an additional gesture (e.g., nodding, blinking, or raising a hand), the computing device 202 executes Game #3 corresponding to the user interface element 604.
Referring next to FIG. 7, an exemplary user interface 702 illustrates a coordinate system for use with embodiments of the present invention. In the example of FIG. 7, the center point of the user interface 702 has coordinates (0, 0), the lower left corner has coordinates (-1, -1), and the upper right corner has coordinates (1, 1). The coordinate system shown in FIG. 7 may be applied to the fiducial points in an image captured by the capture device 106 and to the head direction values calculated by the head direction module 108. In other embodiments, such as those in which the capture device 106 flips the image about the vertical axis, the positive and negative values shown in FIG. 7 are reversed (e.g., (-1, -1) appears in the lower right corner and (1, 1) in the upper left corner). In such embodiments, the user 102 looks to the right at a head direction value of 7.31 and looks to the left at a head direction value of -6.11.
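For illustration, pixel coordinates from a captured frame can be mapped into the FIG. 7 coordinate system as sketched below. The function is an assumption rather than part of the patent, and the mirroring flag models capture devices that flip the image about the vertical axis.

def normalize(px, py, width, height, mirrored=False):
    """Map pixel (px, py) in a width-by-height frame to FIG. 7 coordinates."""
    x = 2.0 * px / width - 1.0
    y = 1.0 - 2.0 * py / height  # pixel rows grow downward; FIG. 7 Y grows upward
    if mirrored:
        x = -x  # the capture device flips the image about the vertical axis
    return x, y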
Other examples
Aspects of the present invention improve playability of a game by, for example, allowing a user 102 to control characters in the game by glancing at corners, selecting cars or weapons, and/or navigating terrain.
In an example, the user 102 scans article titles and article summaries in a digital newspaper. When the user 102 pauses on a particular title, the computing device 202 increases the font size of that article's summary to make it easier for the user 102 to read, while decreasing the font size of the other article summaries or titles. After the user 102 moves on to another article title or summary, the font of the previously enlarged article returns to its previous size.
Some embodiments of the present invention compensate for head tilt based on the coordinates of the reference points and the angle of head tilt. After the tilt angle is determined, the coordinates of the eyes, nose, or other reference points are adjusted based on the determined tilt angle to allow accurate calculation of the head direction value.
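One way to realize such compensation, sketched here under the assumption that the tilt angle is estimated from the line between the eyes, is to rotate each reference point about the midpoint between the eyes by the negative of the tilt angle before computing the distances; the patent does not prescribe this particular construction.

import math

def untilt(point, pivot, tilt):
    """Rotate `point` about `pivot` by -tilt radians to undo head tilt."""
    c, s = math.cos(-tilt), math.sin(-tilt)
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]
    return (pivot[0] + c * dx - s * dy, pivot[1] + s * dx + c * dy)

# Estimate the tilt angle from the eye line and use the midpoint between the
# eyes as the pivot:
left_eye, right_eye, nose = (-0.5, 0.1), (0.8, 0.2), (0.0, -0.1)
tilt = math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])
pivot = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
level_nose = untilt(nose, pivot, tilt)  # nose coordinates adjusted for tilt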
At least a portion of the functionality of the elements in FIGS. 1 and 2 may be performed by other elements in FIG. 1 or FIG. 2, or by entities not shown in FIG. 1 or FIG. 2 (e.g., a processor, web service, server, application program, or computing device).
In some embodiments, the operations illustrated in FIGS. 3 and 4 may be implemented as software instructions encoded on a computer-readable medium, in hardware programmed or designed to perform the operations, or both. For example, aspects of the invention may be implemented as a system on a chip.
Although aspects of the present disclosure do not track personally identifiable information, embodiments are described with reference to data monitored and/or collected from the users 102. In such embodiments, notice of the data collection is provided to the users 102 (e.g., via a dialog box or preference setting), and the users 102 are given the opportunity to consent to or decline the monitoring and/or collection. The consent may take the form of opt-in consent or opt-out consent.
Exemplary operating Environment
Exemplary computer readable media include flash drives, Digital Versatile Disks (DVDs), Compact Disks (CDs), floppy disks and magnetic cassettes. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media excludes propagated data signals. In some embodiments, the computer storage medium is implemented in hardware. Exemplary computer storage media include hard disks, flash drives, and other solid state memory. In contrast, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
Although described in connection with an exemplary computing system environment, embodiments of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with aspects of the invention include, but are not limited to: mobile computing devices, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, gaming consoles, microprocessor-based systems, set top boxes, programmable consumer electronics, mobile telephones, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Embodiments of the invention may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. The computer-executable instructions may be organized into one or more computer-executable components or modules. Generally, program modules include, but are not limited to, routines, programs, objects, components, and data structures that perform particular tasks or implement particular abstract data types. Aspects of the invention may be implemented with any number and organization of such components or modules. For example, aspects of the invention are not limited to the specific computer-executable instructions or the specific components or modules illustrated in the figures and described herein. Other embodiments of the invention may include different computer-executable instructions or components having more or less functionality than illustrated and described herein.
Aspects of the invention transform a general-purpose computer into a special-purpose computing device when the general-purpose computer is configured to execute the instructions described herein.
The embodiments shown and described herein, as well as embodiments not specifically described herein but within the scope of aspects of the invention, constitute exemplary means for distinguishing between multiple user interface elements 214 based on the relative positions of the eyes and nose, as well as exemplary means for determining head orientation to navigate menus on user interface 104.
The order of execution or performance of the operations in embodiments of the invention illustrated and described herein is not essential, unless otherwise specified. That is, the operations may be performed in any order, unless otherwise specified, and embodiments of the invention may include additional or fewer operations than those disclosed herein. For example, it is within the scope of aspects of the invention to perform one operation before, at the same time, or after another operation.
When introducing elements of aspects of the invention or the embodiments thereof, "a," "an," and "the" are intended to mean that there are one or more of the elements. The terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements.
Having described aspects of the invention in detail, it will be apparent that various modifications can be made without departing from the scope of aspects of the invention as defined in the appended claims. As various changes could be made in the above constructions, products, and methods without departing from the scope of aspects of the invention, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.

Claims (10)

1. A method for distinguishing between a plurality of user interface elements based on head orientation, the method comprising:
accessing a set of facial fiducial points stored in a memory area associated with a computing device, the set of facial fiducial points comprising coordinates of at least two eyes and a nose, the memory area further storing a plurality of user interface elements for display on a user interface;
determining a first distance between a first one of the eyes and the nose;
determining a second distance between a second one of the eyes and the nose;
comparing the determined first distance with the determined second distance to calculate a head direction value; and
applying the calculated head direction value to the user interface to identify at least one of the one or more user interface elements displayed by the user interface.
2. The method of claim 1, further comprising distinguishing the identified user interface elements on the user interface between the one or more user interface elements, wherein distinguishing the identified user interface elements comprises one or more of: highlighting the identified user interface element, zooming in on the identified user interface element, outlining the identified user interface element, changing a color of the identified user interface element, and animating the identified user interface element.
3. The method of claim 1, further comprising performing an action associated with the identified user interface element, and further comprising providing the set of facial fiducials.
4. The method of claim 1, further comprising:
differentiating between the plurality of user interface elements based on the relative positions of the eyes and nose; and
determining a head direction to navigate a menu on the user interface.
5. A method for distinguishing between a plurality of user interface elements based on head orientation, comprising:
receiving coordinates representing a set of at least three fiducial points in an image of a subject gazing at a plurality of user interface elements, the set including a first fiducial point and a second fiducial point located on opposite sides of a third fiducial point;
determining a first distance between the first reference point and the third reference point;
determining a second distance between the second reference point and the third reference point;
comparing the determined first distance with the determined second distance to calculate a head direction value for the subject; and
selecting at least one of the plurality of user interface elements based on the calculated head direction value.
6. The method of claim 5, wherein the subject is viewed by a camera and receiving coordinates comprises receiving a height of the subject and a distance of the subject from the camera, and further comprising calculating the third reference point based on the height and distance.
7. The method of claim 5, wherein receiving the coordinates comprises receiving two-dimensional coordinates for each of the fiducial points, wherein determining the first distance comprises calculating X(nose) - X(left eye), wherein determining the second distance comprises calculating X(nose) - X(right eye), and wherein calculating the head direction value comprises calculating X(nose) - X(left eye) + X(nose) - X(right eye), where X(nose), X(left eye), and X(right eye) are the two-dimensional coordinates of the nose, the left eye, and the right eye, respectively, and wherein selecting at least one of the plurality of user interface elements comprises: selecting at least one of the plurality of user interface elements on a right side of the user interface if the calculated head direction value is positive, and selecting at least one of the plurality of user interface elements on a left side of the user interface if the calculated head direction value is negative.
8. The method of claim 5, wherein the image comprises a plurality of faces, and further comprising selecting one of the faces for analysis, wherein selecting at least one of the plurality of user interface elements comprises: selecting at least one of the plurality of user interface elements that is in an upper portion of the user interface if the calculated head direction value is positive, and selecting at least one of the plurality of user interface elements that is in a lower portion of the user interface if the calculated head direction value is negative.
9. The method of claim 5, further comprising receiving a plurality of streaming video frames of a subject gazing at the plurality of user interface elements, wherein the receiving of the coordinates representing a set of at least three reference points, the determining of the first distance between the first reference point and the third reference point, the determining of the second distance between the second reference point and the third reference point, and the comparing of the determined first distance to the determined second distance are performed on a subset of the received video frames to calculate the head direction value.
10. A system for distinguishing between a plurality of user interface elements based on head orientation, the system comprising:
a communication interface component for receiving coordinates representing a first set of at least three reference points from a first video frame of a subject gazing at a plurality of user interface elements, the set including a first reference point and a second reference point located on opposite sides of a third reference point;
a detection component for determining a first distance between the first reference point and the third reference point and determining a second distance between the second reference point and the third reference point;
a score component for comparing the first distance determined by the detection component to the second distance determined by the detection component to calculate a first head direction value for the subject, wherein the detection component and the score component subsequently operate on a second set of at least three reference points from a second video frame to calculate a second head direction value for the subject; and
an interaction component for selecting at least one of the plurality of user interface elements based on a comparison between a first head direction value calculated by the score component and a second head direction value calculated by the score component.
HK13108180.8A 2011-12-02 2013-07-12 User interface control based on head orientation HK1181134B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/309,574 2011-12-02
US13/309,574 US8803800B2 (en) 2011-12-02 2011-12-02 User interface control based on head orientation

Publications (2)

Publication Number Publication Date
HK1181134A1 (en) 2013-11-01
HK1181134B true HK1181134B (en) 2017-01-13
