WO2025106065A1

WO2025106065A1 - Gaze-based user interface control with localized gaze search

Info

Publication number: WO2025106065A1
Application number: PCT/US2023/079611
Authority: WO
Inventors: Dongeek Shin
Original assignee: Google LLC
Current assignee: Google LLC
Priority date: 2023-11-14
Filing date: 2023-11-14
Publication date: 2025-05-22
Anticipated expiration: 2026-05-14

Abstract

A first gaze point may be identified within a user interface (UI) that includes a set of UI elements included within a corresponding set of bounding areas, so that the first gaze point is within a first bounding area of the set of bounding areas. A first search subset of the set of bounding areas may be selected, with the first search subset at least partially surrounding the first gaze point and the first bounding area. A second gaze point may be identified within a second bounding area of the first search subset, based on a search of the first search subset. A second search subset of the set of bounding areas may be selected, with the second search subset at least partially surrounding the second gaze point and the second bounding area, and overlapping with the first search subset. An identified UI element may thus be determined.

Description

GAZE-BASED USER INTERFACE CONTROL WITH

LOCALIZED GAZE SEARCH

TECHNICAL FIELD

[0001] This description relates to input/output (I/O) techniques for wearable devices.

BACKGROUND

[0002] Wearable devices, such as head-mounted devices (HMDs), provide various types of I/O techniques that differ from traditional keyboard and mouse techniques, and that utilize features of the wearable devices themselves. For example, HMDs may leverage built- in cameras to track an eye gaze of a user/wearer, then use the results of such eye gaze tracking as an I/O mechanism to enable, e.g., user interface (UI) icon selection, or other interactions between the user and the HMD.

SUMMARY

[0003] In a general aspect, a method includes identifying a first gaze point within a user interface (UI) that includes a set of UI elements included within a corresponding set of bounding areas, the first gaze point being included within a first bounding area of the set of bounding areas. The method includes selecting a first search subset of the set of bounding areas, the first search subset at least partially surrounding the first gaze point and the first bounding area and identifying a second gaze point within a second bounding area of the first search subset, based on a search of the first search subset. The method includes selecting a second search subset of the set of bounding areas, the second search subset at least partially surrounding the second gaze point and the second bounding area, and overlapping with the first search subset, and determining an identified UI element of the set of UI elements that is included within the second bounding area.

[0004] In another general aspect, a head mounted device (HMD) includes at least one frame, at least one gaze tracker including an image sensor mounted on the at least one frame, at least one processor, and at least one memory, the at least one memory storing a set of instructions. When executed, the instructions cause the at least one processor to identify a first gaze point within a user interface (UI) that includes a set of UI elements included within a corresponding set of bounding areas, the first gaze point being included within a first bounding area of the set of bounding areas. When executed, the instructions cause the at least one processor to select a first search subset of the set of bounding areas, the first search subset at least partially surrounding the first gaze point and the first bounding area, and identify a second gaze point within a second bounding area of the first search subset, based on a search of the first search subset. When executed, the instructions cause the at least one processor to select a second search subset of the set of bounding areas, the second search subset at least partially surrounding the second gaze point and the second bounding area, and overlapping with the first search subset, and determine an identified UI element of the set of UI elements that is included within the second bounding area.

[0005] In another general aspect, a non-transitory computer-readable medium may store executable instructions that when executed by at least one processor cause the at least one processor to identify a first gaze point within a user interface (UI) that includes a set of UI elements included within a corresponding set of bounding areas, the first gaze point being included within a first bounding area of the set of bounding areas. When executed, the instructions may cause the at least one processor to select a first search subset of the set of bounding areas, the first search subset at least partially surrounding the first gaze point and the first bounding area, and identify a second gaze point within a second bounding area of the first search subset, based on a search of the first search subset. When executed, the instructions may cause the at least one processor to select a second search subset of the set of bounding areas, the second search subset at least partially surrounding the second gaze point and the second bounding area, and overlapping with the first search subset, and determine an identified UI element of the set of UI elements that is included within the second bounding area.

[0006] The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] FIG. 1 is a block diagram of a system for gaze-based user interface control with localized gaze search.

[0008] FIG. 2 is a flowchart illustrating example operations of the system of FIG. 1 .

[0009] FIG. 3 is a block diagram of an example system that may be used in the system of FIG. 1.

[0010] FIG. 4A illustrates a first example of a modified search space that may be used in the examples of FIGS. 1-3.

[0011] FIG. 4B illustrates a second example of a modified search space that may be used in the examples of FIGS. 1-3.

[0012] FIG. 5 illustrates an example implementation with boundary areas generated as Voronoi regions.

[0013] FIG. 6 is a flowchart illustrating more detailed example operations of the systems of FIGS. 1 and 3.

[0014] FIG. 7 is a third person view of a user in an ambient computing environment.

[0015] FIGS. 8 A and 8B illustrate front and rear views of an example implementation of a pair of smartglasses.

DETAILED DESCRIPTION

[0016] Described systems and techniques enable UI control using gaze tracking of a gaze point of a user, as detected by a HMD being worn by the user. As a result, for example, UI control may be enabled or enhanced in a responsive, accurate, and reliable manner, with efficient use of available computing resources.

[0017] Gaze tracking provides the potential for convenient, intuitive UI control. Conventional gaze tracking techniques, however, may not provide sufficient or desired levels of accuracy or responsiveness, particularly when a UI being controlled includes a large number and/or dense arrangement of UI elements.

[0018] For example, some conventional gaze tracking systems attempt to continuously track a gaze point of a user in a coordinate space of a UI being controlled. Such approaches may be challenging, e.g., because many users may have high degrees of jitter or variance in their gaze patterns, which may make continuous gaze point tracking challenging. For example, when two UI elements are very close within a UI. it may be difficult for some users to select a desired one of the UI elements.

[0019] Moreover, computations required for continuous gaze point tracking may limit a responsiveness of such systems. As a result, users may experience a delay or lag in their use of a UI control element, which may cause frustration and inconvenience when interacting with a UI.

[0020] It is also possible to use a quantized or discrete approach to gaze tracking. For example, a UI with four selectable elements may be divided into four quadrants, with a selectable element in each of the four quadrants. Then, detection of a gaze point within one of the four quadrants may be determined to correspond to, or indicate, selection of the UI element within that quadrant. In other words, each UI element may be provided or associated with a surrounding area, so that selection of the surrounding area is tantamount to selection of the included UI element. As a result, accuracy requirements may be reduced, and a selection accuracy may be improved. More generally, UI elements of a UI may each be provided with a bounding area that surrounds the corresponding UI element, so that a UI control element may be validated with respect to the bounding area(s), rather than (or in addition to, or in conjunction with) the UI element(s).

[0021] The concept of using such a bounding area occurs in various UI control contexts, and such bounding areas may be referred to using various terminologies. For example, in the context of collision detection (e.g., when determining a collision between two elements within a video game), such a bounding area may be referred to as a hitbox. As used herein, the terms hitbox, cell, bounding box, bounding area, and similar terms, should all be understood to refer to UI areas surrounding corresponding UI elements. Such areas may be defined in virtually any desired shape, and need not be limited to a box, circle, or any regular or uniform shape. Moreover, multiple types of such areas may be used within the context of a single UI, and may be designed in a manner optimized for the UI in question.

[0022] Then, gaze point tracking may be performed, for example, by examining received image frames to detect a gaze point location within each such image frame. For example, in some gaze tracking systems, infrared light is projected onto a user's eye, and an image is captured of the subsequently reflected light and aligned with a UI being controlled.

[0023] Thus, for example, a first image frame may be examined (e.g., searched) to locate a gaze point with respect to a UI, or, more specifically, with respect to the various bounding areas (and included UI elements) defined with respect to the UI. As referenced above, such an approach enables relation of a UI selection element (e.g., a cursor or pointer) displayed on the UI with the detected gaze point, and with a desired UI element(s).

[0024] In other words, using the above techniques, the UI selection element detected within a particular bounding area may be related to the UI element within that bounding area. This process may be repeated for a subsequent image frame(s), so that the UI selection element may effectively be tracked over time and with respect to the various UI elements of the UI.

[0025] Such methods may be effective in many scenarios, e.g., for UIs with relatively small numbers of UI elements. In other scenarios, however, such as UIs with a large number of UI elements and/or UIs with closely spaced UI elements, a search time needed to search defined bounding areas and locate the UI selection element may be unacceptably long, and may result in delays in response time that may be frustrating or inconvenient for users.

[0026] Described techniques therefore define a first search subset of bounding areas that are within a determined maximum eye movement range, e.g., the farthest that a gaze point is estimated to possibly move between consecutive gaze point determinations. Then, by definition of the search subset, a second/subsequent gaze point location should be within the search subset. Therefore, in a second image frame at a second time, a second gaze point location may be used to define a second search subset that overlaps the first search subset. Put another way, consecutive search subsets provide “marching islands’’ or search spaces of bounding areas, so that each search process is constrained by the number of bounding areas in each subset, and it is not necessary to search an entirety of the bounding areas of the UI.

[0027] Additionally, the size(s) and/or shape(s) of the search subsets may be dynamically updated for further optimizations, e.g., based on a speed and/or direction of the eye movements, so that searching is further optimized to occur only in subsets of bounding areas most likely to contain a desired UI element(s). For example, for relatively quick eye movements, the search subset may be elongated in a direction(s) of the eye movements. Other optimizations, some of which are described below, are also available, so that described techniques provide fast, responsive, reliable, and efficient gaze tracking for UI control, even for UIs with large numbers of densely provided UI elements.

[0028] FIG. 1 is a block diagram of a system for gaze-based user interface control with localized gaze search. In the example of FIG. 1, a HMD 102 is illustrated as being worn by a user 104. The HMD 102 is illustrated as generating, or otherwise being associated with, a user interface (UI) 106. As described in detail, below, FIG. 1 illustrates the UI 106 at a first time ti as UI 106a, and illustrates the UI 106 at a second time t2 as UI 106b. In the following description, the UI 106 at the first time ti is thus referred to as UI 106a. while the UI 106 at the second time t2 is referred to as UI 106b.

[0029] The HMD 102 should be understood to represent any device that may be worn on a head of the user 104, and that may be configured to provide the resources and features illustrated in the exploded view of FIG. 1 (which are described in more detail, below).

Various examples of the HMD 102 are illustrated and described, e.g., with respect to FIGS. 3, 7, and FIGS. 8A and 8B, including various types of smartglasses or goggles.

[0030] The UI 106 should thus be understood to represent any UI that is controllable by, e.g., in communication with, the HMD 102. For example, in many of the following examples, the UI 106 is described as a UI that is projected and otherwise rendered by the HMD 102 itself, such as when the UI is shown on a display of the smartglasses of FIGS. 8 A and 8B. In other examples, however, the UI 106 may be generated by a separate device, such as a smartphone, or using a stand-alone monitor or display.

[0031] Accordingly, the UI 106 should be understood to represent any 2D or 3D UI with which the user 104 may interact. That is, in addition to 2D examples that may occur with respect to a smartphone or stand-alone monitor, the UI 106 may provide a panoramic or 3D view. In some examples, an immersive 3D experience may be provided, e.g., through the use of smartglasses or googles. A more specific example of the UI 106 is provided below, with respect to FIG. 3.

[0032] Thus, the UI 106 should be understood to include multiple selectable UI elements that may be selected by the user 104. Such UI elements may be located in any location specified by a designer of the UI 106, or, in some cases in which the UI 106 is configurable, may be positioned or arranged at least in part by the user 104.

[0033] As described above, the UI 106 may be provided with a plurality of bounding areas 107, with each bounding area of the bounding areas 107 encompassing or enclosing a corresponding UI element. For example, in the simplified example of FIG. 1. the bounding areas 107 are illustrated as a grid of 10x10 bounding areas. For the sake of the simplified example of FIG. 1, the UI 106 is illustrated as including a UI element 108 within a corresponding bounding area 110. Remaining UI elements are not illustrated in FIG. 1, for the sake of clarity and simplicity.

[0034] UI element 108, and remaining UI elements not illustrated in FIG. 1, should be understood to represent any UI element that may be included in the types of examples of the UI 106 just referenced, or similar examples, and that are used to control function(s) of the UI 106. For example, the UI element(s) 108 may represent control elements or navigational elements including, but not limited to. buttons, checkboxes, toggles, text fields, links, highlights, tabs, bars/sliders, menus, or any suitable ty pe of icon(s) available in a display environment of the UI 106.

[0035] For example, the UI element 108 and similar UI elements may represent explicit selection elements intended to give the user 104 the option of making a selection in a context of the UI 106, such as to advance to a subsequent screen of the UI 106. More generally, the HMD 102 may be used to access and/or provide many different applications, so that the UI element(s) 108 may represent any UI element needed to implement a corresponding function of such an application(s).

[0036] For purposes of ease of explanation and understanding, in the remainder of the discussion of the simplified example of FIG. 1, the HMD 102 is generally described or referenced as smartglasses or goggles, with the UI 106 described as being displayed by display-related components of the smartglasses/goggles. As just referenced, however, such examples are non-limiting, and various other types and combinations of HMDs, UIs, and/or other devices may be used to implement the system of FIG. 1, some of which are illustrated and described, below.

[0037] The bounding areas 107 are illustrated in FIG. 1 as regular, uniformly sized/spaced bounding areas, therefore corresponding to uniformly sized and spaced UI elements. Individual ones of the bounding areas 107, such as the bounding area 110, when square as in FIG. 1, may be referred to as a bounding box, hitbox, gaze box, or using similar terminology. Of course, in various example implementations, UI elements may be sized and spaced in many different configurations, so that corresponding bounding areas may be generated accordingly, in which case terminology more descriptive of resulting bounding areas may be used. Further details and examples related to generation of the bounding areas 107 are provided below in the context of FIG. 1, as well as with respect to FIGS. 3 and 5.

[0038] As referenced above, the bounding areas 107, including the bounding area 110, may represent invisible areas surrounding a corresponding UI element, although boundaries of the bounding areas 107 are not required to be invisible. Each of the bounding areas 107 may be used to determine collisions or other interactions between an included UI element (such as the UI element 108), and any UI element controlled by the user 104 via the HMD 102.

[0039] In the UI 106a at time ti, a gaze point 112a represents a tracked gaze of the user 104 within a coordinate space of the UI 106a. Example details related to the generation of the gaze point 112a are provided below, but, as referenced above, the gaze point 112a generally represents an estimated location within the UI 106a that aligns with a current gaze of the user 104. In other words, the gaze point 112a represents an estimate of a location within the UI 106a at which the user 104 is looking at the time ti.

[0040] For example, the gaze point 112a may be determined in conjunction with an image of an eye of the user 104, captured by the HMD 102. In other words, the UI 106a may correspond to an eye image captured at the time ti, while the UI 106b may correspond to an eye image captured at the time t2. In the following description, therefore, the UI 106a may be referenced with respect to an image frame captured at time ti, while the UI 106b may be referenced with respect to an image frame captured at time t2.

[0041] As referenced above, use of the bounding areas 107 enables discrete or quantized tracking of the gaze point 112a with respect to UI elements of the UI 106. In other words, it is possible to attempt to continuously track the gaze point 112a and relate the gaze point 112a to corresponding locations of the various UI elements and intervening locations/ areas between the various UI elements. However, such approaches may be computationally demanding, and may be prone to failures in accuracy that lead to false or undesired selections of UI elements, which may be frustrating and inconvenient for the user 104. In contrast, use of the bounding areas 107 relaxes accuracy constraints, since it is only necessary to determine whether the gaze point 112a is within a particular bounding area 113a of the bounding areas 107, as compared to determining the location of the gaze point 112a wi th respect to an individual UI element, such as the UI element 108.

[0042] More specific examples of techniques for determining the location of the gaze point 112a are provided below. In general, however, the HMD 102 may be configured to perform a check with respect to each of the bounding areas 107 in order to determine whether the gaze point 112a is within the bounding area being checked at the time ti, and this process may be repeated for each of the bounding areas 107 in order to validate a location of the gaze point 112a within the specific bounding area 113a for the time ti.

[0043] It is possible to locate the gaze point 112a by searching or otherwise inspecting each of the 10 x 10 = 100 bounding areas of the UI 106a. More generally, it is possible to locate a gaze point by performing a gaze point validation with respect to all N elements of a UI. Such a search process therefore increases linearly in time required as the number N of elements increases.

[0044] However, saccadic movements of the user 104 may occur, e g., on the order of individual milliseconds. Consequently, as a number of UI elements of the UI 106 increase, it may be impossible or impractical to locate and validate the gaze point 112a with respect to the bounding area 113a in a sufficiently fast, reliable, and/or accurate manner by performing a full search of the bounding areas 107.

[0045] Therefore, in the system of a FIG. 1, a first search subset 114 of bounding areas may be determined with respect to the gaze point 112a at the time ti, to facilitate and optimize determination of (e.g., search for) the gaze point 112b within a bounding area 113b at the time t2. Then, the gaze point 112b may be similarly used to determine a second search subset 116 of bounding areas, where, as illustrated in FIG. 1 and described in more detail, below, an overlapping subset 118 refers to bounding areas that are included in both the first search subset 114 and the second search subset 116. Thus, in FIG. 1, the first search subset 114 is illustrated with a first hatching pattern, while the second search subset 116 is shown with a second hatching pattern, so the overlapping subset 118 is shown with a cross-hatching patern to indicate the bounding areas of overlap. It should be understood from the present description that it is not necessary or required to maintain an identification of the first search subset 114 once the second gaze point 112b is located and validated with respect to the bounding area 113b, but the first search subset 114 is illustrated in the UI 106b at least for purposes of illustrating a progression of the search subsets 114, 116, and subsequent/future search subsets, as shown in FIGS. 4A and 4B.

[0046] Specifically, although not shown separately in FIG. 1 but illustrated in the example implementations of FIGS. 4 A and 4B, below, the second search subset 116 may be used to identify a third gaze point at a third time t3, which is present within the second subset 116 and which may then be used to identify a third search subset (overlapping with the second subset 116), as also shown separately in FIGS. 4A and 4B. In this way, a new search subset may be identified each time a new gaze point is detected, and iterations may continue, e.g., until a selection of a particular UI element is received, as show n in the example flowchart of FIG. 6.

[0047] As referenced above, each search subset of bounding areas (e.g., search subsets 114, 116) may be determined based on or using one or more assumptions and/or measurements characterizing eye movements of the user 104 between the time ti and the time t2. One or more of multiple techniques may be used to characterize such eye movements for purposes of generating the search subsets 114. 116.

[0048] For example, a saccade generally refers to a rapid eye movement between fixation points, and a saccadic amplitude refers to a measurement of angular distance traveled by an eye during a given movement. Therefore, an angular speed of the eye may be measured in units of angles/second.

[0049] Eye movements may be characterized in other units and/or using other techniques, as well. For example, eye movements may be characterized in terms of an absolute or relative distance traveled across the UI 106.

[0050] Additionally, or alternatively, characteristics of the UI 106 may be used to facilitate eye movement characterizations. For example, a distribution of selectable UI elements within the UI 106 may be used to make assumptions regarding an extent to which different types or extents of eye movements may occur.

[0051] Further, eye movement characterizations may be made with respect to a population of users as a whole, such as when a maximum or average human saccadic amplitude is used as a basis for eye movement characterizations. In other examples, eye movement characterizations may be made with respect to a smaller user group, or with respect to an individual user.

[0052] In case of an error in determining the search subsets 114, 116, and/or in the event of a statistically unlikely eye movement by the user 104, or some other unanticipated occurrence, it is possible that a gaze point may occur that is outside of a calculated search subset. For example, in FIG. 1, it could occur that the gaze point occurs within a bounding area 117, which is outside of the second search subset 116.

[0053] In such cases, the gaze point would not be detected by searching within the second search subset 116, and one or more various techniques may be used to detect the gaze point in such a scenario. For example, a comprehensive linear search of the UI 106 may be performed to re-locate the gaze point in such a scenario, after which future search subsets may be defined (expanded) to be larger than the example of the search subsets 114, 116 in FIG. 1. In other examples, rather than searching an entirety of the UI space (e.g., all available bounding areas 107), the second search subset 116 may be incrementally expanded until the gaze point is located.

[0054] Thus, FIG. 1 illustrates the use of dynamic, iteratively calculated search subsets of bounding areas 107 to validate a presence of gaze points with respect to one or more of the bounding areas 107, and thereby to corresponding ones of UI elements of the UI 106, without having to analyze an entire gaze space of the UI 106. As a result, the gaze space of the UI 106 may be discretized to determine whether a gaze estimate occurs within one of the bounding areas 107, in a fast, accurate, and reliable manner, e.g., able to provide gaze point updates at a frequency on the order of individual milliseconds, while using available computing resources in an efficient manner.

[0055] To provide the above and related features, and as shown in the exploded view of FIG. 1, the HMD 102 may be configured with a number of hardware and software elements and features. For example, the HMD 102 may include a processor 120 (which may represent one or more processors), as well as a memory 122 (which may represent one or more memories (e.g., non-transitory computer readable storage media)). Although not shown separately in FIG. 1, the HMD 102 may also include a battery, which may be used to power operations of the processor 120, the memory 122, and various other resources of the HMD 102. As noted above, more detailed examples of the HMD 102 and various associated hardware/software resources, as well as alternate implementations of the system of FIG. 1, are provided below; e.g., with respect to FIGS. 3, 7, 8A. and 8B.

[0056] For purposes of the simplified example of FIG. 1, the HMD 102 should be further understood to include at least one gaze tracker 124. Any know n or future gaze tracking technique(s) may be used. For example, the gaze tracker 124 may represent, or include, an infrared light source for projecting infrared light into the eye of the user 104, a camera positioned to capture reflections of the projected infrared light, and one or more algorithms/calculations (e.g., fdters, inference engines) designed to perform gaze point locations based on the projected infrared light and the captured reflections.

[0057] Further in FIG. 1. the HMD 102 may include a selection handler 126, which represents one or more of various techniques that may be implemented to enable use of a selection element (e.g., pointer or cursor) to trigger a function or feature of the UI 106 by selecting, e.g., one of the UI elements represented by the UI element 108. For example, the user 104 may use gaze tracking and related techniques described herein to move such a selection element in the context of the UI 106 and direct the selection element to the UI element 108. The user 104 may then implement the selection handler 126 to select the UI element 108 and invoke a designated function of the UI element 108.

[0058] The selection handler 126 may represent any suitable or available selection technique. For example, in the context of gaze tracking, the selection handler 126 may implement a selection function after a detected gaze point hovers over a particular UI element for a predetermined quantity of time.

[0059] Additionally or alternative selection techniques may be used, as well. For example, selections may be inferred from the user 104 blinking in a defined manner, as may be captured by a camera of the gaze tracker 124. In other examples, the selection handler 126 may represent a hardware button on the HMD 102 that may be pressed by the user 104 to initiate a selection. In some examples, the HMD 102 may include a gesture detector, and the selection handler 126 may detect a hand gesture of the user 104 for selection purposes.

[0060] A UI generator 128 refers to any application and associated hardware needed to generate the UI 106. For example, the UI generator 128 may include a rendering engine and associated projector for generating the UI 106. More specific examples of the UI generator 128, as well as of the gaze tracker 124 and the selection handler 126, are provided below, e.g., with respect to FIGS. 3, 7. 8A, and 8B.

[0061] A bounding area generator 130 may (or may not) be included within the UI generator 128, but is illustrated separately in FIG. 1 for the sake of explanation. The bounding area generator 130 may be configured to generate suitable bounding areas for any UI provided by the UI generator 128, including customizing such bounding areas for the user 104.

[0062] In the simplified example of FIG. 1, the bounding areas 107 are illustrated as a simple grid of uniformly shaped (e.g., square) bounding boxes. As already referenced, the bounding area generator 130 may generate virtually any desired shape(s), size(s), and distribution(s) of bounding areas determined to be optimal for the system of FIG. 1, including for the user 104. For example, a Voronoi distribution may be used, as described in more detail, below, with respect to the example of FIG. 5.

[0063] In example implementations, the bounding area generator 130 may be configured to ensure that generated bounding areas are adjacent to one another, at least within a UI region in which UI element selection (and associated gaze point detection/validation) may occur. For example, if generated bounding areas are all adjacent to, and contiguous with, one another, then any gaze point detected will be validated with respect to one of the bounding areas and its included UI element.

[0064] On the other hand, if the generated bounding areas are not adjacent and/or contiguous with one another, such that UI space (e.g., gaze space) exists between two or more bounding areas, then a gaze point falling within such a dead space may be undetected or undetectable by the system of FIG. 1, which may result in a deadlock situation in which, e.g., a gaze point freezes or otherwise becomes unavailable. In such cases, the bounding area generator 130 may be configured to re-generate or update relevant bounding areas to correct the deadlock situation.

[0065] Various other factors, including user preferences, user characteristics of the user 104. HMD characteristics of the HMD 102 (e.g.. available resolution or field of view), and/or characteristics of the UI 106 (or a corresponding application) may be used by the bounding area generator 130. For example, as discussed in more detail, below, the user 104 may have a higher or lower level of variation (e.g., jitter) in the user’s eye movements, which may cause the bounding area generator 130 to generate relatively larger bounding areas and/or to communicate with the UI generator 128 to request a larger total gaze space for the UI 106 within a field of view of the HMD 102.

[0066] Agaze point validation manager 132 may be configured to implement and execute the types of gaze point validation described herein, e.g., to determine a specific bounding area in which the gaze point 112a or the gaze point 112b is positioned. That is, the gaze tracker 124 may provide gaze point tracking results with varying levels of accuracy and various sources of inaccuracy.

[0067] For example, as referenced above, the gaze tracker 124 may be limited by various potential sources of inaccuracy. For example, a camera view of a camera of the HMD 102 and/or an infrared light source may be slightly misaligned. In other examples, a gaze algorithm of the gaze tracker 124 may introduce errors in gaze tracking results. In other examples, the user 104 may exhibit more or less jitter in detected eye movements, which may reduce gaze tracking accuracy.

[0068] These and other sources of potential error result in the gaze tracker 124 providing a gaze tracking result that may be considered an estimated gaze point, which, by itself, may be insufficient to avoid false triggers or selections of undesired UI elements, particularly for small, close, numerous, and/or dense arrangements of UI elements.

[0069] As described above, the use of described bounding areas 107 may mitigate or eliminate such false triggers/selections. For example, if the user 104 directs a gaze point at a desired UI element, such as the UI element 108, then, even with the types of inaccuracies just referenced, the resulting estimated gaze point may still be very likely to be within a surrounding bounding area, e.g., the bounding area 110.

[0070] As also described, eyes (and an associated gaze point) of the user 104 may move rapidly and abruptly from one UI location (e.g., UI element) to another, e.g., between one image frame captured by the gaze tracker 124 and a subsequent image frame. As the gaze points move accordingly, the gaze point validation manager 132 may be configured to implement the above-described techniques to validate each estimated gaze point from the gaze tracker 124 with respect to a corresponding bounding area of the bounding areas generated by the bounding area generator 130.

[0071] For example, a search manager 134 may be configured to execute a search of all or some of the relevant bounding areas 107, at each captured/corresponding image frame, e.g., to validate a location of the gaze point 112a within the bounding area 113a, and to validate the location of the gaze point 112b within the bounding area 113b. As described, it is possible for the search manager 134 to search each and all of the bounding areas 107, if necessary, to locate and validate a current gaze point.

[0072] For example, during an initialization stage, it may occur that no gaze point has yet been validated with respect to a particular bounding area of the bounding areas 107. Such an initialization stage may occur, e.g., when the UI 106 is first used, or when a gaze point location is inadvertently lost. However, as also described, such a comprehensive or brute force approach to gaze point validation may be undesirable, impractical, or impossible, for desired levels of accuracy and/or responsiveness, and/or may unnecessarily consume computing resources.

[0073] Therefore, a subset selector 136 may be configured to define, identify, and otherwise determine search subsets to be used by the search manager 134 to improve or optimize operations of the search manager 134. For example, the search manager 134 may initially validate the gaze point 112a within the bounding area 113a during an exhaustive search performed during an initialization stage. Then, the subset selector 136 may determine and define the first search subset 114.

[0074] Iterations may continue with the search manager 134 then searching only the bounding areas of the first search subset 114 to find the second gaze point 112b within the specific bounding area 113b. The subset selector 136 may then determine and define the second search subset 116. Iterations may further continue, e.g., by validating a third or subsequent gaze point(s) and corresponding search subset(s), as shown and described with respect to FIGS. 4A, 4B, and 6.

[0075] In the context of such iterations, the search subsets may be dynamically sized and/or shaped to further optimize the gaze tracking processes described herein. For example, the subset selector 136 may determine a gaze vector characterizing a direction and velocity of movements of the gaze point, and may expand or contract a subsequent search subset in the direction by an amount determined from the velocity.

[0076] As referenced above, the subset selector 136 may determine the search subset 116 (e.g., a size of the search subset 116) based on an anticipated maximum gaze point movement threshold that characterizes a maximum anticipated distance between the gaze point 112a at time ti in the context of the UI 106a and the gaze point 112b at time 12 in the context of the UI 106b. As also referenced, this gaze point movement threshold may be determined based on an analysis of a population of users, and/or may be based on characteristics of the user 104 as an individual.

[0077] For example, a calibration manager 138 may be configured to perform a calibration test with respect to the user 104. For example, the calibration manager 138 maycause the HMD 102 to instruct the user 104 to visually track a series of movements of UI elements. The calibration manager 138 may then measure velocities of these movements, and determine, from the measured velocities, a maximum gaze point distance observed for the user 104. The measured maximum gaze point distance may then be used to set an individualized gaze point movement threshold for the user 104, which may then be used by the subset selector 136 when determining a search subset, such as the search subset 114 or the search subset 116.

[0078] In FIG. 1. the various components of the HMD 102 are illustrated as being present within (e.g., mounted on or in) the HMD 102. For example, the various components may be provided using a glasses frame when the HMD 102 represents a pair of smartglasses. [0079] The illustrated components of the head-based UI control manager 132 may be implemented as one or more software module(s). That is, for example, the memory 122 may be used to store instructions that are executable by the processor 120, which, when executed, cause the processor 120 to implement the gaze point validation manager 132 as described herein.

[0080] As referenced above and shown in more detail with respect to FIG. 7, various ones of the illustrated components of the HMD 102, e.g., components of the gaze point validation manager 132, may be provided in a separate device that is in communication with the HMD 102. For example, the selection handler 126 may utilize or be in communication with a smartwatch or smartphone that provides a selection feature. In other examples, one or more components of the gaze point validation manager 132 may be provided using a separate device(s), including a remote (e.g., cloud-based) device.

[0081] FIG. 2 is a flowchart illustrating example operations of the system of FIG. 1. In the example of FIG. 2, the various operations are illustrated as separate, sequential operations. However, in various example implementations, the various operations may be implemented in a different order than illustrated, in an overlapping or parallel manner, and/or in a nested, iterative, looped, or branched fashion. Further, various ones of the operations or sub-operations may be included, omitted, or substituted.

[0082] In FIG. 2. a first gaze point may be identified within a user interface (UI) that includes a set of UI elements included within a corresponding set of bounding areas, the first gaze point within a first bounding area of the set of bounding areas (202). For example, the first gaze point 112a may be identified within the UI 106a, representing an instance of the UI 106 at a first time ti. For example, the search manager 134 may perform a search of the bounding areas 107 to identify the first gaze point 112a within the bounding area 113a.

[0083] A first search subset of the set of bounding areas may be selected, the first search subset at least partially surrounding the first gaze point and the first bounding area (204). For example, the subset selector 136 may be configured to select the hatched bounding areas of the search subset 114 of FIG. 1 as the first search subset. As shown, the search subset 114 may be selected as a set of bounding areas enclosing or surrounding the first gaze point 112a. For example, the search subset 114 may be generated with the bounding area 113a as a central bounding area of the plurality of bounding areas of the search subset 114.

[0084] As also described above, a size of the search subset 114 may be determined based on at least one gaze point movement threshold. For example, in addition to the various eye/gaze characteristics discussed herein that may be used to determine a gaze point movement threshold, a time difference between time ti of the UI 106a and time t2 of the UI 106b may be used. That is. for example, a greater time difference (e.g., lower frequency of image capture) may correspond to a larger gaze point movement threshold. Further, multiple gaze point movement thresholds may be used. For example, the search subset 114 may be generated based on a first gaze point movement threshold, but if the search manager fails to find a subsequent gaze point that is located within the search subset 114, then a second, larger gaze point movement threshold may be used to expand the search radius for the subsequent gaze point.

[0085] The first search subset may be searched to identify a second bounding area of the first search subset corresponding to a second gaze point obtained after the first gaze point (206). For example, the search manager 134 may locate the second gaze point 112b within the second bounding area 113b.

[0086] A second search subset of the set of bounding areas may be selected, the second search subset at least partially surrounding the second gaze point and the second bounding area, and overlapping with the first search subset (208). For example, the subset selector 136 may be configured to select the hatched bounding areas of the search subset 116 of FIG. 1 as the second search subset, so that cross-hatched bounding areas 118 represent overlapping bounding areas between the first search subset 114 and the second search subset 116. As shown, the search subset 116 may be selected as a set of bounding areas enclosing or surrounding the second gaze point 112b. For example, the search subset 116 may be generated with the bounding area 113b as a central bounding area of the plurality of bounding areas of the search subset 116.

[0087] An identified UI element of the set of UI elements that is included within the second bounding area may be determined (210). For example, the search manager 134, upon identify ing the second gaze point 112b within the bounding area 113b, may thereby identify a UI element within the bounding area 113b. (not illustrated separately in FIG. 1, but analogous to the UI element 108 of FIG. 1). Thus, for example, a selection of the identified UI element within the bounding area 113b may be received, e.g., by way of the selection handler 126.

[0088] FIG. 3 is a block diagram of an example system that may be used in the system of FIG. 1. In FIG. 3, the user 104 is illustrated as wearing the HMD 102 in the form factor of a pair of goggles.

[0089] Image frames 302 may thus be captured by the HMD 102 and processed by an inference engine 304. As referenced above, the inference engine 304 may be configured to infer, for an eye image 306, a relationship between an incident light 308 (e.g., infrared light generated by the HMD 102) at a retinal center and reflected light 310, to thereby determine a gaze direction.

[0090] The resulting gaze point may thus be determined with respect to a UI 312. In the example of FIG. 3, the UI 312 is illustrated in an augmented reality or mixed reality context, in which UI elements 314 are overlaid on any background context in a vicinity of the user 104.

[0091] Further in FIG. 3. the tracked, estimated gaze point may be related and represented to the user 104 within the context of the UI 312 as a cursor 316. That is, the cursor 316 may thus be moved within the context of the UI 312 to control (e.g., select) a desired one of the UI elements 314.

[0092] Bounding areas may be generated for the UI 312 and enclosing each of the UI elements, but, as referenced above, are not visibly rendered for the user 104. Nonetheless, as described with respect to FIGS. 1 and 2, the invisible bounding areas surrounding each of the UI elements 314 may be use to facilitate enhanced accuracy and reliability with respect to gaze-based control of the cursor 316.

[0093] For example, in FIG. 3, the cursor 316 is illustrated as being approximately midway between UI element 318 and UI element 320. As may be understood from the above discussion of FIGS. 1 and 2, invisible, adjacent bounding areas enclosing the UI element 318 and the UI element 320 enable a clear delineation between those two UI elements.

Specifically, as shown, the cursor 316 is closer to the UI element 320 than to the UI element 318, and may thus be located within a bounding area of the UI element 320 by the gaze point validation manager 132 of FIG. 1. As a result, the UI element 320 may be highlighted, enlarged, or otherwise visually indicated to be selected/selectable by the cursor 316.

[0094] FIG. 4A illustrates a first example of a modified search space that may be used in the examples of FIGS. 1-3. In FIG. 4A, a UI 402 may be understood to be viewed by the user 104, with a gaze of the user 104 moving in accordance with a gaze vector illustrated by dashed arrow 404 (that is, the dashed arrow 404 should be understood to be included merely to indicate such a gaze vector, and should not be interpreted as being included in. or used by. the UI 402). Further, in the simplified example of FIG. 4A, the UI 402 should be understood to include suitable bounding areas with included UI elements, although such bounding areas and included UI elements are not explicitly illustrated in FIG. 4A.

[0095] Consistent with the above descriptions of FIGS. 1 and 2, a first gaze point 406 may be located within the UI 402. For example, if the first gaze point 406 represents an initial gaze point, the search manager 134 may perform a search based on all available bounding areas in order to relate the first gaze point 406 to a corresponding bounding area and enclosed UI element.

[0096] Then, the subset selector 136 may determine a first search subset 414 of bounding areas around the first gaze point 406. In the example, the gaze tracker 124 of FIG. 1 may determine a gaze vector associated with the first gaze point 406 and consistent with the gaze vector 404. For example, the first gaze point 406 may be determined relative to a preceding gaze point (not shown in FIG. 4A), indicating that a gaze of the user 104 is moving in a detected direction and/or at a detected velocity.

[0097] In such a scenario, it may be more likely that the gaze of the user 104 will continue in accordance with the determined gaze vector. Accordingly, the subset selector 136 may define a first search subset 414 in an elliptical shape, with a major axis in a direction of the gaze vector.

[0098] The search manager 134 may thus perform an optimized search, on the assumption that a subsequent gaze point will be more likely to be detected in a direction of the gaze vector. That is, the search manager 134 may user fewer computing resources searching in a direction of a minor axis of the ellipse of the first search subset 414, since subsequent gaze points are less likely to be in those directions, than if the first search subset was symmetrical around the first gaze point, as in the examples of FIG. 1. Accordingly, a second gaze point 408 may be determined more quickly, and using fewer resources, than if a symmetrical search subset were used.

[0099] Then, as referenced above, the gaze of the user 104 may travel along the direction of the dashed arrow 404, moving to the second gaze point 408, a third gaze point 410, and a fourth gaze point 412. Each gaze movement may also be associated with a corresponding gaze vector indicating a direction, velocity, and/or distance of gaze movements between pairs of gaze points, and consistent with the overall gaze vector 404.

[00100] Therefore, a second search subset 416 may be determined by the subset selector 136 based on an updated gaze vector determined with respect to the first gaze point 406 and the second gaze point 408. A subsequent search of the second search subset 416 by the search manager 134 may thus identify the third gaze point 410 with respect to a corresponding bounding area.

[00101] Then, a third search subset 418 may be determined by the subset selector 136 based on an updated gaze vector determined with respect to the second gaze point 408 and the third gaze point 410. A subsequent search of the third search subset 418 by the search manager 134 may thus identify the fourth gaze point 412 with respect to a corresponding bounding area.

[00102] FIG. 4B illustrates a second example of a modified search space that may be used in the examples of FIGS. 1-3. In FIG. 4B, a UI 420 may be understood to be viewed by the user 104, with a gaze of the user 104 moving in accordance with a gaze vector illustrated by dashed arrow 422 (similar to the gaze vector 404 of FIG. 4A). Further, in the simplified example of FIG. 4B, the UI 420. like the UI 402 of FIG. 4A, should be understood to include suitable bounding areas with included UI elements.

[00103] In FIG. 4B, a first gaze point 424 may be located within the UI 420. For example, if the first gaze point 424 represents an initial gaze point, the search manager 134 may perform a search based on all available bounding areas in order to relate the first gaze point 424 to a corresponding bounding area and enclosed UI element.

[00104] Then, the subset selector 136 may determine a first search subset 430 of bounding areas around the first gaze point 424. In the example, the gaze tracker 124 of FIG. 1 may determine a gaze vector associated with the first gaze point 406, and consistent with the gaze vector 422. For example, the first gaze point 424 may be determined relative to a preceding gaze point (not shown in FIG. 4B), indicating that a gaze of the user 104 is moving in a detected direction and/or at a detected velocity.

[00105] In FIG. 4B, in contrast with FIG. 4A, the detected gaze vector may indicate a relatively slow velocity and/or small distance traveled. Accordingly, the subset selector 136 may define the first search subset 430 to be smaller in size then in the examples of FIG. 4A, in order to conserve time and resources when searching for a second gaze point 426.

[00106] Then, as referenced above, the gaze of the user 104 may travel along the direction of the dashed arrow- 422, moving to the second gaze point 426, and then to a third gaze point 428. Each gaze movement may also be associated with a corresponding gaze vector indicating a direction, velocity, and/or distance of gaze movements betw een pairs of gaze points, and consistent with the overall gaze vector 422.

[00107] Therefore, a second search subset 432 may be determined by the subset selector 136 based on an updated gaze vector determined with respect to the first gaze point 424 and the second gaze point 426. A subsequent search of the second search subset 432 by the search manager 134 may thus identify the third gaze point 428 with respect to a corresponding bounding area. Then, a third search subset 434 may be determined by the subset selector 136 based on an updated gaze vector determined with respect to the second gaze point 426 and the third gaze point 428.

[00108] FIGS. 4 A and 4B therefore generally illustrate that properties of a search subset may be adjusted based on, e.g., a velocity of a relevant eye gaze and associated dynamic qualities. For fast moving gaze coordinates such as in FIG. 4A, the search subsets may be relaxed to extend along the major axes of the ellipses of the search subsets 414, 416, 418, to thereby include more potentially relevant UI elements. Put another way, the search subsets may be elongated in a direction of movement. For a slower gaze, as in FIG. 4B, the search subset(s) area(s) may be smaller and define a tighter, more constrained search area. Accordingly, it is possible to optimize a tradeoff between reducing search complexity and maintaining reliable eye tracking accuracy.

[00109] FIG. 5 illustrates an example implementation w ith boundary areas generated as Voronoi regions. lin FIG. 5, a UI 502 includes a plurality of UI elements enclosed within corresponding adjacent and contiguous bounding areas, including a UI element 504 within a bounding area 506, and a UI element 508 within a bounding area 510.

[00110] In FIG. 5, it is assumed that the various UI elements of the UI 502, including the UI elements 504, 508, are positioned in accordance with relevant design requirements or preferences. That is, as noted with respect to FIG. 1, and illustrated with respect to the example of FIG. 3, UI elements may not be assumed to be positioned using any regular, known, or symmetrical pattern.

[00111] For example, UI elements may be positioned based on relative likelihood of use. For example, a more popular (more frequently chosen) UI element may be larger, or positioned more prominently, than a less popular UI element. In such cases, the bounding area generator 130 may be configured to generate a relatively larger bounding area for the more popular UI element(s), and a relatively smaller bounding area for the less popular UI element(s).

[00112] In other scenarios, however, there may be no know n or Ul-specific criteria to use in generating suitable bounding areas. In these and similar scenarios, bounding areas may be generated as Voronoi regions.

[00113] A Voronoi pattern, also known as a Dirichlet tessellation or Thiessen polygons, is a type of tessellation pattern (having no gaps or overlaps between adjacent regions) in which UI elements are enclosed in a portion of the UI that is closest to each UI element. Put another way, each boundary line of a bounding area is based on the distance(s) to UI elements sharing that boundary line, so that each boundary line is equidistant between two UI elements.

[00114] For example, as shown in FIG. 5, the UI element 504 and the UI element 506 share a boundary line 512 that is shared by the bounding area 506 and the bounding area 510. As may be observed, the boundary line 512 is equidistant between the UI elements 504, 508.

[00115] When using the Voronoi pattern of FIG. 5, the search manager 134 may thus relate gaze points to individual bounding areas by comparing relative distances between the gaze point in question and a center or centroid of the various UI elements. For example, a gaze point 514 may be determined to have a smallest relative distance to the UI element 508, as compared to the UI element 504 or any of the remaining UI elements of FIG. 5.

[00116] Consistent with the examples above, calculations associated with such distance comparisons may be minimized by restricting the distance comparisons only to a suitably chosen search subset determined by the subset selector 136. In other words, using described techniques, fewer Euclidean distance comparisons are required, and only those comparisons most likely to result in a correct determination of a current gaze point/bounding area are calculated.

[00117] FIG. 6 is a flowchart illustrating more detailed example operations of the systems of FIGS. 1 and 3. In FIG. 6, for a UI with included UI elements, bounding areas may be generated (602). For example, the bounding area generator 130 of FIG. 1 may generate bounding areas using the Voronoi regions of FIG. 5.

[00118] A first gaze point may then be initialized (604). For example, the search manager 134 of FIG. 1 may perform an exhaustive search of the generated bounding areas to initially locate the first gaze point. For the Voronoi regions of FIG. 5, and similar bounding areas, relative distance comparisons may be made between the gaze point and each center of each UI element. However, other search techniques may be used for performing boundary check algorithms to relate the gaze point to a particular bounding area, such as the even-odd rule algorithm, or the winding number algorithm, where the choice of such algorithms may depend in part on the design of the UI and/or bounding areas in question.

[00119] Based on the first gaze point and associated, determined first bounding area, a first search space subset may be generated (606). Various techniques usable by the search subset selector 136 are provided above, but in general, boundaries and other parameters of the search subset generated may be determined by thresholding a Gaussian distribution describing potential locations of a subsequent gaze point, relative to the current/first gaze point and in the context of the UI being searched, and determined based on a maximum expected gaze movement between image frames for the user 104.

[00120] For example, a defined number of standard deviations of a determined Gaussian distribution may be used to establish boundaries of a search subset being generated. Then, more or fewer standard deviations for the same underlying Gaussian distribution may be use to obtain the types of dynamic search subset updates described above with respect to FIGS. 4A and 4B.

[00121] Using the generated (e.g., first) search subset a subsequent (e.g., second) gaze point may be validated within, and with respect to, a specific bounding area and included UI element (608) of the generated search subset. For example, the search manager 134 may search the bounding areas of the first search subset, e.g., using information stored in memory for the first search subset, such as an earlier image frame, to determine the specific bounding area thereof in which the subsequent/updated gaze point is present.

[00122] If the attempted validation is not successful ( 10), then an updated search subset may be generated (606). For example, the generated search space may be too small or may be oriented incorrectly, so that the subsequent (e.g., second) gaze point landed outside of the generated search subset. In such a scenario, a new search subset may be generated (606) that is larger and/or oriented differently.

[00123] Such searching of the search subset(s) by the search manager 134 may continue until the validation is successful (610). Once the search completes, a gaze vector may be calculated (612), to be used in the subsequent generation of new search subsets.

[00124] If a selection of the determined UI element is received (614), then appropriate action may be taken, such as rendering a selection result (616). Otherwise (614), a new (e.g., second) search subset may be generated (606), and a subsequent (e.g., third) gaze point may be validated with respect thereto. As long as no selection is received, iterations may continue with the generation of a third (and subsequent) search subset(s), corresponding to third and subsequent image frames. At each iteration, e.g., at each frame, a current/updated gaze vector may be used during generation of the search subset for that iteration, as described with respect to FIGS. 4A and 4B.

[00125] FIGS. 1-6 illustrate that marching islands of search subsets may be used for fast, accurate, reliable, and efficient gaze point determinations. As described, based on current gaze point coordinate, a local island or search subset of UI elements and/or corresponding bounding areas may be used to validate a desired gaze-based UI element selection. For example, a UI element determination may be made for the UI element that gives the minimal distance among the UI elements in that island/search subset, where the search subset may be updated over time due to the dynamic nature of the gaze point coordinates.

[00126] The described approaches utilize a continuity of gaze, referring to natural UI selections in which the eye moves from one coordinate to another in a near-straight line that can be drawn in coordinate space, and that quantifies a finite velocity associated with the gaze movement. Then, when the gaze point moves to a new coordinate over time, the originally selected search subset or island will still be used to compare against the new gaze point coordinate. Once a new UI element bounding area is chosen to be the closest to the new gaze point coordinate, then the search subset or island may be updated accordingly. In this way, iterations may continue until a desired UI result is achieved.

[00127] FIG. 7 is a third person view of a user 702 (analogous to the user 104 of FIG. 1) in an ambient environment 7000, with one or more external computing systems shown as additional resources 752 that are accessible to the user 702 via a network 7200. FIG. 7 illustrates numerous different wearable devices that are operable by the user 702 on one or more body parts of the user 702. including a first wearable device 750 in the form of glasses worn on the head of the user, a second wearable device 754 in the form of ear buds worn in one or both ears of the user 702, a third wearable device 756 in the form of a w atch worn on the wrist of the user, and a computing device 706 held by the user 702. In FIG. 7, the computing device 706 is illustrated as a handheld computing device but may also be understood to represent any personal computing device, such as a table or personal computer.

[00128] In some examples, the first wearable device 750 is in the form of a pair of smart glasses including, for example, a display, one or more images sensors that can capture images of the ambient environment, audio input/output devices, user input capability, computing/processing capability and the like. Additional examples of the first wearable device 750 are provided below with respect to FIGS. 8A and 8B.

[00129] In some examples, the second wearable device 754 is in the form of an ear worn computing device such as headphones, or earbuds, that can include audio input/output capability, an image sensor that can capture images of the ambient environment 7000, computing/processing capability, user input capability and the like. In some examples, the third w earable device 756 is in the form of a smart watch or smart band that includes, for example, a display, an image sensor that can capture images of the ambient environment, audio input/output capability, computing/processing capability, user input capability and the like. In some examples, the handheld computing device 706 can include a display, one or more image sensors that can capture images of the ambient environment, audio input/output capability, computing/processing capability, user input capability, and the like, such as in a smartphone. In some examples, the example wearable devices 750, 754, 756 and the example handheld computing device 706 can communicate with each other and/or with external computing system(s) 752 to exchange information, to receive and transmit input and/or output, and the like. The principles to be described herein may be applied to other types of wearable devices not specifically shown in FIG. 7 or described herein.

[00130] The user 702 may choose to use any one or more of the devices 706, 750, 754, or 756, perhaps in conjunction with the external resources 752, to implement any of the implementations described above with respect to FIGS. 1-6. For example, the user 702 may use an application executing on the device 706 and/or the smartglasses 750 to execute the gaze point validation manager 132 of FIG. 1.

[00131] As referenced above, the device 706 may access the additional resources 752 to facilitate the various Ul-related operations described herein, or related techniques. In some examples, the additional resources 752 may be partially or completely available locally on the device 706 or the first wearable device (HMD) 750. In some examples, some of the additional resources 752 may be available locally on the device 706, and some of the additional resources 752 may be available to the device 706 via the network 7200. As shown, the additional resources 752 may include, for example, server computer systems, processors, databases, memory storage, and the like. In some examples, the processor(s) may include training engine(s), transcription engine(s), translation engine(s), rendering engine(s), and other such processors. In some examples, the additional resources may include ML model(s).

[00132] The device 706 (and or the first wearable device (HMD) 750) may operate under the control of a control system 760. The device 706 can communicate with one or more external devices, either directly (via wired and/or wireless communication), or via the network 7200. In some examples, the one or more external devices may include various ones of the illustrated wearable computing devices 750, 754, 756, another mobile computing device similar to the device 706, and the like. In some implementations, the device 706 includes a communication module 762 to facilitate external communication. In some implementations, the device 706 includes a sensing system 764 including various sensing system components. The sensing system components may include, for example, one or more image sensors 765, one or more position/ orientation sensor(s) 764 (including for example, an inertial measurement unit, an accelerometer, a gyroscope, a magnetometer and other such sensors), one or more audio sensors 766 that can detect audio input, one or more image sensors 767 that can detect visual input, one or more touch input sensors 768 that can detect touch inputs, and other such sensors. The device 706 can include more, or fewer, sensing devices and/or combinations of sensing devices. Various ones of the various sensors may be used individually or together to perform the types of UI control described herein.

[00133] Captured still and/or moving images may be displayed by a display device of an output system 772, and/or transmitted externally via a communication module 762 and the network 7200, and/or stored in a memory 770 of the device 706. The device 706 may include one or more processor(s) 774. The processors 774 may include various modules or engines configured to perform various functions. In some examples, the processor(s) 774 may include, e.g, training engine(s), transcription engine(s), translation engine(s), rendering engine(s), and other such processors. The processor(s) 774 may be formed in a substrate configured to execute one or more machine executable instructions or pieces of software, firmware, or a combination thereof. The processor(s) 774 can be semiconductor-based including semiconductor material that can perform digital logic. The memory 770 may include any type of storage device or non-transitory computer-readable storage medium that stores information in a format that can be read and/or executed by the processor(s) 774. The memory 770 may store applications and modules that, when executed by the processor(s) 774, perform certain operations. In some examples, the applications and modules may be stored in an external storage device and loaded into the memory 770.

[00134] Although not shown separately in FIG. 7, it will be appreciated that the various resources of the computing device 706 may be implemented in whole or in part within one or more of various wearable devices, including the illustrated smartglasses 750, as well as the earbuds 754 and smartwatch 756, any or all of which may be in communication with one another to provide the various features and functions described herein.

[00135] An example head mounted wearable device 800 in the form of a pair of smart glasses is shown in FIGS. 8A and 8B, for purposes of discussion and illustration. The example head mounted wearable device 800 includes a frame 802 having rim portions 803 surrounding glass portion, or lenses 807, and arm portions 830 coupled to a respective rim portion 803. In some examples, the lenses 807 may be corrective/prescription lenses. In some examples, the lenses 807 may be glass portions that do not necessarily incorporate corrective/prescription parameters. A bridge portion 809 may connect the rim portions 803 of the frame 802. In the example shown in FIGS. 8 A and 8B, the wearable device 800 is in the form of a pair of smart glasses, or augmented reality glasses, simply for purposes of discussion and illustration, and may also be implemented as goggles or other types of HMDs.

[00136] In some examples, the wearable device 800 includes a display device 804 that can output visual content, for example, at an output coupler providing a visual display area 805, so that the visual content (e.g., a user interface) is visible to the user. In the example shown in FIGS. 8A and 8B, the display device 804 is provided in one of the two arm portions 830, simply for purposes of discussion and illustration. Display devices 804 may be provided in each of the two arm portions 830 to provide binocular output of content. In some examples, the display device 804 may be a see through near eye display. In some examples, the display device 804 may be configured to project light from a display source onto a portion of teleprompter glass functioning as a beamsplitter seated at an angle (e.g., 30-45 degrees). The beamsplitter may allow for reflection and transmission values that allow the light from the display source to be partially reflected while the remaining light is transmitted through. Such an optic design may allow a user to see both physical items in the world, for example, through the lenses 807, next to content (for example, digital images, user interface elements, virtual content, and the like) output by the display device 804. In some implementations, waveguide optics may be used to depict content on the display device 804.

[00137] The example wearable device 800. in the form of smart glasses as shown in FIGS. 8A and 8B, includes one or more of an audio output device 806 (such as, for example, one or more speakers), an illumination device 808, a sensing system 810, a control system 812, at least one processor 814, and an outward facing image sensor 816 (for example, a camera). In some examples, the sensing system 810 may include various sensing devices and the control system 812 may include various control system devices including, for example, the at least one processor 814 operably coupled to the components of the control system 812. In some examples, the control system 812 may include a communication module providing for communication and exchange of information between the wearable device 800 and other external devices.

[00138] In some examples, the head mounted wearable device 800 includes a gaze tracking device 815 to detect and track eye gaze direction and movement. Data captured by the gaze tracking device 815 may be processed to detect and track gaze direction and movement as a user input. In the example shown in FIGS. 8A and 8B, the gaze tracking device 815 is provided in one of two arm portions 830, simply for purposes of discussion and illustration. In the example arrangement shown in FIGS. 8A and 8B, the gaze tracking device 815 is provided in the same arm portion 830 as the display device 804, so that user eye gaze can be tracked not only with respect to objects in the physical environment, but also with respect to the content output for display by the display device 804. In some examples, gaze tracking devices 815 may be provided in each of the two arm portions 830 to provide for gaze tracking of each of the two eyes of the user. In some examples, display devices 804 may be provided in each of the two arm portions 830 to provide for binocular display of visual content.

[00139] The wearable device 800 is illustrated as glasses, such as smartglasses, augmented reality (AR) glasses, or virtual reality (VR) glasses. More generally, the wearable device 800 may represent any head-mounted device (HMD), including, e.g., goggles, helmet, or headband. Even more generally, the wearable device 800 and the computing device 706 may represent any wearable device(s), handheld computing device(s), or combinations thereof.

[00140] Use of the wearable device 800, and similar wearable or handheld devices such as those shown in FIG. 7, enables useful and convenient use case scenarios of implementations of FIGS. 1-6. For example, as shown in FIG. 8B, the display area 805 may be used to display the UI 106 of FIG. 1. More generally, the display area 805 may be used to provide any of the functionality described with respect to FIGS. 1-6 that may be useful in operating the gaze point validation manager 132.

[00141] Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.

[00142] These computer programs (also known as modules, programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms “machine- readable medium" “computer-readable medium" refers to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal'’ refers to any signal used to provide machine instructions and/or data to a programmable processor.

[00143] To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, or LED (light emitting diode)) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback (e.g.. visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form, including acoustic, speech, or tactile input.

[00144] The systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.

[00145] The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

[00146] A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the description and claims.

[00147] In addition, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. In addition, other steps may be provided, or steps may be eliminated, from the described flow s, and other components may be added to, or removed from, the described systems. Accordingly, other implementations are within the scope of the following claims.

[00148] Further to the descriptions above, a user is provided with controls allowing the user to make an election as to both if and when systems, programs, devices, networks, or features described herein may enable collection of user information (e.g.. information about a user’s social network, social actions, or activities, profession, a user’s preferences, or a user’s current location), and if the user is sent content or communications from a server. In addition, certain data may be treated in one or more ways before it is stored or used, so that user information is removed. For example, a user’s identity may be treated so that no user information can be determined for the user, or a user’s geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined. Thus, the user may have control over what information is collected about the user, how that information is used, and what information is provided to the user.

[00149] The computer system (e.g., computing device) may be configured to wirelessly communicate with a network server over a network via a communication link established with the network server using any known wireless communications technologies and protocols including radio frequency (RF), microwave frequency (MWF). and/or infrared frequency (IRF) wireless communications technologies and protocols adapted for communication over the network.

[00150] In accordance with aspects of the disclosure, implementations of various techniques described herein may be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. Implementations may be implemented as a computer program product (e.g., a computer program tangibly embodied in an information carrier, a machine-readable storage device, a computer-readable medium, a tangible computer-readable medium), for processing by, or to control the operation of, data processing apparatus (e.g., a programmable processor, a computer, or multiple computers). In some implementations, a tangible computer-readable storage medium may be configured to store instructions that when executed cause a processor to perform a process. A computer program, such as the computer program(s) described above, may be written in any form of programming language, including compiled or interpreted languages, and may be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may be deployed to be processed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.

[00151] Specific structural and functional details disclosed herein are merely representative for purposes of describing example implementations. Example implementations, however, may be embodied in many alternate forms and should not be construed as limited to only the implementations set forth herein.

[00152] The terminology used herein is for the purpose of describing particular implementations only and is not intended to be limiting of the implementations. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including." when used in this specification, specify the presence of the stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof.

[00153] It will be understood that when an element is referred to as being "coupled," "connected," or "responsive" to, or "on," another element, it can be directly coupled, connected, or responsive to, or on, the other element, or intervening elements may also be present. In contrast, when an element is referred to as being "directly coupled," "directly connected," or "directly responsive" to, or "directly on," another element, there are no intervening elements present. As used herein the term "and/or" includes any and all combinations of one or more of the associated listed items.

[00154] Spatially relative terms, such as "beneath," "below," "lower," "above," "upper." and the like, may be used herein for ease of description to describe one element or feature in relationship to another element(s) or feature(s) as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "below" or "beneath" other elements or features would then be oriented "above" the other elements or features. Thus, the term "below" can encompass both an orientation of above and below. The device may be otherwise oriented (rotated 130 degrees or at other orientations) and the spatially relative descriptors used herein may be interpreted accordingly.

[00155] Example implementations of the concepts are described herein with reference to cross-sectional illustrations that are schematic illustrations of idealized implementations (and intermediate structures) of example implementations. As such, variations from the shapes of the illustrations as a result, for example, of manufacturing techniques and/or tolerances, are to be expected. Thus, example implementations of the described concepts should not be construed as limited to the particular shapes of regions illustrated herein but are to include deviations in shapes that result, for example, from manufacturing. Accordingly, the regions illustrated in the figures are schematic in nature and their shapes are not intended to illustrate the actual shape of a region of a device and are not intended to limit the scope of example implementations.

[00156] It will be understood that although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. Thus, a "first" element could be termed a "second" element without departing from the teachings of the present implementations. [00157] Unless otherwise defined, the terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which these concepts belong. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and/or the present specification and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

[00158] While certain features of the described implementations have been illustrated as described herein, many modifications, substitutions, changes, and equivalents will now occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover such modifications and changes as fall within the scope of the implementations. It should be understood that they have been presented by way of example only, not limitation, and various changes in form and details may be made. Any portion of the apparatus and/or methods described herein may be combined in any combination, except mutually exclusive combinations. The implementations described herein can include various combinations and/or sub-combinations of the functions, components, and/or features of the different implementations described.

Claims

WHAT IS CLAIMED IS:

1. A method comprising: identifying a first gaze point within a user interface (UI) that includes a set of UI elements included within a corresponding set of bounding areas, the first gaze point within a first bounding area of the set of bounding areas; selecting a first search subset of the set of bounding areas, the first search subset at least partially surrounding the first gaze point and the first bounding area; searching the first search subset to identify a second bounding area of the first search subset corresponding to a second gaze point obtained after the first gaze point; selecting a second search subset of the set of bounding areas, the second search subset at least partially surrounding the second gaze point and the second bounding area, and overlapping with the first search subset; and determining an identified UI element of the set of UI elements that is included within the second bounding area.

2. The method of claim 1, further comprising: identifying a third gaze point within a third bounding area and within the second search subset, based on a search of the second search subset; selecting a third search subset of the set of bounding areas, the third search subset at least partially surrounding the third gaze point and the third bounding area, and overlapping with the second search subset; and determining a further identified UI element of the set of UI elements that is included within the third bounding area.

3. The method of claim 1, further comprising: searching for a third gaze point within the second search subset; failing to identify the third gaze point within the second search subset; expanding the second search subset to obtain an expanded second search subset; and identify ing the third gaze point within the expanded second search subset.

4. The method of claim 1. further comprising: selecting the first search subset and the second search subset based on at least one gaze point movement threshold.

5. The method of claim 1. further comprising: determining a number of bounding areas of the set of bounding areas to include within the second search subset based on a gaze velocity determined with respect to the first gaze point and the second gaze point.

6. The method of claim 1. further comprising: determining a shape of the second search subset based on a gaze vector determined with respect to the first gaze point and the second gaze point.

7. The method of claim 1. further comprising: generating bounding areas of the set of bounding areas as being adjacent to, and contiguous with, one another.

8. The method of claim 1. further comprising: generating the set of bounding areas as Voronoi regions.

9. The method of claim 1, further comprising: identifying the first gaze point and the first search subset with respect to a first image frame captured at a first time; and identifying the second gaze point and the second search subset with respect to a second image frame captured at a second time.

10. The method of claim 1. further comprising: receiving a selection of the identified UI element.

11. A non-transitory computer-readable medium storing executable instructions that when executed by at least one processor cause the at least one processor to: identify a first gaze point within a user interface (UI) that includes a set of UI elements included within a corresponding set of bounding areas, the first gaze point within a first bounding area of the set of bounding areas; select a first search subset of the set of bounding areas, the first search subset at least partially surrounding the first gaze point and the first bounding area; search the first search subset to identify a second bounding area of the first search subset corresponding to a second gaze point obtained after the first gaze point; select a second search subset of the set of bounding areas, the second search subset at least partially surrounding the second gaze point and the second bounding area, and overlapping with the first search subset; and determine an identified UI element of the set of UI elements that is included within the second bounding area.

12. The non-transitory computer-readable medium of claim 11, wherein the instructions, when executed by the at least one processor, are further configured to cause the at least one processor to: identify a third gaze point within a third bounding area and within the second search subset, based on a search of the second search subset; select a third search subset of the set of bounding areas, the third search subset at least partially surrounding the third gaze point and the third bounding area, and overlapping with the second search subset; and determine a further identified UI element of the set of UI elements that is included within the third bounding area.

13. The non-transitory computer-readable medium of claim 11, wherein the instructions, when executed by the at least one processor, are further configured to cause the at least one processor to: select the first search subset and the second search subset based on at least one gaze point movement threshold.

14. The non-transitory computer-readable medium of claim 11, wherein the instructions, when executed by the at least one processor, are further configured to cause the at least one processor to: determine a number of bounding areas of the set of bounding areas to include within the second search subset based on a gaze velocity determined with respect to the first gaze point and the second gaze point.

15. The non-transitory computer-readable medium of claim 11, wherein the instructions, when executed by the at least one processor, are further configured to cause the at least one processor to: determine a shape of the second search subset based on a gaze vector determined with respect to the first gaze point and the second gaze point.

16. The non-transitory computer-readable medium of claim 11, wherein the instructions, when executed by the at least one processor, are further configured to cause the at least one processor to: identify the first gaze point and the first search subset with respect to a first image frame captured at a first time; and identify the second gaze point and the second search subset with respect to a second image frame captured at a second time.

17. A head mounted device (HMD) comprising: at least one frame; at least one gaze tracker including an image sensor mounted on the at least one frame; at least one processor; and at least one memory', the at least one memory' storing a set of instructions, which, when executed, cause the at least one processor to: identify a first gaze point within a user interface (UI) that includes a set of UI elements within a corresponding set of bounding areas, the first gaze point being included within a first bounding area of the set of bounding areas; select a first search subset of the set of bounding areas, the first search subset at least partially surrounding the first gaze point and the first bounding area; search the first search subset to identify a second bounding area of the first search subset corresponding to a second gaze point obtained after the first gaze point; select a second search subset of the set of bounding areas, the second search subset at least partially surrounding the second gaze point and the second bounding area, and overlapping with the first search subset; and determine an identified UI element of the set of UI elements that is included within the second bounding area.

18. The HMD of claim 17, wherein the set of instructions, when executed by the at least one processor, are further configured to cause the HMD to: identify a third gaze point within a third bounding area and within the second search subset, based on a search of the second search subset; select a third search subset of the set of bounding areas, the third search subset at least partially surrounding the third gaze point and the third bounding area, and overlapping with the second search subset; and determine a further identified UI element of the set of UI elements that is included within the third bounding area.

19. The HMD of claim 17, wherein the set of instructions, when executed by the at least one processor, are further configured to cause the HMD to: select the first search subset and the second search subset based on at least one gaze point movement threshold.

20. The HMD of claim 17, wherein the set of instructions, when executed by the at least one processor, are further configured to cause the HMD to: identify the first gaze point and the first search subset with respect to a first image frame captured at a first time; and identify the second gaze point and the second search subset with respect to a second image frame captured at a second time.