HK1180041B - Adjustment of a mixed reality display for inter-pupillary distance alignment
- Publication number: HK1180041B (application HK13107026.8A)
- Authority: HK (Hong Kong)
Description
Technical Field
The invention relates to a mixed reality display.
Background
Head-mounted displays and binoculars are examples of binocular viewing systems in which there is an optical system for each of the user's eyes to view a scene. The inter-pupillary distance, the distance between the pupils of the two eyes, is estimated to vary among adults over a range of about 25 to 30 millimeters. In the case of head-mounted mixed reality displays, if the optical axis of each optical system is not aligned with the respective eye, it may be difficult for the user to correctly fuse three-dimensional (3D) content, or the user may suffer eye fatigue or headaches as the muscles of the natural visual system strain to compensate. For binocular viewing devices such as binoculars, a user manually adjusts the position of each optical system in the eyepiece by trial and error to obtain a clear binocular view, which may take several minutes.
Disclosure of Invention
The present technology provides various embodiments for adjusting a see-through, near-eye, mixed reality display to align with the user's inter-pupillary distance (IPD). A see-through, near-eye, mixed reality display device includes, for each eye, a display optical system positioned to be seen through by the respective eye, and the display optical system has an optical axis, typically at the center of the optical system. Each display optical system is movable. At least one sensor of each display optical system captures data for the respective eye, and one or more processors determine whether the display device is aligned with the user's IPD. If not, one or more adjustment values to be applied by a display adjustment mechanism are determined for moving the position of the display optical system. Various examples of display adjustment mechanisms are described.
The present technology provides embodiments of a system for adjusting a see-through, near-eye, mixed reality display to align with an interpupillary distance (IPD) of a user. The system includes a see-through, near-eye, mixed reality display device including, for each eye, a display optical system having an optical axis, the display optical system positioned to be seen through by the respective eye. The display device includes a respective movable support structure for supporting each display optical system. At least one sensor of each display optical system is attached to the display device. The at least one sensor has a detection area at a location for capturing data of a corresponding eye. One or more processors of the system may access a memory that stores the captured data and software. The one or more processors determine one or more position adjustment values for each display optical system for alignment with the IPD based on the captured data and the position of the respective detection region. At least one display adjustment mechanism is attached to the display device and communicatively coupled to the one or more processors to move the at least one movable support structure to adjust the position of the respective display optical system according to the one or more position adjustment values.
The present technology also provides embodiments of a method for automatically aligning a see-through, near-eye, mixed reality display device with an IPD of a user. The method is operable within a see-through, near-eye, mixed reality system that includes, for each eye, a display optical system having an optical axis positioned to be seen through by the respective eye. The method includes automatically determining whether the see-through, near-eye, mixed reality display device is aligned with the user IPD according to alignment criteria. In response to the display device not being aligned with the user IPD according to the alignment criteria, one or more adjustment values of at least one display optical system are determined so as to satisfy the alignment criteria, and the at least one display optical system is adjusted based on the one or more adjustment values.
The present technology provides another embodiment of a system that adjusts a see-through, near-eye, mixed reality display system in any of three dimensions to align with the user's IPD. In this embodiment, the system includes a see-through, near-eye, mixed reality display device including, for each eye, a display optical system having an optical axis, the display optical system positioned to be seen through by the respective eye. The display device includes a respective movable support structure for supporting each display optical system. The respective movable support structure is capable of movement in three dimensions. At least one sensor attached to each display optical system of the display device has a detection area at a position for capturing data of the corresponding eye. One or more processors have access to a memory for storing software and data, including captured data for each eye.
One or more processors of the system determine one or more position adjustment values for each display optical system in any of the three dimensions for alignment with the IPD based on the captured data and the position of the respective detection area. The system also includes at least one display adjustment mechanism attached to the display device and communicatively coupled to the one or more processors to move the at least one respective movable support structure in any of the three dimensions under control of the one or more processors in accordance with the one or more position adjustment values.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Drawings
FIG. 1A is a block diagram depicting example components of one embodiment of a see-through, mixed reality display device with an adjustable IPD in a system environment in which the device may operate.
Fig. 1B is a block diagram depicting example components of another embodiment of a see-through, mixed reality display device with an adjustable IPD.
Fig. 2A is a top view showing an example of a gaze vector extending to a gaze point at a distance and extending in a direction aligned to a far IPD.
Fig. 2B is a top view illustrating an example of a gaze vector extending to a gaze point at a distance and extending towards a direction aligned to a near IPD.
Fig. 3A is a flow diagram of an embodiment of a method for aligning a see-through, near-eye, mixed reality display with an IPD.
Fig. 3B is a flow chart of an example of an implementation of a method for adjusting a display device to align with a user IPD.
FIG. 3C is a flow chart illustrating different example options for mechanical or automatic adjustment of at least one display adjustment mechanism.
Fig. 4A illustrates an exemplary arrangement of a see-through, near-eye, mixed reality display device implemented as eyewear with movable display optical systems including gaze detection elements.
Fig. 4B illustrates another exemplary arrangement of a see-through, near-eye, mixed reality display device implemented as eyewear with movable display optical systems including gaze detection elements.
Fig. 4C illustrates yet another exemplary arrangement of a see-through, near-eye, mixed reality display device implemented as eyewear with movable display optical systems including gaze detection elements.
Fig. 4D, 4E, and 4F show different views of one example of a mechanical display adjustment mechanism using a sliding mechanism that a user can actuate to move the display optical system.
FIG. 4G illustrates one example of a mechanical display adjustment mechanism using a wheel mechanism that a user may actuate to move the display optics.
Fig. 4H and 4I show different views of one example of a mechanical display adjustment mechanism using a ratchet mechanism that a user may actuate to move the display optical system.
FIG. 4J shows a side view of a ratchet such as may be used with the mechanisms of FIGS. 4H and 4I.
Fig. 5A is a side view of a temple in an eyewear embodiment of a mixed reality display device that provides support for hardware and software components.
Fig. 5B is a side view of a temple that provides support for hardware and software components and three-dimensional adjustment of a microdisplay assembly in an embodiment of a mixed reality display device.
Fig. 6A is a top view of an embodiment of a movable display optical system of a see-through, near-eye, mixed reality device including an arrangement of gaze detection elements.
Fig. 6B is a top view of another embodiment of a movable display optical system of a see-through, near-eye, mixed reality device including an arrangement of gaze detection elements.
Fig. 6C is a top view of a third embodiment of a movable display optical system of a see-through, near-eye, mixed reality device including an arrangement of gaze detection elements.
Fig. 6D is a top view of a fourth embodiment of a movable display optical system of a see-through, near-eye, mixed reality device including an arrangement of gaze detection elements.
FIG. 7A is a block diagram of one embodiment of hardware and software components of a see-through, near-eye, mixed reality display unit that may be used with one or more embodiments.
FIG. 7B is a block diagram of one embodiment of hardware and software components of a processing unit associated with a see-through, near-eye, mixed reality display unit.
Fig. 8A is a block diagram of an embodiment of a system for determining a position of an object within a user field of view of a see-through, near-eye, mixed reality display device.
FIG. 8B is a flow diagram of an embodiment of a method for determining a three-dimensional user field of view for a see-through, near-eye, mixed reality display device.
Fig. 9A is a flow diagram of an embodiment of a method for aligning a see-through, near-eye, mixed reality display with an IPD.
Fig. 9B is a flow diagram of an embodiment of a method for aligning a see-through, near-eye, mixed reality display with an IPD based on image data in an image format for a pupil.
FIG. 9C is a flow diagram of an embodiment of a method for determining at least one adjustment value for a display adjustment mechanism based on a mapping criterion for at least one sensor of each display optical system that does not meet an alignment criterion.
Fig. 9D is a flow diagram of a method embodiment for aligning a see-through, near-eye, mixed reality display with an IPD based on gaze data.
Fig. 9E is a flow chart of another version of the method embodiment of fig. 9D.
Fig. 9F is a flow diagram of a method embodiment to align a see-through, near-eye, mixed reality display with an IPD based on gaze data relative to an image of a virtual object.
Fig. 10A is a flow diagram of an embodiment of a method for realigning a see-through, near-eye, mixed reality display device with an interpupillary distance (IPD).
Fig. 10B is a flow diagram illustrating an embodiment of a method for selecting an IPD from either a near IPD or a far IPD.
Fig. 11 is a flow diagram illustrating an embodiment of a method for determining whether a change is detected that indicates that alignment with a selected IPD no longer satisfies alignment criteria.
Fig. 12 is a flow diagram of an embodiment of a method for determining gaze in a see-through, near-eye, mixed reality display system.
FIG. 13 is a flow diagram of an embodiment of a method for identifying glints in image data.
Fig. 14 is a flow diagram of an embodiment of a method that may be used to determine boundaries of a gaze-detection coordinate system.
Fig. 15 is a flow diagram illustrating an embodiment of a method for determining the location of the corneal center in a coordinate system using an optical gaze detection element of a see-through, near-eye, mixed reality display.
Fig. 16 provides an illustrative example of defining a plane using the geometry provided by the arrangement of optical elements to form a gaze-detection coordinate system that may be used by the embodiment of fig. 15 to find the center of the cornea.
Figure 17 is a flow chart illustrating an embodiment of a method for determining pupil center from sensor generated image data.
Fig. 18 is a flow diagram illustrating an embodiment of a method for determining a gaze vector based on a determined pupil center, a corneal center, and a center of rotation of an eyeball.
Fig. 19 is a flow diagram illustrating an embodiment of a method for determining gaze based on glint data.
FIG. 20 is a block diagram of an exemplary mobile device that may operate in embodiments of the present technology.
FIG. 21 is a block diagram depicting one embodiment of a computing system that may be used to implement a hub computing system.
Detailed Description
Interpupillary distance (IPD) generally refers to the horizontal distance between the pupils of a user. The IPD may also include a vertical or height dimension. Furthermore, many people have an IPD that is asymmetric with respect to their nose. For example, the left eye is closer to the nose than the right eye.
A see-through, near-eye, mixed reality display includes, for each eye, a display optical system having an optical axis, the display optical system positioned to be seen through by the respective eye. The display device is aligned with the user's IPD (asymmetric or symmetric) when the optical axis of each display optical system is aligned with the corresponding pupil. If a pupil is not aligned with its optical axis within a criterion, the corresponding display optical system is adjusted via a display adjustment mechanism until the alignment satisfies the criterion. An example of a criterion is a distance, e.g., 1 mm. When the pupils are satisfactorily aligned with the optical axes, the distance between the optical axes of the display optical systems represents the inter-pupillary distance (IPD) to within at least one criterion.
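As a rough illustration of this alignment criterion, the following Python sketch (not part of the patent; the names, the 2-D coordinate frame, and the simple distance test are assumptions) checks whether each detected pupil position lies within an example 1 mm criterion distance of the corresponding optical axis.

```python
import math

ALIGNMENT_CRITERION_MM = 1.0  # example criterion distance mentioned in the text

def is_aligned(pupil_pos_mm, optical_axis_pos_mm, criterion_mm=ALIGNMENT_CRITERION_MM):
    """Return True if the detected pupil center lies within the criterion
    distance of the optical axis of its display optical system.

    Positions are (horizontal, vertical) offsets in millimeters in the plane
    of the display optical system (a hypothetical coordinate frame)."""
    dx = pupil_pos_mm[0] - optical_axis_pos_mm[0]
    dy = pupil_pos_mm[1] - optical_axis_pos_mm[1]
    return math.hypot(dx, dy) <= criterion_mm

def check_display_alignment(left_pupil, right_pupil, left_axis, right_axis):
    """Check both display optical systems against the alignment criterion."""
    return {"left": is_aligned(left_pupil, left_axis),
            "right": is_aligned(right_pupil, right_axis)}

# Example: the left pupil sits about 1.8 mm to the side of its optical axis -> not aligned.
print(check_display_alignment((-1.8, 0.2), (0.4, 0.1), (0.0, 0.0), (0.0, 0.0)))
```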
In the embodiments described below, each display optical system is positioned within a movable support structure, which is positionally adjustable by a display adjustment mechanism. In many examples, the adjustment is performed automatically under control of a processor. For example, adjustment in more than one direction may be performed by a set of motors that may move the display optics vertically, horizontally, or in the depth direction. In other embodiments, the display adjustment mechanism is a mechanical display adjustment mechanism that is user-actuated to position the display optical system according to a displayed instruction or an audio instruction. In some examples shown below, the control of the mechanical display adjustment mechanism is calibrated so that each actuation corresponds to a unit of distance that the display optical system is to move in a particular direction, and the instructions are provided in terms of the number of actuations.
FIG. 1A is a block diagram depicting example components of one embodiment of a see-through, mixed reality display device with an adjustable IPD in a system environment in which the device may operate. System 10 includes a see-through display device that is a near-eye, head-mounted display device 2 in communication with a processing unit 4 via line 6. In other embodiments, head mounted display device 2 communicates with processing unit 4 through wireless communication. The processing unit 4 may take various embodiments. In some embodiments, processing unit 4 is a separate unit that may be worn on the user's body (e.g., worn on the wrist in the illustrated example) or placed in a pocket, and includes most of the computing power for operating near-eye display device 2. Processing unit 4 may communicate wirelessly (e.g., WiFi, bluetooth, infrared, or other wireless communication means) with one or more hub computing systems 12. In other embodiments, the functionality of the processing unit 4 may be integrated in the software and hardware components of the display device 2.
The head mounted display device 2 (which in one embodiment is in the shape of eyeglasses with a frame 115) is worn on the head of a user so that the user can see through a display (embodied in this example as a display optical system 14 for each eye) and thereby have an actual direct view of the space in front of the user.
The term "actual direct view" is used to refer to the ability to see real-world objects directly with the human eye, rather than seeing the created image representation of the object. For example, viewing through glasses in a room would allow a user to have an actual direct view of the room, whereas viewing a video of a room on a television is not an actual direct view of the room. Based on the context of executing software (e.g., a gaming application), the system may project an image of a virtual object (sometimes referred to as a virtual image) on a display viewable by a person wearing the see-through display device, while the person also views real-world objects through the display.
The frame 115 provides a support for holding the elements of the system in place and a conduit for electrical connections. In this embodiment, the frame 115 provides a convenient frame for the glasses as a support for the elements of the system discussed further below. In other embodiments, other support structures may be used. Examples of such structures are a visor or goggles. The frame 115 includes a temple or side arm for resting on each ear of the user. The temple 102 represents an embodiment of a right temple and includes a control circuit 136 of the display device 2. The nose bridge 104 of the frame comprises a microphone 110 for recording sound and transmitting audio data to the processing unit 4.
Hub computing system 12 may be a computer, a gaming system or console, or the like. According to an example embodiment, hub computing system 12 may include hardware components and/or software components such that hub computing system 12 may be used to execute applications such as gaming applications, non-gaming applications, and the like. The application may execute on hub computing system 12, display device 2, on mobile device 5 as described below, or on a combination of these devices.
Hub computing system 12 also includes one or more capture devices, such as capture devices 20A and 20B. In other embodiments, more or less than two capture devices may be used to capture a room or other physical environment of the user.
The capture devices 20A and 20B may be, for example, cameras that visually monitor one or more users and the surrounding space so that gestures and/or movements performed by the one or more users and the structure of the surrounding space may be captured, analyzed, and tracked to perform one or more controls or actions in an application and/or animate an avatar or on-screen character.
The hub computing environment 12 may be connected to an audiovisual device 16 such as a television, a monitor, a high-definition television (HDTV), or the like that may provide game or application visuals. In some cases, the audiovisual device 16 may be a three-dimensional display device. In one example, the audiovisual device 16 includes a built-in speaker. In other embodiments, the audiovisual device 16, a separate stereo system, or the hub computing device 12 is connected to the external speakers 22.
Fig. 1B is a block diagram depicting example components of another embodiment of a see-through, mixed reality display device with an adjustable IPD. In this embodiment, near-eye display device 2 communicates with a mobile computing device 5, which is an example embodiment of processing unit 4. In the example shown, the mobile device 5 communicates via a line 6, but in other examples the communication may also be wireless.
Further, as in hub computing system 12, gaming and non-gaming applications may execute on a processor of mobile device 5, where user actions control or user motions animate an avatar displayed on display 7 of device 5. The mobile device 5 also provides a network interface for communicating with other computing devices, such as hub computing system 12, over the internet or another communication network via a wired or wireless communication medium using a wired or wireless communication protocol. A remote, network-accessible computing system such as hub computing system 12 may provide additional processing power and remote data access to a processing unit 4 such as mobile device 5. Examples of hardware and software components of mobile device 5 (which may be embodied in a smartphone or tablet computing device, for example) are described in fig. 20, and these components may include the hardware and software components of processing unit 4, such as those discussed in the embodiment of fig. 7A. Other examples of mobile device 5 are a laptop or notebook computer and a netbook computer.
In some embodiments, gaze detection for each of the user's eyes is based on the relationship between the three-dimensional coordinate system of the gaze detection element on a near-eye, mixed reality display device, such as glasses 2, and one or more human eye elements, such as the corneal center, the center of eyeball rotation, and the pupil center. Examples of gaze detection elements that may be part of the coordinate system include an illuminator that generates glints and at least one sensor for capturing data representative of the generated glints. As described below (see discussion of fig. 16), the corneal center may be determined using planar geometry based on two glints. The corneal center links the pupil center and the center of rotation of the eyeball, which can be considered a fixed location for determining the optical axis of the user's eye at a particular gaze or viewing angle.
Fig. 2A is a top view showing an example of a gaze vector extending to a gaze point at a distance and extending in a direction aligned to a far IPD. Fig. 2A shows an example of gaze vectors intersecting at a gaze point where the user's eyes are focused effectively at infinity (e.g., beyond five (5) feet), or in other words, when the user is looking straight ahead. A model 160l, 160r of each eyeball is shown based on the Gullstrand schematic eye model. For each eye, eyeball 160 is modeled as a sphere having a center of rotation 166, and includes a cornea 168 that is also modeled as a sphere and has a center 164. The cornea rotates with the eyeball, and the center of rotation 166 of the eyeball may be treated as a fixed point. The cornea covers the iris 170, with the pupil 162 at the center of the iris 170. In this example, on the surface 172 of each cornea are glints 174 and 176.
In the embodiment shown in fig. 2A, the sensor detection area 139 is aligned with the optical axis of each display optical system 14 within the spectacle frame 115. In this example, the sensor associated with each detection area is a camera capable of capturing image data representing glints 174l and 176l generated by illuminators 153a and 153b, respectively, on the left side of the frame 115, and data representing glints 174r and 176r generated by illuminators 153c and 153d, respectively. Through the display optical systems 14l and 14r in the spectacle frame 115, the user's field of view includes real objects 190, 192, and 194 and virtual objects 182, 184, and 186.
An axis 178 formed from the center of rotation 166 through the corneal center 164 to the pupil 162 is the optical axis of the eye. Gaze vector 180, sometimes referred to as the line of sight or visual axis, extends from the fovea through the pupil center 162. The fovea is a small area of about 1.2 degrees located in the retina. The angular offset between the optical axis and the visual axis, calculated in the embodiment of fig. 14, has horizontal and vertical components; the horizontal component is up to 5 degrees from the optical axis, and the vertical component is between 2 and 3 degrees. In many embodiments, the optical axis is determined, and a small correction obtained through user calibration is applied to yield the visual axis, which is selected as the gaze vector. For each user, a virtual object may be displayed by the display device at each of a plurality of predetermined locations at different horizontal and vertical positions. While an object is displayed at each location, the optical axis of each eye is calculated and a ray is modeled as extending from that location to the user's eye. A gaze offset angle with horizontal and vertical components is determined based on how the optical axis must be moved to align with the modeled ray. From the different positions, an average gaze offset angle with horizontal and vertical components may be selected as the small correction to be applied to each calculated optical axis. In some embodiments, only a horizontal component is used for the gaze offset angle correction.
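A minimal sketch of this calibration step is shown below, assuming the calculated optical axis and the modeled ray toward each displayed calibration target are available as direction vectors in a hypothetical eye coordinate frame; the angle decomposition and averaging here are illustrative, not the patented computation.

```python
import math

def offset_angles_deg(optical_axis, target_ray):
    """Horizontal and vertical angular offsets, in degrees, needed to rotate
    the calculated optical axis onto the modeled ray toward the displayed
    calibration target. Vectors are (x, y, z) with x horizontal, y vertical,
    and z pointing out toward the scene (a hypothetical eye frame)."""
    def yaw_pitch(v):
        x, y, z = v
        return (math.degrees(math.atan2(x, z)),
                math.degrees(math.atan2(y, math.hypot(x, z))))
    yaw_a, pitch_a = yaw_pitch(optical_axis)
    yaw_t, pitch_t = yaw_pitch(target_ray)
    return yaw_t - yaw_a, pitch_t - pitch_a

def average_gaze_offset(samples):
    """Average the per-target offsets into the single small correction that is
    later applied to every calculated optical axis."""
    offsets = [offset_angles_deg(axis, ray) for axis, ray in samples]
    h = sum(o[0] for o in offsets) / len(offsets)
    v = sum(o[1] for o in offsets) / len(offsets)
    return h, v

# Two calibration targets (direction vectors need not be unit length here).
samples = [((0.00, 0.00, 1.0), (0.06, 0.03, 1.0)),
           ((0.10, 0.00, 1.0), (0.15, 0.02, 1.0))]
print(average_gaze_offset(samples))  # roughly (3.1, 1.4) degrees
```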
The visual axes 180l and 180r show that the gaze vectors are not perfectly parallel because they become closer to each other as they extend from the eyeball to the gaze point at effectively infinity as indicated by the symbols 181l and 181r in the field of view. At each display optical system 14, the gaze vector 180 appears to intersect the optical axis, with the sensor detection region 139 centered at this intersection. In this configuration, the optical axis is aligned with the interpupillary distance (IPD). When the user looks straight ahead, the measured IPD is also called far IPD.
When identifying an object in a direction for a user to focus on for aligning the IPD at a distance, the object may be aligned in a direction along each optical axis of each display optical system. Initially, the alignment between the optical axis and the user's pupil is unknown. For a far IPD, the direction may be straight ahead through the optical axis. When aligning a near IPD, the identified object may be in a direction through the optical axis, but because of the eye rotation necessary for near distances, the object, while not directly in front, is centered between the optical axes of the display optical systems.
Fig. 2B is a top view illustrating an example of a gaze vector extending to a gaze point at a distance and extending towards a direction aligned to a near IPD. In this example, the cornea 168l of the left eye rotates to the right or toward the user's nose, and the cornea 168r of the right eye rotates to the left or toward the user's nose. Both pupils are gazing at a real object 194 at a closer distance, e.g., two (2) feet in front of the user. The gaze vectors 180l and 180r from each eye enter the Panum's fusional area 195 where the real object 194 is located. Panum's fusion area is an area of single vision in a binocular viewing system like human vision. The intersection of gaze vectors 180l and 180r indicates that the user is looking at real object 194. At such distances, as the eyeballs rotate inward, the distance between their pupils decreases to a near IPD. The near IPD is typically about 4mm smaller than the far IPD. A near IPD distance criterion, such as a point of regard at less than four feet, may be used to switch or adjust the IPD alignment of the display optical system 14 to that of the near IPD. For near IPD, each display optical system 14 may be moved toward the user's nose such that the optical axis and detection region 139 are moved toward the nose by several millimeters, as represented by detection regions 139ln and 139 rn.
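The switch between far and near IPD alignment described above can be sketched as follows; the four-foot criterion and the roughly 4 mm reduction come from the text, while the function name and fixed offset are simplifying assumptions.

```python
NEAR_IPD_DISTANCE_CRITERION_FT = 4.0  # example switching criterion from the text
NEAR_IPD_REDUCTION_MM = 4.0           # near IPD is typically about 4 mm less than far IPD

def select_ipd_mm(far_ipd_mm, gaze_point_distance_ft):
    """Choose which IPD to align to, based on how far away the point of gaze
    is; a gaze point closer than the criterion distance uses the near IPD."""
    if gaze_point_distance_ft < NEAR_IPD_DISTANCE_CRITERION_FT:
        return far_ipd_mm - NEAR_IPD_REDUCTION_MM  # near IPD estimate
    return far_ipd_mm

print(select_ipd_mm(63.0, 2.0))   # object about 2 ft away -> 59.0 (near IPD)
print(select_ipd_mm(63.0, 10.0))  # distant gaze point     -> 63.0 (far IPD)
```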
Users are typically unaware of their IPD data. The following discussion illustrates some embodiments of methods and systems for determining the IPD of a user and adjusting the display optical system accordingly.
Fig. 3A is a flow diagram of a method embodiment 300 for aligning a see-through, near-eye, mixed reality display with an IPD. At step 301, one or more processors of control circuitry 136 (e.g., processor 210, processing units 4, 5, hub computing system 12, or a combination of these, below in fig. 7A) automatically determine whether the see-through, near-eye, mixed reality display device is aligned with the IPD of the user according to alignment criteria. If not, the one or more processors cause, at step 302, an adjustment to the display device via at least one display adjustment mechanism to align the device with the user IPD. If it is determined that the see-through, near-eye, mixed reality display device is aligned with the user IPD, then optionally at step 303, the IPD dataset for the user is stored. In some embodiments, display device 2 may automatically determine whether IPD alignment is present each time anyone wears display device 2. However, because IPD data is typically fixed for adults due to the limitations of the human skull, an IPD data set for each user may typically be determined once and stored. The stored IPD data set can be used at least as an initial setting of the display device to start the IPD alignment check.
The display device 2 has a display optical system for each eye, and in some embodiments the one or more processors store the IPD as the distance between the optical axes of the display optical systems at positions that satisfy the alignment criteria. In some embodiments, the one or more processors store the position of each optical axis in the IPD data set. A user's IPD may be asymmetric, for example with respect to the user's nose; for instance, the left eye may be closer to the nose than the right eye. In one example, the adjustment values of the display adjustment mechanism for each display optical system from an initial position may be saved in the IPD data set. The initial position of the display adjustment mechanism may have a fixed position relative to a stationary frame portion, e.g., a point on the nose bridge 104. Based on this fixed position relative to the stationary frame portion and the adjustment values for one or more movement directions, the position of each optical axis relative to the stationary frame portion can be stored as a pupil alignment position for each display optical system. In addition, where the stationary frame portion is a point on the nose bridge, a position vector of the respective pupil relative to the user's nose can be estimated for each eye based on the fixed position with respect to the point on the nose bridge and the adjustment values. The two position vectors, one for each eye, provide at least a horizontal distance component and may also include a vertical distance component. An inter-pupillary distance (IPD) in one or more directions may be derived from these distance components.
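The derivation of an IPD from the two per-eye position vectors can be sketched as follows, under the assumption of a coordinate frame with its origin at the fixed point on the nose bridge, a signed horizontal axis, and a vertical axis; the numbers and function names are illustrative only.

```python
def pupil_position_vector(initial_axis_pos_mm, adjustment_mm):
    """Estimate the pupil position relative to the fixed nose-bridge point:
    after alignment, the adjusted optical axis position approximates the pupil."""
    return tuple(p + a for p, a in zip(initial_axis_pos_mm, adjustment_mm))

def ipd_components_mm(left_vec, right_vec):
    """Horizontal and vertical IPD components from the two position vectors."""
    return abs(left_vec[0] - right_vec[0]), abs(left_vec[1] - right_vec[1])

# Asymmetric example: after a 1 mm adjustment each, the left pupil sits 30 mm to
# one side of the nose-bridge point and the right pupil 34 mm to the other side
# and 1 mm higher.
left = pupil_position_vector((-31.0, 0.0), (1.0, 0.0))
right = pupil_position_vector((33.0, 0.0), (1.0, 1.0))
print(ipd_components_mm(left, right))  # -> (64.0, 1.0)
```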
Fig. 3B is a flow chart of an example of an implementation of a method for adjusting a display device for alignment with a user IPD. In this method, at least one display adjustment mechanism adjusts the position of at least one display optical system 14 that is out of alignment. At step 407, one or more adjustment values are automatically determined for the at least one display adjustment mechanism so as to satisfy the alignment criteria for the at least one display optical system. At step 408, the at least one display optical system is adjusted based on the one or more adjustment values. The adjustment may be performed automatically under control of a processor, or mechanically, as discussed further below.
FIG. 3C is a flow diagram illustrating different example options for the mechanical or automatic adjustment of step 408 performed by the at least one display adjustment mechanism. Depending on the configuration of the display adjustment mechanism in the display device 2, in step 334 the at least one display adjustment mechanism may be adjusted automatically (meaning under control of a processor) according to the one or more adjustment values determined in step 407. Alternatively, one or more processors associated with the system (e.g., a processor in processing unit 4, 5, processor 210 in control circuitry 136, or even a processor of hub computing system 12) may, per step 333, electronically provide instructions for the user to apply the one or more adjustment values to the at least one display adjustment mechanism. There may be a combination of automatic adjustment and mechanical adjustment performed according to the instructions.
Some examples of electronically provided instructions are instructions displayed by microdisplay 120, mobile device 5 or on display 16 by hub computing system 12, or audio instructions through speaker 130 of display device 2. There may be device configurations with automatic adjustment and mechanical mechanisms depending on user preferences or to allow some additional control by the user.
In many embodiments, the display adjustment mechanism includes a mechanical controller that is calibrated so that a user activation of the controller corresponds to a predetermined distance and direction of movement of the at least one display optical system, and a processor determines the content of the instructions based on this calibration. In the examples for figs. 4D through 4J below, examples of mechanical display adjustment mechanisms are provided that relate a user-activated action, such as turning a wheel or pressing a button, to a particular distance. The instructions displayed to the user may include a particular sequence of user activations associated with a predetermined distance. The force is provided by the user rather than by an electronically controlled component, and the sequence in the instructions is determined to cause the desired change in position. For example, a crosshair may be displayed to guide the user and inform the user to move a slider three slots to the right. This causes a predetermined repositioning of the display optical system, for example by 3 mm.
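The translation from a computed adjustment value into user instructions for a calibrated mechanical controller might look like the following sketch; the 1 mm-per-actuation calibration matches the slot example in the text, while the sign convention and wording are assumptions.

```python
CALIBRATED_UNIT_MM = 1.0  # e.g., one slot, wheel turn, or ratchet increment = 1 mm

def actuation_instruction(adjustment_mm, unit_mm=CALIBRATED_UNIT_MM):
    """Translate a signed horizontal adjustment value into a user instruction,
    assuming positive values mean moving the display optical system to the
    user's left (an arbitrary convention for this sketch)."""
    count = round(abs(adjustment_mm) / unit_mm)
    if count == 0:
        return "No adjustment needed."
    direction = "left" if adjustment_mm > 0 else "right"
    return f"Move the slider {count} slot(s) to the {direction}."

print(actuation_instruction(3.2))   # -> Move the slider 3 slot(s) to the left.
print(actuation_instruction(-1.0))  # -> Move the slider 1 slot(s) to the right.
```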
Fig. 4A illustrates an exemplary arrangement of a see-through, near-eye, mixed reality display device implemented as eyewear with movable display optical systems including gaze detection elements. What appears as a lens for each eye is a display optical system 14, e.g., 14r and 14l. A display optical system includes a see-through lens, e.g., 118 and 116 in figs. 6A-6D, as in an ordinary pair of eyeglasses, but also contains optical elements (e.g., mirrors, filters) for seamlessly blending virtual content with the actual direct real-world view seen through the lenses 118, 116. A display optical system 14 has an optical axis that is generally at the center of the see-through lens 118, 116, in which light is generally collimated to provide a distortion-free view. For example, when an eye-care professional fits an ordinary pair of eyeglasses to a user's face, the goal is that the glasses sit on the user's nose at a position where each pupil is aligned with the center or optical axis of the respective lens, generally so that the collimated light arrives at the user's eye for a clear or distortion-free view.
In the example of fig. 4A, a detection area 139r, 139l of at least one sensor is aligned with the optical axis of its respective display optical system 14r, 14l, so that the center of the detection area 139r, 139l captures light along the optical axis. If a display optical system 14 is aligned with the user's pupil, each detection area 139 of the respective sensor 134 is aligned with the user's pupil. Reflected light from the detection area 139 is transferred via one or more optical elements to the actual image sensor 134 of the camera, which in this example is shown by the dashed line as being inside the frame 115.
In one example, a visible light camera, often referred to as an RGB camera, may be the sensor, and an example of an optical element or light directing element is a partially transmissive, partially reflective visible light reflector. The visible light camera provides image data of the pupil of the user's eye, while IR photodetectors 152 capture glints, which are reflections in the IR portion of the spectrum. If a visible light camera is used, reflections of virtual images may appear in the eye data captured by the camera. An image filtering technique may be used to remove the virtual image reflections if desired. An IR camera is not sensitive to the virtual image reflections on the eye.
In other examples, the at least one sensor 134 is an IR camera or a position sensitive detector (PSD) to which IR radiation may be directed. For example, a hot reflecting surface may transmit visible light but reflect IR radiation. The IR radiation reflected from the eye may come from the incident radiation of the illuminators 153, from other IR illuminators (not shown), or from ambient IR radiation reflected off the eye. In some examples, the sensor 134 may be a combination of an RGB camera and an IR camera, and the light directing elements may include a visible light reflecting or diverting element and an IR radiation reflecting or diverting element. In some examples, the camera may be small, e.g., 2 millimeters (mm) by 2 mm; an example of such a camera sensor is the Omnivision OV7727. In other examples, the camera may be small enough, e.g., the OV7727, that the image sensor or camera 134 can be centered on the optical axis or at another location of the display optical system 14. For example, the camera 134 may be embedded within a lens of the system 14. Additionally, an image filtering technique may be applied to blend the camera into the user's field of view to lessen any distraction to the user.
In the example of fig. 4A, there are four sets of illuminators 153, each paired with a photodetector 152 and separated by a barrier 154 to avoid interference between the incident light generated by the illuminator 153 and the reflected light received at the photodetector 152. To avoid unnecessary clutter in the drawings, reference numerals are shown for a representative pair. Each illuminator may be an infrared (IR) illuminator that generates a narrow beam of light at approximately a predetermined wavelength. Each of the photodetectors may be selected to capture light at approximately the predetermined wavelength. Infrared may also include near-infrared. Because a small wavelength shift of an illuminator or photodetector may occur, and a small range about the wavelength is acceptable, the illuminator and photodetector may have a tolerance range about the wavelength used for generation and detection. In embodiments where the sensor is an IR camera or an IR position sensitive detector (PSD), the photodetectors may provide additional data capture and may also be used to monitor the operation of the illuminators, e.g., wavelength drift or beam width changes. The photodetectors may also provide glint data when a visible light camera is used as the sensor 134.
As described below, in some embodiments that calculate the corneal center as part of determining the gaze vector, two glints, and therefore two illuminators, will suffice. However, other embodiments may use additional glints in determining the pupil position and hence the gaze vector. Because eye data representing the glints is captured repeatedly, for example at a frame rate of 30 frames per second or greater, data for one glint may be blocked by an eyelid or even an eyelash, but data may be gathered from a glint generated by another illuminator.
In fig. 4A, each display optical system 14 and its arrangement of gaze detection elements facing the eye (e.g., camera 134 and its detection area 139, optical alignment elements (not shown in this figure; see figs. 6A-6D below), illuminators 153, and photodetectors 152) are located on a movable inner frame portion 117l, 117r. In this example, the display adjustment mechanism includes one or more motors 203 having a drive shaft 205 attached to an object for pushing and pulling the object in at least one of three dimensions. In this example, the object is the inner frame portion 117, which slides from left to right, or vice versa, within the frame 115 under the guidance and power of the drive shaft 205 driven by the motor 203. In other embodiments, one motor 203 may drive both inner frames. As discussed with reference to figs. 5A and 5B, a processor of the control circuitry 136 of the display device 2 is connected to the one or more motors 203 via electrical connections within the frame 115 to control adjustments in different directions of the drive shafts 205 by the motors 203. In addition, the motors 203 are also connected to a power supply via the electrical connections of the frame 115.
Fig. 4B illustrates another exemplary arrangement of a see-through, near-eye, mixed reality display device implemented as eyewear with movable display optical systems including gaze detection elements. In this embodiment, each display optical system 14 is enclosed in a separate frame portion 115l, 115r, e.g., a separate section of the eyeglass frame, which can be moved individually by the motors 203. In some embodiments, the range of movement in any dimension is less than 10 millimeters. In some embodiments, the range of movement is less than 6 millimeters, depending on the range of frame sizes offered for the product. For the horizontal direction, moving each frame a few millimeters to the left or right will not significantly affect the width between the temples (e.g., 102) that attach the display optical systems 14 to the user's head. Additionally, in this embodiment, two sets of illuminators 153 and photodetectors 152 are positioned near the top of each frame portion 115l, 115r to illustrate another example of the geometric relationship between the illuminators and thus between the glints they generate. This arrangement of glints may provide more information about the pupil position in the vertical direction. In other embodiments, like the embodiment of fig. 4A in which the illuminators are closer to the sides of the frame portions 115l, 115r, 117l, 117r, the illuminators 153 may be placed at different angles relative to the frame portion to direct light to different parts of the eye and thereby obtain more vertical and horizontal components for identifying the pupil location.
Fig. 4C illustrates yet another exemplary arrangement of a see-through, near-eye, mixed reality display device implemented as eyewear with movable display optical systems including gaze detection elements. In this example, the sensors 134r, 134l are in line or aligned with the optical axes at about the center of their respective display optical systems 14r, 14l, but are located on the frame 115 below the systems 14. Additionally, in some embodiments, the camera 134 may be a depth camera or include a depth sensor. In this example, there are two sets of illuminators 153 and photodetectors 152.
The interpupillary distance may describe the distance between the user's pupils in the horizontal direction, but the vertical difference may also be determined. In addition, moving the display optical system in the depth direction between the eye and the display device 2 may also help align the optical axis with the pupil of the user. The user's eyes may actually have different depths in the skull. Movement of the display device in the depth direction relative to the head may also introduce misalignment between the optical axis of display optical system 14 and its respective pupil.
In this example, the motors form an example of an XYZ transport mechanism for moving each display optical system 14 in three dimensions. In this example, the motors 203 are located on the outer frame 115 and their drive shafts 205 are attached to the top and bottom of the respective inner frame portion 117. The operation of the motors 203 is synchronized for their drive shaft movements by the processor 210 of the control circuitry 136. In addition, because this is a mixed reality device, each image generation unit, such as microdisplay assembly 173, that creates images of virtual objects (i.e., virtual images) for display in the respective display optical system 14 is also moved by a motor and drive shaft to maintain optical alignment with the display optical system. An example of the microdisplay assembly 173 is described further below. In this example, the motors 203 are three-axis motors, or are capable of moving their drive shafts in three dimensions. For example, a drive shaft may be pushed and pulled along one axis of direction along the center of a cross-hair guide and may move in each of two perpendicular directions in the same plane within the perpendicular openings of the cross-hair guide.
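A minimal sketch of the synchronized three-dimensional adjustment described here, in which the same displacement is applied to the movable support and to its microdisplay assembly so the two stay optically aligned; the tuple representation and coordinate values are assumptions for illustration.

```python
def apply_three_axis_adjustment(support_pos_mm, microdisplay_pos_mm, delta_mm):
    """Apply one (dx, dy, dz) adjustment, in millimeters, to both the movable
    support of a display optical system and its microdisplay assembly so that
    the two remain optically aligned after the move."""
    moved_support = tuple(p + d for p, d in zip(support_pos_mm, delta_mm))
    moved_microdisplay = tuple(p + d for p, d in zip(microdisplay_pos_mm, delta_mm))
    return moved_support, moved_microdisplay

# Move a display optical system 2 mm toward the nose and 1 mm closer to the eye.
print(apply_three_axis_adjustment((0.0, 0.0, 0.0), (5.0, 12.0, -3.0), (-2.0, 0.0, -1.0)))
```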
Figs. 4D, 4E, and 4F show different views of one example of a mechanical display adjustment mechanism using a sliding mechanism, which is one example of a mechanical controller that a user can actuate to move a display optical system. Fig. 4D shows the different components of the slidable display adjustment mechanism 203 in a side view. In this example, the motor is replaced with a support 203a. An attachment element 205a of each support 203a to a movable support of the display optical system (e.g., frame section 115r or inner frame 117r) includes fasteners, such as a bolt and nut combination within the movable support 115r, 117r, to secure the support 203a to the frame 115r or inner frame 117r. In addition, another attachment element 205b (in this example an arm and a catch within the support 203a) couples each support to a sliding mechanism 203b that includes, for each frame side, a slider 207 with a deformable device 211 that holds the slider in a slot defined by slot dividers 209 and can change shape when the slider is actuated to move to another slot. Each slider 207 has rims 210 gripping the two edges 213a, 213b of the sliding mechanism 203b.
Fig. 4E provides a top view of the sliding mechanism 203b with the supports 203a in an initial position. The slider 207l, 207r of each support 203a is held in place by the deformable device 211 between the slot dividers 209. Taking the slider 207l of the left display optical system as shown in fig. 4F, when the user squeezes both ends of the slider, the slider retracts or shortens in length, and the deformable device 211 contracts in shape to move over the end of a slot divider 209 into the central opening 121 so that the user can push or pull the slider to another slot, in this example the one to the left. In this example, each slot may represent a calibrated distance (e.g., 1 mm), so an instruction displayed to the user may specify a number of movements in a specific direction or a target position. The user applies the force for the movement to increase or decrease the IPD, but does not have to determine the amount of adjustment.
FIG. 4G illustrates one example of a mechanical display adjustment mechanism using a rotating wheel mechanism that a user can actuate to move the display optical system. In this example, the supports 203a in the nose bridge 104 are replaced by a wheel or dial 203a attached to each display optical system. The attachment element to the movable support 115r or 117r comprises an arm or rotating shaft extending from the center of the wheel or dial to the top of a screw. The end of the arm or shaft on the screw or nut side is fitted to the head of the screw or a nut in order to turn it. Fasteners secure the screw to the frame 115l or the inner frame 117l. The rotational force generated by turning the wheel produces a linear force on the screw, and the end of the rotating shaft fitted to the screw head also turns the screw, producing a linear force that pushes the frame portion 115l, 117l to the left.
A portion of each wheel or dial (e.g., the top in this example) extends outward from the nose bridge 104. The portion of the wheel that rotates through the opening may also be calibrated to an adjustment distance, such as 1 mm per turn. The user may be instructed to turn the left wheel two turns toward his or her nose, causing the screw to also turn down toward the nose and push the frame 115l and inner frame 117l 2 mm to the left.
Figs. 4H and 4I show different views of one example of a mechanical display adjustment mechanism using a ratchet mechanism that a user can actuate to move the display optical system. The ratchet mechanism is shown for moving the left movable support 115l, 117l; the ratchet mechanism for the right movable support 115r, 117r works similarly. In this example, the support 203a is attached to the left side of the frame portion 115l, 117l via fasteners (e.g., an arm and a nut) and is itself secured via the respective nuts and arms of the two ratchet wheels 204a and 204b. As shown, each ratchet wheel has teeth. As a wheel is rotated, a pawl 219a locks onto a new tooth. Each ratchet wheel rotates in only one direction, and the two wheels rotate in opposite directions. Rotation in opposite directions, as indicated by the left and right arrows, produces linear moments in opposite directions at the center of the wheel. FIG. 4J shows a side view of a ratchet wheel such as may be used with the mechanisms of figs. 4H and 4I. The ratchet wheel 204a comprises a central opening 123 for connection to the fastening mechanism 205b and a further opening 127 that allows another fastening mechanism 205b to pass through to reach the center of the other ratchet wheel 204b.
The slider button 223l slides within the slotted guide 225l to push the top 227 of an arm 221 downward, rotating a ratchet wheel 204 one increment, e.g., causing a linear moment that pushes or pulls the support 203a by one tooth spacing. As shown in the example of fig. 4I, if the slider 223l pushes down on the top 227b of arm 221b, the wheel 204b rotates to create a moment toward the nose bridge that pushes the support 203a via the arm 205b passing through the opening 127 in the other wheel 204a, and thereby pushes the frame portion 115l, 117l toward the nose bridge 104, as indicated by the dashed extension of the top of arm 205b within the ratchet wheel 204b. Similarly, if the slider 223l is positioned to push the top 227a of arm 221a downward, the wheel 219a is rotated one increment, creating a moment away from it that pushes the support 203a and the frame portion 115l, 117l in the opposite direction. In some embodiments, with the slider returning to center after each actuation, each slide to one side or the other results in one increment and one calibrated adjustment unit of length, e.g., 1 mm.
The examples of fig. 4D through 4J are merely some examples of mechanical display adjustment mechanisms. Other mechanical mechanisms may also be used to move the display optics.
Fig. 5A is a side view of the temple 102 of the frame 115 in an eyeglasses embodiment of a see-through, mixed reality display device that provides support for hardware and software components. At the front of the frame 115 is a video camera 113 facing the physical environment that can capture video and still images. In particular, in some embodiments in which the display device 2 does not operate in conjunction with depth cameras such as capture devices 20A and 20B of the hub system 12, the physical-environment-facing camera 113 may be a depth camera as well as a camera sensitive to visible light. For example, the depth camera may include an IR illuminator transmitter and a hot reflecting surface, such as a hot mirror in front of the visible image sensor, which lets the visible light pass and directs reflected IR radiation within a wavelength range emitted by the illuminator, or about a predetermined wavelength, to a CCD or another type of depth sensor. Data from the sensors may be sent to the processor 210 of the control circuitry 136, to the processing unit 4, 5, or to both, which may process the data; the processing unit 4, 5 may also send the data over a network to a computer system or to the hub computing system 12 for processing. The processing identifies objects through image segmentation and edge detection techniques and maps depth to the objects in the user's real-world field of view. Additionally, the physical-environment-facing camera 113 may also include a light meter for measuring ambient light.
Control circuitry 136 provides various electronics that support other components of head mounted display device 2. More details of the control circuit 136 are provided below with reference to fig. 7A. The earpiece 130, inertial sensor 132, GPS transceiver 144, and temperature sensor 138 are internal to the temple 102 or mounted on the temple 102. In one embodiment, inertial sensors 132 include a three axis magnetometer 132A, three axis gyroscope 132B, and three axis accelerometer 132C (see FIG. 7A). Inertial sensors are used to sense the position, orientation, and sudden acceleration of head mounted display device 2. From these movements, the head position can also be determined.
The display device 2 provides an image generation unit that can create one or more images including one or more virtual objects. In some embodiments, a microdisplay may be used as the image generation unit. In this example, microdisplay assembly 173 includes light processing elements and a variable focus adjuster 135. An example of a light processing element is a microdisplay unit 120. Other examples include one or more optical elements, such as one or more lenses of a lens system 122, and one or more reflective elements, such as surfaces 124a and 124b in figs. 6A and 6B or 124 in figs. 6C and 6D. Lens system 122 may include a single lens or multiple lenses.
A microdisplay unit 120 is mounted on or inside the temple 102, which includes an image source and generates an image of a virtual object. The microdisplay unit 120 is optically aligned with the lens system 122 and the reflecting surface 124 or reflecting surfaces 124a and 124b shown in the following figures. The optical alignment may be along an optical axis 133 or an optical path 133 that includes one or more optical axes. The microdisplay unit 120 projects an image of the virtual object through the lens system 122, which can direct image light onto the reflective element 124, the reflective element 124 directs the light into the light guide optical element 112 in fig. 6C and 6D or onto a reflective surface 124a (e.g., a mirror or other surface), the reflective surface 124a directs the light of the virtual image to a partially reflective element 124b, and the partially reflective element 124b combines a virtual image view along path 133 with a natural or real direct view along optical axis 142 in fig. 6A-6D. The combination of views is directed to the user's eyes.
Variable focus adjuster 135 changes the displacement between one or more light processing elements in the optical path of the microdisplay assembly, or the optical power of an element in the microdisplay assembly. The optical power of a lens is defined as the reciprocal of its focal length, e.g., 1/focal length, so a change in one affects the other. The change in focal length results in a change in the region of the field of view, e.g., a region at a particular distance, that is in focus for an image generated by the microdisplay assembly 173.
In one example of the microdisplay assembly 173 making a displacement change, the change is guided within an armature 137 that supports at least one light processing element, such as the lens system 122 and the microdisplay 120 in this example. The armature 137 helps stabilize the alignment along the optical path 133 during physical movement of the elements to achieve a selected displacement or optical power. In some examples, the adjuster 135 may move one or more optical elements, such as a lens of the lens system 122, within the armature 137. In other examples, the armature may have slots or spaces in the region around a light processing element so that it slides over the element, for example the microdisplay 120, without moving that light processing element. Another element in the armature, such as the lens system 122, is attached so that the system 122 or a lens within it slides or moves with the moving armature 137. The displacement range is typically on the order of a few millimeters (mm); in one example, the range is 1-2 mm. In other examples, the armature 137 may provide support for a focus adjustment technique that involves adjusting a physical parameter of the lens system 122 other than displacement. An example of such a parameter is polarization.
For more information on adjusting the focal distance of a microdisplay assembly, see U.S. patent application Ser. No. 12/941,825, entitled "Automatic Variable Virtual Focus for Augmented Reality," filed November 8, 2010, having inventors Avi Bar-Zeev and John Lewis, which is incorporated herein by reference.
In one example, the adjuster 135 may be an actuator such as a piezoelectric motor. Other techniques for actuators may also be used, and some examples of such techniques are voice coils formed from coils and permanent magnets, magnetostrictive elements, and electrostrictive elements.
There are different image generation technologies that can be used to implement microdisplay 120. For example, microdisplay 120 can be implemented using a transmissive projection technology in which the light source is modulated by an optically active material and backlit with white light. These technologies are typically implemented using LCD-type displays with powerful backlights and high optical energy densities. Microdisplay 120 can also be implemented using a reflective technology in which external light is reflected and modulated by an optically active material. Depending on the technology, the illumination is forward lit by a white source or an RGB source. Digital light processing (DLP), liquid crystal on silicon (LCOS), and the display technology from Qualcomm, Inc. are all examples of efficient reflective technologies, as most of the energy is reflected away from the modulated structure, and they may be used in the system described herein. Additionally, microdisplay 120 can be implemented using an emissive technology in which light is generated by the display. For example, the PicoP™ engine from Microvision, Inc. uses micro mirror steering to emit a laser signal either onto a tiny screen that acts as a transmissive element or to direct the beam (e.g., laser light) directly to the eye.
As described above, the configuration of the light processing elements of the microdisplay assembly 173 creates a focal distance or focal region in which a virtual object appears in the image. Changing the configuration changes the focal region of the virtual object image. The focal region determined by the light processing elements can be determined and changed based on the equation 1/S1 + 1/S2 = 1/f.
The symbol f denotes the focal length of a lens, such as lens system 122 in the microdisplay assembly 173. Lens system 122 has a front nodal point and a rear nodal point. If a light ray is directed at either nodal point at a given angle relative to the optical axis, the ray will exit the other nodal point at an equal angle relative to the optical axis. In one example, the rear nodal point of lens system 122 is between lens system 122 and microdisplay 120. The distance from the rear nodal point to microdisplay 120 may be denoted as S2. The front nodal point is typically within a few millimeters of lens system 122. The target location is the location of the virtual object image to be generated by microdisplay 120 in three-dimensional physical space. The distance from the front nodal point to the target location of the virtual image may be denoted as S1. Because the image is a virtual image appearing on the same side of the lens as microdisplay 120, sign conventions give S1 a negative value.
If the focal length of the lens is fixed, S1 and S2 are varied to focus virtual objects at different depths. For example, an initial position may have S1 set to infinity and S2 equal to the focal length of lens system 122. Assuming lens system 122 has a focal length of 10 mm, consider an example in which the virtual object is to be placed about 1 foot (approximately 300 mm) out in the user's field of view. S1 is now approximately -300 mm, f is 10 mm, and S2 is currently set at the initial position of the focal length, 10 mm, meaning the rear nodal point of lens system 122 is 10 mm from microdisplay 120. The new distance or new displacement between the lens 122 and microdisplay 120 is determined based on 1/(-300) + 1/S2 = 1/10, with all units in millimeters. The result is that S2 is approximately 9.67 mm.
In one example, one or more processors (e.g., in the control circuitry, processing units 4, 5, or both) may calculate displacement values of S1 and S2, while causing the focal length f to be fixed and causing the control circuitry 136 to cause the variable adjuster driver 237 (see fig. 7A) to send drive signals to cause the variable virtual focus adjuster 135 to move the lens system 122, for example, along the optical path 133. In other embodiments, the microdisplay unit 120 may be moved instead of or in addition to moving the lens system 122. In other embodiments, the focal length of at least one lens in lens system 122 may also be changed, either in addition to or instead of the change in displacement along optical path 133.
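The displacement calculation in the example above can be illustrated with a short sketch. The following Python snippet only illustrates the thin-lens arithmetic, not the device's firmware; the function name is arbitrary.

```python
# Minimal sketch of solving 1/S1 + 1/S2 = 1/f for the new displacement S2
# when the focal length f is fixed (illustrative only).
def new_displacement_mm(s1_mm: float, f_mm: float) -> float:
    """Return S2 in mm given S1 (negative for a virtual image) and focal length f."""
    return 1.0 / (1.0 / f_mm - 1.0 / s1_mm)

# Worked example from the text: f = 10 mm, virtual object about 1 foot away,
# so S1 is approximately -300 mm.
print(new_displacement_mm(-300.0, 10.0))  # about 9.68 mm, matching the ~9.67 mm above
```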
FIG. 5B is a side view of a temple providing support for hardware and software components and three-dimensional adjustment of a microdisplay assembly in another embodiment of a mixed reality display device. Some of the reference numerals shown above in fig. 5A have been removed to avoid clutter in the drawing. In embodiments where the display optical system 14 is moved in any of three dimensions, the optical elements represented by reflective surface 124 and the other elements of the microdisplay assembly 173 (e.g., 120, 122) may also be moved to maintain the optical path 133 of the light of the virtual image to the display optical system. In this example, an XYZ gearing mechanism made up of one or more motors, represented by motor block 203 and drive shaft 205 and operating under the control of processor 210 of control circuitry 136 (see fig. 7A), controls the movement of the elements of the microdisplay assembly 173. An example of a motor that may be used is a piezoelectric motor. In the illustrated example, one motor is attached to the armature 137 and also moves the variable focus adjuster 135, and another representative motor 203 controls the movement of the reflective element 124.
Fig. 6A is a top view of an embodiment of an active display optical system 14 of a see-through, near-eye, mixed reality device 2 that includes an arrangement of gaze-detection elements. A portion of the frame 115 of the near-eye display device 2 will surround the display optical system 14 and provide support for elements of an embodiment of a microdisplay assembly 173, shown here as including microdisplay 120 and its accompanying elements. To show the components of display system 14 (in this case, the right-eye system 14r), the top portion of the frame 115 surrounding the display optical system is not depicted. Additionally, the microphone 110 in the nose bridge 104 is not shown in this view in order to focus attention on the operation of the display adjustment mechanism 203. As in the example of fig. 4C, the display optical system 14 in this embodiment is moved by moving the inner frame 117r, which in this example also surrounds the microdisplay assembly 173. The display adjustment mechanism in this embodiment is implemented as three-axis motors 203 that attach their drive shafts 205 to the inner frame 117r to translate the display optical system 14, which in this embodiment includes the microdisplay assembly 173, in any of three dimensions (as shown by the symbol 144 indicating three (3) axes of movement).
In this embodiment, display optical system 14 has an optical axis 142 and includes a see-through lens 118 that allows the user an actual direct view of the real world. In this example, see-through lens 118 is a standard lens used in eyeglasses and can be made to any prescription (including no prescription). In another embodiment, see-through lens 118 is replaced by a variable prescription lens. In some embodiments, see-through, near-eye display device 2 will include additional lenses.
Display optical system 14 also includes reflective surfaces 124a and 124b. In this embodiment, light from microdisplay 120 is directed along optical path 133 via reflective element 124a to a partially reflective element 124b embedded in lens 118, which combines the virtual object image view traveling along optical path 133 with the natural or actual direct view along optical axis 142, so that the combined view is directed to the user's eye (the right eye in this example) at a location on the optical axis having the most collimated light for the clearest view.
The detection area 139r of the light sensor is also part of the display optical system 14 r. The optical element 125 implements a detection region 139r by capturing reflected light from the user's eye received along the optical axis 142 and directs the captured light to a sensor 134r, which in this example is located in the lens 118 within the inner frame 117 r. As shown, this arrangement allows the detection region 139 of the sensor 134r to have its center aligned with the center of the display optical system 14. For example, if the sensor 134r is an image sensor, the sensor 134r captures a detection region 139 such that an image captured at the image sensor is centered on the optical axis because the center of the detection region 139 is the optical axis. In one example, the sensor 134r is a visible light camera or a combination of RGB/IR cameras, and the optical element 125 includes an optical element that reflects visible light reflected from the user's eye, such as a partially reflective mirror surface.
In other embodiments, the sensor 134r is an IR sensitive device such as an IR camera, and the element 125 includes a heat reflective surface that passes visible light through it and reflects IR radiation to the sensor 134 r. The IR camera may capture not only glints, but also infrared or near-infrared images of the user's eyes, including the pupils.
In other embodiments, the IR sensor device 134r is a Position Sensitive Device (PSD), sometimes referred to as an optical position sensor, which identifies the location of light detected on the surface of the sensor. A PSD can be selected that is sensitive to the wavelength range, or about a predetermined wavelength, of the IR illuminators used to generate the glints. When light within the wavelength range of, or about, the predetermined wavelength of the position sensitive device is detected on the sensor or light-sensitive portion of the device, an electrical signal is generated that identifies the location on the surface of the detector. In some embodiments, the surface of the PSD is divided into discrete sensors (like pixels) from which the location of the light can be determined. In other examples, a PSD isotropic sensor may be used, in which a change of local impedance on the surface can be used to identify the location of a light spot on the PSD. Other embodiments of PSDs may also be used. By operating the illuminators 153 in a predetermined sequence, the glint reflection locations on the PSD can be identified and hence related back to their locations on the corneal surface.
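To make the sequencing idea concrete, here is a hedged sketch in Python. The Illuminator and PositionSensitiveDetector interfaces are hypothetical stand-ins for whatever driver layer the device exposes; only the sequencing logic reflects the description above.

```python
from typing import Protocol

class Illuminator(Protocol):
    def on(self) -> None: ...
    def off(self) -> None: ...

class PositionSensitiveDetector(Protocol):
    def read_spot(self) -> tuple[float, float]: ...  # (x, y) on the sensor surface

def correlate_glints(illuminators: list[Illuminator],
                     psd: PositionSensitiveDetector) -> dict[int, tuple[float, float]]:
    """Light the illuminators one at a time so each detected spot can be
    attributed to the illuminator, and hence the corneal location, that produced it."""
    glints: dict[int, tuple[float, float]] = {}
    for index, led in enumerate(illuminators):
        led.on()                        # only one illuminator lit at a time
        glints[index] = psd.read_spot()
        led.off()
    return glints
```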
The light directing elements (in this case, reflective elements) 125, 124a, and 124b depicted in fig. 6A-6D are representative of their functions. These elements may take any number of forms and may be implemented with one or more optical components in one or more arrangements for directing light to its intended destination, such as a camera sensor or a user's eye. As shown, this arrangement allows the detection area 139 of the sensor to have its center aligned with the center of the display optical system 14. The image sensor 134r captures the detection area 139, so an image captured at the image sensor is centered on the optical axis because the center of the detection area 139 is the optical axis.
As discussed above in fig. 2A and 2B and in the following figures, when the detection area 139 or the image sensor 134r is effectively centered on the optical axis of the display, the display optical system 14r is aligned with the pupil if the user is looking straight ahead and the center of the user's pupil is centered in the captured image of the user's eye. When both display optical systems 14 are aligned with their respective pupils, the distance between the optical centers matches, or is aligned with, the user's interpupillary distance. In the example of fig. 6A, the interpupillary distance can be aligned with the display optical systems 14 in three dimensions.
In one embodiment, if the data captured by the sensor 134 indicates that the pupil is not aligned with the optical axis, the processing unit 4, 5 or the control circuitry 136, or one or more processors in both, use a mapping criterion that relates a distance or length measurement unit to a pixel or other discrete unit or area of the image to determine how far the center of the pupil is from the optical axis 142. Based on the determined distance, the one or more processors determine how far and in which direction display optical system 14r is to be moved to align the optical axis 142 with the pupil. Control signals are applied by the one or more display adjustment mechanism drivers 245 to the components making up the one or more display adjustment mechanisms 203, for example the motors 203. In the case of motors in this example, the motors move their drive shafts 205 to move the inner frame 117r in at least one direction indicated by the control signals. The deformable portions 215a, 215b of the frame 115 are on the temple side of the inner frame 117r; they are attached at one end to the inner frame 117r and slide within the slots 217a and 217b inside the temple frame 115 to anchor the inner frame 117 to the frame 115 as the display optical system 14 is moved in any of three directions to change its width, height, or depth with respect to the respective pupil.
In addition to the sensor, the display optical system 14 includes other gaze detection elements. In this embodiment, at least two (2), but possibly more, infrared (IR) illumination devices 153 are attached to the frame 117r at the sides of the lens 118; they direct narrow beams of infrared light within a particular wavelength range, or about a predetermined wavelength, at the user's eye, each to generate a corresponding glint on the surface of the corresponding cornea. In other embodiments, the illuminators and any photodiodes may be on the lenses, for example at the corners or edges. In this embodiment, in addition to the at least two IR illumination devices 153, there are IR photodetectors 152. Each photodetector 152 is sensitive to IR radiation within the particular wavelength range of its corresponding IR illuminator 153 across the lens 118 and is positioned to detect a respective glint. As shown in fig. 4A-4C, the illuminator and photodetector are separated by a barrier 154 so that incident IR light from the illuminator 153 does not interfere with reflected IR light being received at the photodetector 152. In the case where the sensor 134 is an IR sensor, the photodetectors 152 may not be needed or may serve as an additional glint data capture source. With a visible light camera as the sensor, the photodetectors 152 capture light from the glints and generate glint intensity values.
In fig. 6A to 6D, the positions of gaze detection elements such as the detection region 139 and the illuminator 153 and photodetector 152 are fixed with respect to the optical axis of the display optical system 14. These elements move with the display optics 14 on the inner frame and thus with the optical axis of the display optics 14, but their spatial relationship to the optical axis 142 is unchanged.
Fig. 6B is a top view of another embodiment of an active display optical system of a see-through, near-eye, mixed reality device including an arrangement of gaze-detection elements. In this embodiment, the light sensor 134r may be implemented as a visible light camera (sometimes referred to as an RGB camera), or it may be implemented as an IR camera or a camera capable of processing light in both the visible and IR ranges, such as a depth camera. In this example, the image sensor 134r is the detection area 139r. The image sensor 134r of the camera is located vertically on the optical axis 142 of the display optical system. In some examples, the camera may be located on the frame 115 above or below the see-through lens 118, or embedded in the lens 118. In some embodiments, the illuminators 153 provide light for the camera, while in other embodiments the camera captures images using ambient light or light from its own light source. The captured image data may be used to determine the alignment of the pupil with the optical axis. Gaze determination techniques based on image data, glint data, or both may be used, based on the geometry of the gaze detection elements.
In this example, motor 203 in nose bridge 104 moves display optical system 14r in a horizontal direction relative to the user's eyes, as shown by directional symbol 145. As the system 14 is moved, the deformable frame members 215a and 215b slide within the slots 217a and 217 b. In this example, reflective element 124a of the microdisplay assembly 173 embodiment is fixed. Because the IPD is typically determined only once and stored, any adjustments that may be made to the focal length between microdisplay 120 and reflective element 124a may be implemented by the microdisplay assembly, e.g., via adjustments to the microdisplay elements within armature 137.
Fig. 6C is a top view of a third embodiment of an active display optical system of a see-through, near-eye, mixed reality device including an arrangement of gaze-detection elements. Display optical system 14 has a similar arrangement of gaze detection elements, including IR illuminators 153 and photodetectors 152, and a light sensor 134r located on the frame 115 or on the lens 118 above or below the optical axis 142. In this example, display optical system 14 includes a light guide optical element 112 as the reflective element for directing the image into the user's eye, positioned between an additional see-through lens 116 and the see-through lens 118. The reflective element 124 is within the light guide optical element and moves with the element 112. In this example, on the temple 102, an embodiment of the microdisplay assembly 173 is attached to a display adjustment mechanism 203 for the display optical system 14, implemented as a set of three-axis motors 203 with drive shafts 205 and including at least one motor for moving the microdisplay assembly. One or more motors 203 on the nose bridge 104 represent the other components of the display adjustment mechanism 203 providing three-axis movement 145. In another embodiment, the motors may be used to move the devices only in the horizontal direction via their attached drive shafts 205. The motor 203 for the microdisplay assembly 173 would then also move it horizontally to maintain alignment between the light exiting microdisplay 120 and the reflecting element 124. The processor 210 of the control circuitry (see fig. 7A) coordinates their movement.
Light guide optical element 112 transmits light from microdisplay 120 to the eye of a user wearing head mounted display device 2. The light guide optical element 112 also allows light from the front of the head mounted display device 2 to be transmitted through the light guide optical element 112 to the user's eye, allowing the user to have an actual direct view of the space in front of the head mounted display device 2 in addition to receiving a virtual image from the microdisplay 120. Thus, the walls of the light guide optical element 112 are see-through. The light guide optical element 112 includes a first reflective surface 124 (e.g., a mirror or other surface). Light from microdisplay 120 passes through lens 122 and is incident on reflecting surface 124. Reflective surface 124 reflects incident light from microdisplay 120 such that light is trapped by internal reflection within the planar substrate comprising light guide optical element 112.
After several reflections off the surfaces of the substrate, the trapped light waves reach an array of selectively reflective surfaces 126. Note that only one of the five surfaces is labeled 126 to prevent over-crowding of the drawing. The reflective surfaces 126 couple the light waves incident upon those reflective surfaces out of the substrate and into the user's eye. More details of light guide optical elements can be found in U.S. patent application publication No. 2008/0285140, serial No. 12/214,366, "Substrate-Guided Optical Devices," published November 20, 2008, which is incorporated herein by reference in its entirety. In one embodiment, each eye will have its own light guide optical element 112.
Fig. 6D is a top view of a fourth embodiment of an active display optical system of a see-through, near-eye, mixed reality device including an arrangement of gaze-detection elements. This embodiment is similar to the embodiment of fig. 6C, including a light guide optical element 112. However, the light detector has only an IR photodetector 152, so this embodiment relies only on glint detection for gaze detection, as discussed in the examples below.
In the embodiment of fig. 6A-6D, the positions of gaze detection elements, such as the detection region 139 and the illuminator 153 and photodetector 152, are fixed relative to each other. In these examples, they are also fixed relative to the optical axis of the display optical system 14.
In the above embodiments, the specific number of lenses shown is merely an example. Other numbers and configurations of lenses operating according to the same principles may be used. Additionally, in the above example, only the right side of the see-through, near-eye display 2 is shown. By way of example, a complete near-eye, mixed reality display device would include another set of lenses 116 and/or 118, another light-guide optical element 112 for the embodiment of fig. 6C and 6D, another microdisplay 120, another lens system 122, possibly another environment-facing camera 113, another eye-tracking camera 134 for the embodiment of fig. 6A-6C, an earpiece 130, and a temperature sensor 138.
Fig. 7A is a block diagram of one embodiment of the hardware and software components of the see-through, near-eye, mixed reality display unit 2 that may be used with one or more embodiments. Fig. 7B is a block diagram describing the components of the processing units 4, 5. In this embodiment, the near-eye display device 2 receives instructions about a virtual image from the processing units 4, 5 and provides sensor information back to the processing units 4, 5. Software and hardware components that may be implemented in the processing units 4, 5 are depicted in fig. 7B; they receive the sensor information from the display device 2 and may also receive sensor information from the hub computing device 12 (see fig. 1A). Based on that information, the processing units 4, 5 will determine where and when to provide a virtual image to the user and send instructions accordingly to the control circuitry 136 of the display device 2.
Note that some of the components of fig. 7A (e.g., physical environment facing camera 113, eye camera 134, variable virtual focus adjuster 135, photodetector interface 139, microdisplay 120, illumination device 153 (i.e., the illuminators), headphones 130, temperature sensor 138, display adjustment mechanism 203) are shown in shadow to indicate that there are at least two of each of these devices: at least one for the left side and at least one for the right side of head mounted display device 2. Fig. 7A shows the control circuit 200 in communication with the power management circuit 202. The control circuit 200 includes a processor 210, a memory controller 212 in communication with memory 214 (e.g., D-RAM), a camera interface 216, a camera buffer 218, a display driver 220, a display formatter 222, a timing generator 226, a display output interface 228, and a display input interface 230. In one embodiment, all components of the control circuit 200 are in communication with each other via dedicated lines of one or more buses. In another embodiment, each component of the control circuit 200 is in communication with the processor 210.
The camera interface 216 provides an interface to both the physical environment facing cameras 113 and each eye camera 134, and stores the respective images received from the cameras 113, 134 in the camera buffer 218. Display driver 220 drives microdisplay 120. Display formatter 222 may provide information about the virtual image being displayed on microdisplay 120 to one or more processors of one or more computer systems (e.g., 4, 5, 12, 210) performing processing for the augmented reality system. Timing generator 226 is used to provide timing data for the system. The display output 228 is a buffer for providing images from the physical environment facing cameras 113 and the eye cameras 134 to the processing units 4, 5. The display input 230 is a buffer for receiving images, such as a virtual image to be displayed on microdisplay 120. The display output 228 and the display input 230 communicate with the band interface 232, which is an interface to the processing units 4, 5.
Power management circuit 202 includes voltage regulator 234, eye tracking illumination driver 236, variable adjuster driver 237, photodetector interface 239, audio DAC and amplifier 238, microphone preamplifier and audio ADC 240, temperature sensor interface 242, display adjustment mechanism driver 245, and clock generator 244. The voltage regulator 234 receives power from the processing units 4, 5 through the band interface 232 and provides that power to the other components of the head mounted display device 2. The illumination driver 236 controls the illumination devices 153, for example via a drive current or voltage, to operate about a predetermined wavelength or within a wavelength range. The audio DAC and amplifier 238 provides audio information to the headphones 130. The microphone preamplifier and audio ADC 240 provides an interface for the microphone 110. Temperature sensor interface 242 is an interface for temperature sensor 138. The one or more display adjustment drivers 245 provide control signals to the one or more motors or other devices making up each display adjustment mechanism 203, which indicate the amount of movement adjustment in at least one of three directions. The power management unit 202 also provides power to and receives data back from the three axis magnetometer 132A, three axis gyroscope 132B, and three axis accelerometer 132C. The power management unit 202 also provides power to, receives data from, and transmits data to the GPS transceiver 144.
Variable adjuster driver 237 provides control signals, such as a drive current or drive voltage, to the adjuster 135 to move one or more elements of the microdisplay assembly 173 to achieve a displacement for a focal region calculated by software executing in the processor 210 of the control circuitry 136, the processing units 4, 5, the hub computer 12, or a combination of these. In embodiments that sweep through a range of displacements, and hence a range of focal regions, the variable adjuster driver 237 receives timing signals from the timing generator 226, or alternatively from the clock generator 244, to operate at a programmed rate or frequency.
The photodetector interface 239 performs any analog-to-digital conversion required for the voltage or current reading from each photodetector, stores the reading in memory in a processor readable format via the memory controller 212, and monitors the operating parameters of the photodetectors 152, such as temperature and wavelength accuracy.
FIG. 7B is a block diagram of one embodiment of the hardware and software components of a processing unit 4 associated with a see-through, near-eye, mixed reality display unit. The mobile device 5 may include this embodiment of hardware and software components or similar components that perform similar functions. Fig. 7B shows control circuitry 304 in communication with power management circuitry 306. Control circuitry 304 includes a central processing unit (CPU) 320, a graphics processing unit (GPU) 322, a cache 324, RAM 326, a memory controller 328 in communication with memory 330 (e.g., D-RAM), a flash controller 332 in communication with flash memory 334 (or other type of non-volatile storage), a display output buffer 336 in communication with the see-through, near-eye display device 2 via the band interface 302 and the band interface 232, a display input buffer 338 in communication with the near-eye display device 2 via the band interface 302 and the band interface 232, a microphone interface 340 in communication with an external microphone connector 342 for connecting to a microphone, a PCI express interface for connecting to a wireless communication device 346, and a USB port 348.
In one embodiment, the wireless communication component 346 may include a Wi-Fi enabled communication device, a Bluetooth communication device, an infrared communication device, and the like. The USB port may be used to dock the processing units 4, 5 to the hub computing device 12 in order to load data or software onto the processing units 4, 5 and to charge the processing units 4, 5. In one embodiment, CPU 320 and GPU 322 are the primary devices used to determine where, when, and how to insert virtual images into the user's field of view.
Power management circuitry 306 includes a clock generator 360, an analog-to-digital converter 362, a battery charger 364, a voltage regulator 366, a see-through, near-eye display power supply 376, and a temperature sensor interface 372 in communication with a temperature sensor 374 (located on the wrist band of processing unit 4). The analog-to-digital converter 362 is connected to a charging receptacle 370 to receive AC power and generate DC power for the system. The voltage regulator 366 communicates with a battery 368 for providing power to the system. The battery charger 364 is used to charge the battery 368 (via the voltage regulator 366) upon receiving power from the charging receptacle 370. The device power interface 376 provides power to the display device 2.
The above figures provide examples showing the geometry of the elements of the optical system, which provide the basis for the different methods of aligning IPDs discussed in the following figures. These method embodiments may refer to the various elements and structures of the above systems as illustrative contexts; however, these method embodiments may operate in system or structural embodiments that differ from those described above.
The following method embodiments identify or provide one or more focus objects for aligning IPDs. Fig. 8A and 8B discuss some embodiments for determining the position of an object within the field of view of a user wearing the display device.
Fig. 8A is a block diagram of an embodiment of a system for determining a position of an object within a user field of view of a see-through, near-eye, mixed reality display device. This embodiment illustrates how devices leverage a networked computer to map a three-dimensional model of a user's field of view and real and virtual objects within the model. An application 456 executing in a processing unit 4, 5 communicatively coupled to the display device 2 may communicate with the computing system 12 for processing image data to determine and track a three-dimensional user field of view over one or more communication networks 50. Computing system 12 may remotely execute applications 452 for processing units 4, 5 to provide images of one or more virtual objects. As mentioned above, in some embodiments, the software and hardware components of the processing unit are integrated into the display device 2. Either or both of the applications 456 and 452 working together may map a 3D model of the space around the user. The depth image processing application 450 detects objects, identifies objects, and their locations in the model. The application 450 may perform its processing based on depth image data from depth cameras like 20A and 20B, two-dimensional or depth image data from one or more front facing cameras 113, and GPS metadata associated with objects in the image data obtained from the GPS image tracking application 454.
The GPS image tracking application 454 identifies images of the user's location in one or more image databases 470 based on GPS data received from the processing unit 4, 5 or other GPS units identified as being in the vicinity of the user, or both. Additionally, the image database may provide accessible location images with metadata like GPS data and data identifying uploads by users who wish to share their images. The GPS image tracking application provides the distance between objects in the image to the depth image processing application 450 based on the GPS data. Additionally, the application 456 may perform the processing for mapping and locating objects in 3D user space locally and may interact with the GPS image tracking application 454 to receive distances between objects. By leveraging network connectivity, many combinations of sharing processing between applications are possible.
FIG. 8B is a flow diagram of an embodiment of a method for determining a three-dimensional user field of view for a see-through, near-eye, mixed reality display device. At step 510, the one or more processors of control circuitry 136, processing units 4, 5, hub computing system 12, or a combination of these receive image data from one or more front facing cameras 113, and at step 512 identify one or more real objects in the front facing image data. Based on the position of the front facing cameras 113, or of the front facing camera 113 of each display optical system, the image data from the front facing cameras approximates the user field of view. The data from the two cameras 113 may be aligned, and the offsets of the positions of the front facing cameras 113 relative to the optical axes of the display are accounted for. Data from orientation sensors 132 (e.g., three axis accelerometer 132C and three axis magnetometer 132A) may also be used with the front facing camera 113 image data to map the user's surroundings and the position of the user's face and head in order to determine which objects (real or virtual) he or she is likely focusing on at the time. Optionally, based on the application being executed, at step 514 the one or more processors identify positions of virtual objects in the user field of view, which may be determined to be the field of view captured in the front facing image data. At step 516, the three-dimensional position of each object in the user field of view is determined. In other words, the position of each object is determined with respect to the display device 2, for example with respect to the optical axis 142 of each display optical system 14.
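As a concrete illustration of accounting for the front facing camera offset, the following sketch translates a 3D point reported in a camera frame into the frame of a display optical system's optical axis. The offset values are placeholders standing in for device calibration data, and the assumption that only a translation is needed is a simplification.

```python
import numpy as np

# Placeholder offset (mm) from the front facing camera 113 to the origin of the
# right display optical system's optical axis 142, expressed in the display frame.
CAMERA_TO_AXIS_OFFSET_MM = np.array([32.0, -18.0, 5.0])

def position_relative_to_optical_axis(point_in_camera_mm: np.ndarray) -> np.ndarray:
    """Express a point from the camera frame in the optical-axis frame.
    Assumes the two frames share orientation, so only a translation is applied."""
    return point_in_camera_mm - CAMERA_TO_AXIS_OFFSET_MM

# Example: a real object 2 meters straight ahead of the camera.
print(position_relative_to_optical_axis(np.array([0.0, 0.0, 2000.0])))
```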
In some examples of identifying one or more real objects in the forward facing image data, the location of the user may be identified via GPS data of a GPS unit, e.g., GPS unit 965 in mobile device 5 or GPS transceiver 144 on display device 2. This location may be communicated from the device 2 or via the processing unit 4, 5 over a network to a computing system 12 having access to an image database 470, which may be accessed based on GPS data. Based on the pattern recognition of the object in the front facing image data and the image of the location, the one or more processors determine a location of the one or more objects in the front facing image data relative to the one or more GPS tracking objects in the location. A location of the user from the one or more real objects is determined based on the one or more relative locations.
In other examples, each front facing camera is a depth camera that provides depth image data or has a depth sensor for providing depth data that can be combined with image data to provide depth image data. One or more processors (e.g., 210) and processing units 4, 5 of the control circuitry identify one or more real objects (including their three-dimensional position in the user field of view) based on depth image data from the front-facing camera. Additionally, the orientation sensor 132 data may also be used to refine which image data currently represents the user field of view. In addition, the remote computer system 12 may also provide additional processing power to other processors to identify objects and map the user field of view based on depth image data from the front facing image data.
In other examples, a user wearing the display device may be in an environment in which a computer system with one or more depth cameras (as in the example of hub computing system 12 with depth cameras 20A and 20B in system 10 in FIG. 1A) maps the environment or space in three dimensions and tracks real and virtual objects in the space based on depth image data from its cameras and the executing application. For example, a store's computer system may map the three-dimensional space of the store when a user enters it. Depth images from multiple viewpoints, in some examples including depth images from one or more display devices, may be combined by the depth image processing application 450 based on a common coordinate system for the space. Objects in the space are detected (e.g., by edge detection) and identified by pattern recognition techniques, including face recognition techniques, with reference to images of things and people from an image database. Such a system may transmit data such as the position of the user in the space and the positions of objects around the user, which the one or more processors of the device 2 and processing units 4, 5 may use to detect and identify which objects are in the user's field of view. Further, one or more processors of display device 2 or processing units 4, 5 may send the front facing image data and orientation data to computer system 12, which performs the object detection, identification, and object position tracking within the user's field of view and sends updates to the processing units 4, 5.
Fig. 9A is a flow diagram of a method embodiment 400 for aligning a see-through, near-eye, mixed reality display with an IPD. Steps 402 to 406 show more details of an example of step 301 for automatically determining whether a see-through, near-eye, mixed reality display device is aligned with the IPD of the user according to alignment criteria. Steps 407 to 408 show exemplary more detailed steps in step 302 for adjusting the display device to align the device with the user IPD. As discussed in fig. 3C, this adjustment may be performed automatically by the processor or through instructions provided electronically to the user for mechanical adjustment.
At step 402, one or more processors of the see-through, near-eye, mixed reality system, such as processor 210 of the control circuitry, in processing unit 4, mobile device 5, or hub computing system 12, individually or in combination, identify an object in the user field of view at a distance and in a direction for determining the IPD. For a far IPD, the distance is at effectively infinity (e.g., more than 5 feet), and the direction is directly in front with respect to the optical axis of each display optical system. In other words, the distance and direction are such that: when each pupil is aligned with each optical axis, the user is looking straight ahead. At step 403, the one or more processors perform processing to attract the user's focus to the object. In one example, the one or more processors electronically provide instructions to request a user to view the identified real object. In some cases, the user may simply be requested to look straight ahead. Some examples of electronically provided instructions are instructions displayed by image generation unit 120, mobile device 5, or instructions displayed by hub computing system 12 on display 16, or audio instructions through speakers 130 of display device 2. In other examples, the object may have image enhancement applied to it to attract the user's eyes to focus on it. For example, during the observation period, an eye-catching visual effect may be applied to the object. Some examples of such visual effects are highlighting, blinking, and movement.
At step 404, at least one sensor in the arrangement of gaze-detection elements of the respective display optical system, such as sensor 134r or photodetector 152 or both, captures data for each eye during an observation period for the object. In one example, the captured data may be IR image data and glints reflected off each eye, captured by an IR camera. The glints are generated by the IR illuminators 153. In other examples, the at least one sensor is an IR sensor such as a position sensitive detector. The at least one sensor may also be the IR photodetectors 152. In some examples, the at least one sensor 134 may be a visible light camera. However, as described above, if images of virtual objects are used in the process of determining IPD alignment, the reflections of those virtual objects in the user's eyes can be accounted for by filtering them out. If a visible light illuminator generates the glints, the user's eyes may react to the visible light of the illuminator.
At step 406, the one or more processors determine whether each pupil is aligned with the optical axis of its corresponding display optical system according to alignment criteria based on the captured data and the arrangement of gaze-detection elements. The alignment criterion may be a distance from the optical axis, for example 2 millimeters (mm). If so, the display device 2 has been aligned with each pupil, and thus the IPD, and the one or more processors store the position of each optical axis in the IPD dataset in step 409.
If the alignment criteria are not satisfied, then at step 407 the one or more processors automatically determine one or more adjustment values for the at least one display adjustment mechanism for satisfying the alignment criteria for at least one display optical system. "Automatically determine" means that the one or more processors determine the values without the user identifying the adjustment values through mechanical manipulation. In many embodiments, the current position of the optical axis relative to a fixed point of the support structure is tracked based on stored device configuration data. At step 408, the processor causes an adjustment of the at least one respective display optical system based on the one or more adjustment values. In an automatic adjustment, the one or more processors control the at least one display adjustment mechanism 203, via the one or more display adjustment mechanism drivers 245, to move the at least one corresponding display optical system based on the one or more adjustment values. In a mechanical adjustment approach, the processor electronically provides instructions to the user for applying the one or more adjustment values to the at least one display adjustment mechanism via a mechanical controller. The instructions may indicate a specific number of user activations calibrated to a predetermined distance, so that the user does not have to guess. In this case, the user provides the physical force to move the at least one display optical system instead of a motor requiring power, while still avoiding guesswork about how far to actuate the mechanical controller. The steps of the method embodiment may be repeated a predetermined number of times or until the alignment criteria are met.
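The capture-check-adjust loop of fig. 9A can be summarized with the following sketch. The capture and adjust callables, the EyeOffset fields, and the 2 mm tolerance are illustrative assumptions, not elements defined by this description.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EyeOffset:
    dx_mm: float  # horizontal distance of the pupil center from the optical axis
    dy_mm: float  # vertical distance of the pupil center from the optical axis

def align_with_ipd(capture: Callable[[], list[EyeOffset]],
                   adjust: Callable[[EyeOffset], None],
                   tolerance_mm: float = 2.0,
                   max_attempts: int = 5) -> bool:
    """Repeat capture (step 404), alignment check (step 406), and adjustment
    (steps 407-408) until the alignment criterion is met or attempts run out."""
    for _ in range(max_attempts):
        offsets = capture()
        if all(abs(o.dx_mm) <= tolerance_mm and abs(o.dy_mm) <= tolerance_mm
               for o in offsets):
            return True        # step 409 would store the optical axis positions
        for o in offsets:
            if abs(o.dx_mm) > tolerance_mm or abs(o.dy_mm) > tolerance_mm:
                adjust(o)      # move only the display optical systems that failed
    return False
```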
Fig. 9B is a flowchart of a method embodiment 410 for an implementation example of aligning a see-through, near-eye, mixed reality display device with a user IPD based on image data of the pupil of each eye in an image format. The image format has a predetermined size and shape, which may be set, for example, by the image sensor size and shape. An example of an image format is an image frame. The image format provides a coordinate system, e.g., with the center as the origin, for tracking a position within the image data. When the detection area 139 of an image sensor, such as an IR camera (or a visible light camera if desired), is centered on the optical axis 142 of the display optical system 14, the image data in the image format is centered on the optical axis 142. How far the pupil center is from the image center is the basis for determining whether the pupil is satisfactorily aligned with the optical axis. As in the example of fig. 4C, the image sensor 134 may be on the movable support 117 and aligned along an axis passing through the optical axis 142. In processing the image data, the one or more processors take into account the offset vector of the image sensor 134 from the optical axis when determining whether the pupil is aligned with the optical axis.
At step 412, a real object at a distance and in a direction in the user's field of view is identified to determine the IPD, and at step 413, one or more processors perform processing to attract the user's focus to the real object. At step 414, image data for each eye is captured in image format by at least one sensor aligned with the optical axis of the respective display optical system during a viewing period for the real object. At step 415, a respective pupil position relative to a respective optical axis is determined from the image data. The pupil region in the image data may be identified by thresholding the intensity values. An ellipse fitting algorithm may be applied to approximate the size and shape of the pupil, and the center of the resulting ellipse may be selected as the center of the pupil. Ideally, the center of the pupil is aligned with the optical axis of the display optical system. Fig. 17, discussed below, provides an embodiment of a method for determining the pupil center from image data that may also be used to implement step 415. At step 416, the one or more processors determine whether each pupil is aligned with a respective optical axis according to an alignment criterion based on the pupil positions of the image format (e.g., image frame). In the case where detection region 139 is centered on optical axis 142, the one or more processors determine whether the pupil position is the center of the image format (e.g., the center of the image frame) according to an alignment criterion. The pupil position of each eye relative to the optical axis can be determined in both the horizontal and vertical directions.
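One possible realization of the thresholding and ellipse fitting described for step 415 is sketched below using OpenCV. It is not the patented algorithm; the threshold value is a placeholder that would have to be tuned for a given sensor, and an 8-bit grayscale IR eye image and OpenCV 4.x are assumed.

```python
import cv2
import numpy as np

def pupil_center(eye_gray: np.ndarray, threshold: int = 40):
    """Estimate the pupil center (in pixels) from a grayscale eye image."""
    # The pupil absorbs most incident light, so it appears as the darkest region.
    _, mask = cv2.threshold(eye_gray, threshold, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    if len(largest) < 5:                      # fitEllipse needs at least 5 points
        return None
    (cx, cy), _axes, _angle = cv2.fitEllipse(largest)
    return cx, cy                             # ideally coincides with the image center
```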
If the alignment criteria are met, the one or more processors store the position of each optical axis in the IPD data set in step 409. If not, at step 417, the one or more processors determine at least one adjustment value for the respective display adjustment mechanism based on the mapping criteria for the at least one sensor of each display optical system that does not meet the alignment criteria. At step 418, the one or more processors control the respective display adjustment mechanism to move the respective display optical system based on the at least one adjustment value. The steps of the method embodiment may be repeated a predetermined number of times or until the alignment criteria are met.
Also, as shown in some of the figures above, the detection region of the camera need not be centered on the optical axis (e.g., 142) but may instead be aligned with it. For example, in fig. 4C, 6B, and 6C, the camera image sensor 134 is located above or below the optical axis 142 (e.g., on the frame 115) and is therefore vertically aligned with the optical axis 142.
FIG. 9C is a flow diagram of an embodiment of a method for determining at least one adjustment value for a display adjustment mechanism based on mapping criteria for at least one sensor of a display optical system that does not meet alignment criteria that may be used to implement step 417. At step 442, the one or more processors determine a horizontal pupil position difference vector based on the mapping criteria of the at least one sensor. A pixel-to-distance mapping criterion may be used in each direction for which adjustments are provided. Depending on the shape of the detection area of the image sensor, the mapping criterion may be different for vertical and horizontal. At step 444, a vertical pupil position difference vector is also determined based on the mapping criteria of the at least one sensor. At step 446, the one or more processors correlate the horizontal pupil position difference vector with a horizontal adjustment value and, at step 448, correlate the vertical pupil position difference vector with a vertical adjustment value.
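The mapping from pixel differences to adjustment values can be pictured with the sketch below. The millimeters-per-pixel factors are placeholders for per-device mapping criteria, which, as noted, may differ between the horizontal and vertical directions.

```python
MM_PER_PIXEL_HORIZONTAL = 0.02   # placeholder horizontal mapping criterion
MM_PER_PIXEL_VERTICAL = 0.025    # placeholder vertical mapping criterion

def adjustment_values_mm(pupil_px, optical_axis_px):
    """Return (horizontal, vertical) adjustment values in mm, signed toward the pupil."""
    dx_px = pupil_px[0] - optical_axis_px[0]   # horizontal pupil position difference
    dy_px = pupil_px[1] - optical_axis_px[1]   # vertical pupil position difference
    return dx_px * MM_PER_PIXEL_HORIZONTAL, dy_px * MM_PER_PIXEL_VERTICAL
```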
Because the horizontal IPD may have a range between 25-30mm, the display adjustment mechanism typically has a distance range limitation to move the display optical system in any direction. Depth adjustment may help bring out-of-range adjustment values in the horizontal or vertical direction into range. Optional steps 451 and 453 can be performed. At optional step 451, the one or more processors determine whether any of the horizontal or vertical adjustment values are out of range. If not, alignment of the display optics may be achieved by movement in a two-dimensional plane, and step 418 may be performed. If at least one adjustment value is outside of range, then in optional step 453, the one or more processors determine a depth adjustment value for bringing any horizontal or vertical adjustment values outside of range closer to or within the range limit, and may perform step 418 to adjust the display optical system.
As an illustrative example, suppose the optical axis is 12 mm to the right of the pupil and the display adjustment mechanism can only move the display optical system 6 mm to the left. By increasing the depth between the display optical system and the pupil, the angle from the pupil to the optical axis position when looking straight ahead decreases, so the increase in depth combined with the 6 mm adjustment to the left brings the optical axis closer into alignment with the pupil according to the alignment criteria. The effect of the depth change on the vertical dimension may also be taken into account, so that a vertical adjustment may also be necessary, or the depth adjustment value may be modified.
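A small numeric illustration of why a depth increase helps is given below; the residual offset follows the 12 mm / 6 mm example above, while the depth values are placeholders.

```python
import math

def misalignment_deg(horizontal_offset_mm: float, depth_mm: float) -> float:
    """Angle between the straight-ahead line of sight and the optical axis position."""
    return math.degrees(math.atan2(horizontal_offset_mm, depth_mm))

residual_mm = 12.0 - 6.0                   # offset left after the 6 mm horizontal move
for depth_mm in (15.0, 20.0, 30.0):        # placeholder pupil-to-display depths
    print(depth_mm, round(misalignment_deg(residual_mm, depth_mm), 1))
# Prints 15.0 21.8, 20.0 16.7, 30.0 11.3: a larger depth shrinks the residual angle.
```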
The embodiments of fig. 9B and 9C may also be applied to glint data from each eye when the glints have a geometric relationship to one another and the sensor has a surface made up of discrete sensors, such as pixels. For example, the glints generated for an eye by the illuminators form, by virtue of the illuminator positions, a frame or other geometric shape aligned with the optical axis of that eye's corresponding display optical system. If the sensor for detecting the glints is a Position Sensitive Detector (PSD), the position on the sensor and the intensity value detected for a glint generated from a fixed illuminator are used to map the position of the pupil. Image data from an IR camera (or even a visible light camera) provides greater accuracy for pupil position determination, but the glint data approach processes less data and is therefore less computationally intensive.
Fig. 9D is a flowchart of a method embodiment 420 for an implementation example of aligning a see-through, near-eye, mixed reality display with an IPD based on gaze data. Steps 412 and 413 are performed as discussed above for fig. 9B. At step 423, the one or more processors determine, based on the arrangement of gaze detection elements of the respective display optical system, a reference gaze vector for each eye to the real object that passes through the optical axis of the display optical system. Embodiments of gaze determination methods are discussed in fig. 12 through 19. Embodiments of arrangements or systems of gaze detection elements in which the methods may operate are shown in fig. 4A-4C and 6A-6D. As discussed with reference to the embodiments of fig. 8A-8B, the position of the real object in the user's field of view is tracked. In the far IPD case, the pupil position is estimated for a user looking straight ahead, and the reference gaze vector is estimated by modeling the ray that passes through the optical axis from the estimated pupil position to the real object.
At step 414, during the observation period for the real object, the at least one sensor of the arrangement captures data for each eye, and at step 425, the one or more processors determine a current gaze vector for each eye based on the captured data and the arrangement. At step 426, the one or more processors determine whether the current gaze vectors match the reference gaze vectors according to the alignment criteria. If so, the display device 2 has been aligned with each pupil, and thus the IPD, and the one or more processors store the position of each optical axis in the IPD data set at step 409.
If at least one of the current gaze vectors does not satisfy the alignment criteria, the one or more processors automatically determine one or more adjustment values for at least one display adjustment mechanism of each display optical system that does not satisfy the alignment criteria based on the difference between the current and reference gaze vectors at step 427. The difference between the current and reference gaze vectors may be represented as a three-dimensional position difference vector, and at least one of horizontal, vertical, and depth adjustment values may be determined to bring the three-dimensional position difference vector within an alignment criterion, such as a position difference tolerance in one or more directions.
At step 428, the one or more processors cause the at least one display adjustment mechanism to adjust at least one respective display optical system based on the one or more adjustment values.
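Step 427 can be pictured with the following sketch, which turns the mismatch between the current and reference gaze vectors into a three-dimensional position difference vector and compares it against a per-axis tolerance. The unit-vector representation, the common origin, and the 1 mm tolerance are assumptions for illustration only.

```python
import numpy as np

TOLERANCE_MM = 1.0   # placeholder per-axis position difference tolerance

def gaze_adjustment(current_gaze: np.ndarray,
                    reference_gaze: np.ndarray,
                    target_distance_mm: float):
    """Return (horizontal, vertical, depth) adjustment values in mm,
    or None if the current gaze already matches the reference."""
    current_point = current_gaze / np.linalg.norm(current_gaze) * target_distance_mm
    reference_point = reference_gaze / np.linalg.norm(reference_gaze) * target_distance_mm
    difference = reference_point - current_point     # 3D position difference vector
    if np.all(np.abs(difference) <= TOLERANCE_MM):
        return None
    return difference
```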
The method embodiment of fig. 9D may be performed using various methods for determining a gaze vector. For example, the gaze determination method embodiment of fig. 19 may be used. In addition, the gaze determination methods of fig. 12-18 may be used, in which a gaze vector is determined from an inner eye part to an object based on image data and glint data. In those methods, the initial vector determined models the optical axis of the eye. As mentioned above, however, the human gaze vector is the visual axis or line of sight from the fovea through the center of the pupil. Photoreceptors in the foveal region of the human retina are more densely packed than those in the rest of the retina. This region provides the highest visual acuity or clarity of vision and also provides stereopsis for nearby objects. After the optical axis is determined, a default gaze offset angle may be applied so that the optical axis approximates the visual axis and is selected as the gaze vector. In some cases, the alignment of the pupil with the optical axis of the display optical system may be determined based on the optical axis determined to extend from the center of rotation of the eyeball through the determined cornea and pupil centers, without the visual axis correction being applied. In other examples, however, the correction is applied to more accurately approximate the gaze vector originating from the retina.
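Applying a default gaze offset angle can be sketched as a simple rotation of the optical axis direction. The 5 degree default below is only a placeholder; the description does not specify a value, and a per-user calibration (fig. 9E) would replace it.

```python
import numpy as np

DEFAULT_GAZE_OFFSET_DEG = 5.0   # placeholder default gaze offset angle

def visual_axis_from_optical_axis(optical_axis: np.ndarray,
                                  offset_deg: float = DEFAULT_GAZE_OFFSET_DEG) -> np.ndarray:
    """Rotate the optical axis horizontally (about the vertical axis) by the offset angle."""
    theta = np.radians(offset_deg)
    rotation_y = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                           [0.0,           1.0, 0.0],
                           [-np.sin(theta), 0.0, np.cos(theta)]])
    rotated = rotation_y @ optical_axis
    return rotated / np.linalg.norm(rotated)
```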
Fig. 9E is a flowchart of a method embodiment 430 of an implementation example of the method 420 in fig. 9D applying a gaze offset angle. In this example, the uncorrected current and reference gaze vectors are used to roughly align the pupils with their respective optical axes. Subsequently, the gaze offset angle is calibrated for the user and the alignment check is performed again with the gaze offset angle applied to the vector for a more finely adjusted or more precise alignment with the respective optical axis. As discussed further below with reference to fig. 18, calibration of the gaze offset angle is performed by displaying one or more images of the virtual object at different distances in the user field of view and determining a gaze offset vector based on a distance vector between the initial optical axis vector and a location of the one or more images in the user field of view. When the IPD is properly aligned, the virtual object image will appear clearer to the user.
In step 411, the gaze offset angle is set to an initial value. Steps 412 and 413 are performed as discussed above for fig. 9B. At step 431, the one or more processors determine, based on the arrangement of gaze detection elements, a reference gaze vector to the real object through the optical axis of the display optical system, as at step 423, but here the reference gaze vector includes the gaze offset angle. Initially, if the gaze offset angle is zero, the reference gaze vector is the vector extending from the optical axis of the eye. At step 414, data for each eye is captured by the at least one sensor of the arrangement during the observation period for the real object. At step 433, a current gaze vector is determined, as at step 425, but here the current gaze vector includes the gaze offset angle. Step 426 is performed as in fig. 9D. If the alignment determination fails for the optical axis of at least one of the display optical systems, steps 427 and 428 are performed and the process starting at step 426 is repeated.
If, at step 426, the current gaze vectors are determined to match the reference gaze vectors according to the alignment criteria, then at step 436 the one or more processors determine whether the gaze offset angle has been calibrated. For example, the initial value may serve as a flag indicating that calibration has not yet been done, or a flag indicating that calibration has been performed may be stored in a memory of the display device. If calibration has not been performed, the one or more processors cause the gaze offset angle to be calibrated at step 437, and the process repeats from step 412. From this point on, however, the reference and current gaze vectors more closely approximate the visual axis of the line of sight from the user's eyes. If, at step 426, the alignment determination indicates a satisfactory alignment and the gaze offset angle has been calibrated, as determined at step 436, the position of each optical axis is stored in the IPD data set.
Fig. 9F is a flow diagram of a method embodiment for aligning a see-through, near-eye, mixed reality display with an IPD based on gaze data for an image of a virtual object. In this example, the user's view of the virtual object may not be clear at first because the IPD is misaligned. However, the one or more processors have more control over a virtual object than over a real object, and thus have more latitude in placing the virtual object in the user field of view to determine the IPD. By moving the virtual object image in each display optical system, together or separately, the resulting gaze pattern indicates the locations in the field of view of each eye at which the object is not being tracked by the user's gaze. From the locations in the user field of view at which the object is not tracked, the one or more processors can determine how to adjust each display optical system to better align with its respective pupil.
At step 462, the one or more processors cause an image generation unit (e.g., microdisplay 120) to display a stereoscopic image of the virtual object, at a distance and in a direction in the user field of view for determining the IPD, by projecting a separate image in each display optical system. The two separate images make up the stereoscopic image. At step 463, during a viewing period, the one or more processors cause the image generation unit 120 to move at least one of the separate images in the user field of view of at least one of the display optical systems to one or more positions expected to be viewable with each pupil aligned with its respective optical axis. At step 464, the one or more processors cause the at least one sensor of the arrangement of gaze detection elements of the respective display optical system to capture data for each eye during this viewing period.
At step 465, the one or more processors determine a gaze pattern of each eye during the viewing period based on the captured data and the arrangement of gaze detection elements of each display optical system. The gaze pattern is a set of gaze vectors determined for each location of the virtual object image in the user field of view during the viewing period. In other words, the gaze pattern reflects gaze changes during the observation period. At step 466, the one or more processors determine whether the gaze pattern indicates that the optical axis is aligned with the respective pupil according to the alignment criteria.
As part of the determination of step 466, the one or more processors determine, for each gaze vector calculated during the period of time that the virtual object is at a given location in the user field of view, whether that gaze vector intersects the virtual object at that location.
If the alignment criteria are met, the one or more processors store the position of each optical axis in the IPD data set in step 409. If the alignment criteria are not satisfied, the one or more processors automatically determine one or more adjustment values for at least one display adjustment mechanism for each display optical system that does not satisfy the alignment criteria based on the gaze pattern at step 467 and cause the display adjustment mechanism to automatically adjust the respective display optical system to satisfy the alignment criteria at step 468.
One or more adjustment values may be determined based on a distance vector between each gaze vector that does not intersect the virtual object and the position of the virtual object during the time period in which the intersection was expected.
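As one way to picture this computation, the following sketch averages the per-sample distance vectors between where the gaze landed and where the virtual object was placed; the assumption that both are expressed as 3D points in millimeters in a display-anchored coordinate frame is introduced here for illustration only.

```python
import numpy as np

def adjustment_from_gaze_pattern(gaze_points, object_positions):
    """Average miss vector between gaze intersection points and the expected
    virtual object positions, per eye, as candidate adjustment values (mm)."""
    gaze_points = np.asarray(gaze_points, dtype=float)            # shape (N, 3)
    object_positions = np.asarray(object_positions, dtype=float)  # shape (N, 3)
    miss = object_positions - gaze_points                         # distance vectors
    mean_miss = miss.mean(axis=0)
    return {"dx_mm": mean_miss[0], "dy_mm": mean_miss[1], "dz_mm": mean_miss[2]}
```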
Method embodiments such as those described for fig. 9D and 9F may be used when gaze is determined from glint data. In one embodiment, rather than processing a much larger set of eye image data, gaze may be estimated from a few data points, namely the intensity values detected for the glints. The positions of the illuminators 153 on the eyeglasses frame 115 or other support structure of the near-eye display device may be fixed, so that the glint positions detected by the one or more sensors are fixed in the sensor detection area. The cornea, and hence the iris and the pupil, rotate with the eyeball about a fixed center. As the user's gaze changes, the iris, pupil, and sclera (sometimes called the white of the eye) move beneath the glints. A glint detected at the same sensor location can therefore produce different intensity values due to the different reflectivities of the different eye portions. Because the pupil is an aperture over tissue that absorbs most of the incoming light, its intensity value will be very low or near zero, while the intensity value for the iris will be higher due to its higher reflectivity. The intensity value for the sclera may be the highest, as the sclera has the highest reflectivity.
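A minimal sketch of this idea follows; the intensity thresholds (40 and 170 on an 8-bit scale) are illustrative assumptions, not values from the disclosure.

```python
def classify_glint_backgrounds(glint_intensities, pupil_max=40, iris_max=170):
    """Map each glint's intensity reading (0-255) to the eye region beneath it.
    The pupil absorbs most light (lowest values), the iris reflects more, and
    the sclera reflects the most; the thresholds here are illustrative only."""
    regions = []
    for value in glint_intensities:
        if value <= pupil_max:
            regions.append("pupil")
        elif value <= iris_max:
            regions.append("iris")
        else:
            regions.append("sclera")
    return regions

# Four glints arranged around the pupil: low readings mark glints over the pupil,
# higher readings mark glints over the iris or sclera, indicating gaze direction.
print(classify_glint_backgrounds([25, 35, 160, 230]))  # ['pupil', 'pupil', 'iris', 'sclera']
```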
In some examples, the illuminators may be positioned on either side of the display optical system 14, as in fig. 6A to 6D, and thus on either side of the pupil of the user's eye. In other embodiments, additional illuminators may be positioned on the frame 115 or lens 118; for example, four illuminators may be placed so as to generate a surrounding geometry, e.g. a box, of glints on the eyeball that is approximately centered on the pupil when the user looks straight ahead. The microdisplay assembly 173 can display a virtual image, or send a message such as a visual virtual image or audio instructions, asking the user to look straight ahead so that the glints form on or near the pupil. In other embodiments, glint-based gaze detection is based on intensity values generated from the illuminators regardless of whether the glints are centered on the pupil.
Fig. 10A is a flow diagram of an embodiment of a method for realigning a see-through, near-eye, mixed reality display device with an interpupillary distance (IPD). At step 741, the processing unit 4, 5 detects a change indicating that the alignment with the selected IPD no longer satisfies the alignment criteria, which triggers the one or more processors to readjust at least one of the display optical systems to satisfy the alignment criteria at step 743. Again, the alignment criterion may be a distance of a few millimeters, e.g. 3 mm. A gaze determination method that runs continuously to track the user's focus may detect this change.
Fig. 10B is a flow diagram illustrating an embodiment of a method for selecting an IPD from either a near IPD or a far IPD based on gaze data. In step 752, the processing unit 4, 5 determines the distance of the point of gaze based on the gaze data and, in step 754, selects the near IPD or the far IPD as the IPD based on that distance. In one example, the user's point of gaze is initially determined to be about seven feet in front of the user, and the display device uses two feet as the gaze point distance that triggers a change between the near IPD and the far IPD. When the user's focus changes and the gaze determination method indicates that the point of gaze is now within the two-foot threshold, the IPD is adjusted from the far or regular IPD selected at the start to the near IPD. The processing unit 4, 5 monitors the point of gaze and checks its distance to detect such a change and readjust between the IPDs.
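The selection logic can be sketched as follows; the near and far IPD values of 60 mm and 63 mm in the usage example are arbitrary placeholders, not values from the disclosure.

```python
NEAR_IPD_THRESHOLD_FT = 2.0  # gaze point distance that triggers the near IPD

def select_ipd(gaze_point_distance_ft, near_ipd_mm, far_ipd_mm):
    """Pick the near or far IPD data set from the distance to the point of gaze."""
    return near_ipd_mm if gaze_point_distance_ft <= NEAR_IPD_THRESHOLD_FT else far_ipd_mm

# The gaze point starts about seven feet away (far IPD applies), then moves
# within two feet, which switches alignment to the near IPD.
print(select_ipd(7.0, near_ipd_mm=60.0, far_ipd_mm=63.0))  # 63.0
print(select_ipd(1.5, near_ipd_mm=60.0, far_ipd_mm=63.0))  # 60.0
```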
Another type of detected change that may trigger readjustment of the display optical system is movement of the display optical system relative to the eye. The head movement may cause the display device to shift on the user's face.
Fig. 11 is a flow diagram illustrating an embodiment of a method for determining whether a change is detected indicating that alignment with a selected IPD no longer satisfies the alignment criteria. At step 742, the processing unit 4, 5 periodically determines, according to criteria, whether the near-eye display device has moved relative to the respective eye. If, at step 744, the result indicates that no movement has occurred based on the criteria, the processing unit 4, 5 performs other processing at step 746 until the next scheduled movement check. If movement has occurred based on the criteria, it is determined at step 748 whether the pupil alignment still satisfies the alignment criteria. If so, the processing unit 4, 5 performs other processing at step 746 until the next scheduled movement check. If the pupil alignment no longer satisfies the alignment criteria, an optional step 750 may be performed in which the processing unit 4, 5 determines, based on the current point of gaze, which IPD data set, near or far, applies. In step 752, the processing unit 4, 5 adjusts any respective display optical system according to the applicable IPD data set so as to satisfy the alignment criteria.
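A compact sketch of this check loop is shown below; the `device` object and its methods are hypothetical stand-ins for the processing unit 4, 5 and the routines of steps 742-752, and the one-second check interval is an assumption.

```python
import time

def realignment_loop(device, check_interval_s=1.0):
    """Periodic movement and pupil-alignment check (fig. 11 sketch)."""
    while device.is_running():
        time.sleep(check_interval_s)                          # next scheduled check
        if not device.moved_relative_to_eyes():               # steps 742/744
            continue                                          # step 746: other processing
        if device.pupil_alignment_ok():                       # step 748
            continue                                          # step 746 again
        ipd_data_set = device.select_ipd_for_current_gaze()   # optional step 750
        device.adjust_display_optical_systems(ipd_data_set)   # step 752
```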
Based on the different geometries of the gaze detection elements described above, movement may be detected during different gaze determination method embodiments. The processing units 4, 5 may monitor the gaze result to determine whether the readjustment of the pupil alignment is complete. Also, in embodiments that provide both near and far IPD alignment, the distance to the point of regard may be monitored to trigger a switch between near and far IPD alignment.
Fig. 12 is a flow diagram of an embodiment of a method for determining gaze in a see-through, near-eye, mixed reality display system and provides an overall view of how a near-eye display device can leverage the geometry of its optical components to determine gaze and depth changes between the eye and the display optical system. One or more processors of the mixed reality system, such as processor 210 of the control circuitry, in processing unit 4, mobile device 5, or hub computing system 12, alone or in combination, determine the boundaries of the gaze-detection coordinate system at step 602. At step 604, a gaze vector for each eye is determined based on the reflected eye data including glints, and at step 606, a point of gaze, e.g., what the user is viewing, in a three-dimensional (3D) user field of view for both eyes is determined. Because the location and identity of the objects in the user field of view are tracked, for example, by embodiments like those in fig. 8A-8B, any object at the point of gaze in the 3D user field of view is identified at step 608. In many embodiments, the three-dimensional user field of view includes an actual direct view of the displayed virtual objects and real objects. The term "object" includes a person.
The method embodiment of fig. 12, and the other method embodiments discussed below that use glint data in other ways to detect gaze, may identify such glints from image data of the eye. When IR illuminators are used, an IR image sensor is typically used as well. The following methods also work with discrete surface position sensitive detectors (PSDs), such as a PSD with pixels. Fig. 13 is a flow diagram of an embodiment of a method for identifying glints in image data. As mentioned above, a glint is a very small and often very bright reflection of light from a light source off a specular surface such as the cornea of the eye. In the method embodiment below, each of the steps is performed on a set of data samples. In some examples this may be data from one image or image frame, and in other examples the set of data samples may be several images or image frames. At step 605, the processor identifies each connected set of pixels whose intensity values fall within a predetermined intensity range; for example, the range may begin at 220 and end at the brightest pixel value of 255. In step 607, the candidate glints are pruned by identifying as a candidate glint each connected set of pixels that satisfies glint geometry criteria. Examples of glint geometry criteria are size and shape: some connected sets may be too large, too small, or too irregularly shaped. Additionally, the illuminators are positioned so that the resulting glints have a spatial or geometric relationship to each other. For example, the illuminators 153 may be arranged so that the glints form a rectangle. In embodiments in which the pupil center, discussed for fig. 14, is also determined from the image data, a spatial relationship to the pupil may also be a criterion; for example, being too far from the pupil may indicate that a connected set is not a candidate glint.
In step 609, the one or more processors determine whether the candidate glints number fewer than a predetermined number. For example, with four illuminators four glints are expected, but the predetermined number may be two. In the example of a rectangle as the geometric relationship, two glints forming a horizontal line or a diagonal of a predetermined length may be selected as candidates; the other glints may be obscured by an eyelid or eyelash. If there are fewer than the predetermined number of glints, the set of data samples is discarded from further processing at step 611, and processing returns to step 605 for the next set of data samples. If the candidates are not fewer than the predetermined number, step 613 determines whether the candidate glints number more than the predetermined number. If there are more candidates, at step 615 the one or more processors select as glints the predetermined number of candidates that most closely fit the predetermined geometric relationship between the glints; for example, for a rectangle, the candidates that most closely form a rectangle of the predetermined size and shape. If there are not more candidates than that number, the number of candidates matches the predetermined number of glints, and those candidates are selected as glints at step 617.
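The identification and pruning steps can be sketched as below. The sketch assumes a grayscale image as a NumPy array, known expected glint positions from the fixed illuminator geometry, and, as a simplification, that the full predetermined number of glints is required; the size limits and other parameter values are illustrative.

```python
import itertools
import numpy as np
from scipy import ndimage

def find_glints(image, expected_positions, lo=220, hi=255,
                min_px=2, max_px=10, needed=4):
    """Identify glint candidates by intensity and size, then keep the subset
    that best matches the expected fixed glint geometry (fig. 13 sketch)."""
    mask = (image >= lo) & (image <= hi)               # step 605: intensity range
    labels, count = ndimage.label(mask)                # connected sets of pixels
    candidates = []
    for region in range(1, count + 1):
        ys, xs = np.nonzero(labels == region)
        if min_px <= len(xs) <= max_px:                # step 607: size criterion
            candidates.append((xs.mean(), ys.mean()))  # candidate glint centroid
    if len(candidates) < needed:                       # steps 609/611: discard sample
        return None
    if len(candidates) == needed:                      # step 617
        return candidates
    expected = np.asarray(expected_positions, dtype=float)
    best, best_cost = None, float("inf")
    for combo in itertools.combinations(candidates, needed):   # step 615
        pts = np.asarray(combo)
        # each expected glint matched to its nearest candidate in the subset
        cost = sum(np.min(np.sum((pts - e) ** 2, axis=1)) for e in expected)
        if cost < best_cost:
            best, best_cost = list(combo), cost
    return best
```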
Due to the geometry of the placement of the illuminators used to generate the glints, as discussed above, the glints appear in the same locations unless the frame 115 moves relative to the eye. Furthermore, since the positions of the illuminators relative to each other on the support structure of the frame 115 or lens 118 are fixed, the spatial relationship of the glints to each other in the image is fixed as well. As for size, since a glint is very small, the number of pixels making up the glint area on the sensor and in the sensed image will be correspondingly small; for example, if the image sensor of the camera has 1000 pixels, each glint may occupy fewer than ten pixels. Glints may be monitored in each image frame, taken at a rate of, for example, 30 or 60 frames per second, and an area may be identified as a glint from a certain number of frame samples. Glint data may not be present in every frame. The sampling accommodates or smooths out obstructions of the glint and pupil data in different image frames, such as those caused by the eyelids or eyelashes covering the glints and/or pupil. An image frame is an example of an image format.
Fig. 14 is a flow diagram of a method embodiment that may be used to implement step 602 of determining the boundaries of the gaze-detection coordinate system. At step 612, the one or more processors determine the position of the corneal center 164 of each eye relative to the illuminators 153 and the at least one light sensor (e.g., 134 or 152) based on the glints. Based on image data provided by the at least one sensor, the one or more processors determine the pupil center of each eye at step 614. At step 616, the position of the center of rotation of the eyeball, which may be treated as fixed, is determined relative to the corneal center and the pupil center. For example, starting from the pupil center, a ray can be extended back through the determined corneal center 164 to the fixed center 166 of eyeball rotation. In addition, distance or length approximations are used, for example a length along the optical axis between the pupil and the cornea of about 3 mm, and a length along the optical axis between the center of curvature of the cornea and the center of rotation of the eyeball of about 6 mm. These values have been determined from population studies of human eye parameters, such as the Gullstrand schematic eye model (see Hennessey, page 88).
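The ray extension of step 616 with the approximate lengths mentioned above can be sketched as follows, assuming the pupil center and corneal center are available as 3D points in millimeters in the sensor/illuminator frame; the coordinates in the usage example are invented for illustration.

```python
import numpy as np

CORNEA_TO_ROTATION_MM = 6.0  # approximate corneal center to center of rotation

def eyeball_rotation_center(pupil_center, cornea_center):
    """Extend the pupil-to-cornea direction past the corneal center by ~6 mm to
    approximate the fixed center of eyeball rotation (step 616 sketch)."""
    pupil_center = np.asarray(pupil_center, dtype=float)
    cornea_center = np.asarray(cornea_center, dtype=float)
    axis_dir = cornea_center - pupil_center       # ~3 mm pupil-to-cornea segment
    axis_dir /= np.linalg.norm(axis_dir)
    return cornea_center + CORNEA_TO_ROTATION_MM * axis_dir

# Pupil at the origin, corneal center 3 mm behind it along +z: the rotation
# center falls about 9 mm behind the pupil, giving a depth reference point.
print(eyeball_rotation_center([0.0, 0.0, 0.0], [0.0, 0.0, 3.0]))  # [0. 0. 9.]
```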
Optionally, at step 618, the one or more processors determine the position of the fixed center of eyeball rotation relative to the illuminators and the at least one sensor for the respective eye. This position, determined at step 618, provides a depth distance between a fixed point, or one that can be treated as fixed with sufficient accuracy for gaze detection, and the display optical system. In effect, a depth axis is defined for the gaze-detection coordinate system. A change detected along the depth axis may be used to indicate that the near-eye display system has moved, triggering a check of the alignment of each optical axis with its respective pupil to see whether the alignment criteria are still satisfied. If not, an automatic readjustment is performed according to step 752. Fig. 9A to 9D provide examples of how the readjustment may be performed.
Fig. 15 illustrates an embodiment of a method for determining the position of the corneal center in a coordinate system with optical elements of the see-through, near-eye, mixed reality display. At step 622, the one or more processors generate a first plane containing points including the position of a first illuminator that generates a first glint, the position of the pupil center of the at least one image sensor, for example the camera entrance pupil center, and the position of the first glint. As in the embodiment of fig. 3A, this camera pupil center may be located relative to the detection area 139, which directs the light it receives to an image sensor at another location. In other examples, such as figs. 3B and 3C, the detection area 139 may itself be the image sensor, which is the image plane. The first plane also contains the position of the corneal center. Similarly, at step 624, the one or more processors generate a second plane containing points including the position of a second illuminator that generates a second glint, the same camera pupil center position of the at least one sensor, and the position of the second glint. The two planes share the camera pupil center as a common origin, and the distance vector to each illuminator is fixed relative to the camera pupil center because the image sensor and illuminators are positioned at predetermined locations on the near-eye display device. These predetermined positions allow the points in the two planes to be related to one another in a third coordinate system that includes the two illuminators, the camera pupil center position, and the corneal center of curvature. At step 626, the processor determines the position of the center of curvature of the cornea based on the intersection of the first and second planes.
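The plane construction of steps 622 and 624 can be sketched with basic vector algebra. Both planes contain the camera pupil center, so their intersection is a line through that point on which the corneal center must lie; pinning the corneal center down on that line additionally requires the reflection and sphere-radius constraints discussed with fig. 16 below, which this sketch does not implement. The function names and concrete interface are assumptions introduced for illustration.

```python
import numpy as np

def plane_normal(p1, p2, p3):
    """Unit normal of the plane through three points."""
    n = np.cross(np.subtract(p2, p1), np.subtract(p3, p1))
    return n / np.linalg.norm(n)

def corneal_center_line(camera_pupil_center, illum1, glint1, illum2, glint2):
    """Intersection line of the two planes of steps 622/624: each plane holds
    the camera pupil center, one illuminator position, and the matching glint
    position, and the corneal center lies in both planes."""
    n1 = plane_normal(camera_pupil_center, illum1, glint1)   # step 622
    n2 = plane_normal(camera_pupil_center, illum2, glint2)   # step 624
    direction = np.cross(n1, n2)
    direction /= np.linalg.norm(direction)
    # both planes contain the camera pupil center, so the line passes through it
    return np.asarray(camera_pupil_center, dtype=float), direction
```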
Fig. 16 provides an illustrative example of the geometry of a gaze-detection coordinate system 500 that may be used by the embodiment of fig. 15 to find the corneal center. In this embodiment, the at least one sensor is a camera modeled as a pinhole camera. The geometry depicted is a slightly modified version of fig. 3 on page 89 of Hennessey et al., "A single camera eye-gaze tracking system with free head motion", ETRA 2006, San Diego, California, ACM, pp. 87-94 (hereinafter Hennessey), which is incorporated herein by reference. The following is a list of variables:
$\hat{L}_i$ is the position of illuminator $i$ (153), the light of which produces glint $\hat{g}_i$ (e.g. 174);

$\hat{g}_i$ is the glint produced by illuminator $i$ (153) on the corneal surface;

$\hat{o}$ is the camera pupil center of the pinhole camera model;

$\hat{u}_i$ is the image of glint $\hat{g}_i$ on the image plane, which is the detection area 139 of the camera sensor;

$l_i$ is the scalar distance or length from the point $\hat{o}$ to the point $\hat{L}_i$;

$\hat{I}_i$ is the vector from the camera pupil center $\hat{o}$ to the image $\hat{u}_i$ of glint $\hat{g}_i$ on the image sensor;

$\hat{L}'_i$ is the vector from the camera pupil center $\hat{o}$ to the position $\hat{L}_i$ of illuminator $i$;

In this example, the $\hat{X}_i$ axis is defined along $\hat{L}'_i$,

and the $\hat{Z}_i$ axis of the coordinate system is chosen so that the image $\hat{u}_i$ of the glint on the connected image plane 139 (detection area), reached by $\hat{I}_i$, lies in the plane formed by the $\hat{X}_i$ and $\hat{Z}_i$ axes.

$\hat{\alpha}_i$ is the angle formed in the $\hat{X}_i\hat{Z}_i$ plane between the $\hat{X}_i$ axis and line 502, the incident ray from the position of illuminator $i$ (153) to the glint $\hat{g}_i$ (174) on the corneal surface.

$\hat{\beta}_i$ is the angle formed in the $\hat{X}_i\hat{Z}_i$ plane between the $\hat{X}_i$ axis and line 504, the reflected ray from the glint $\hat{g}_i$ to the camera pupil center $\hat{o}$, which is also the origin of the coordinate system.

$\hat{c}$ is the position of the corneal center, which also lies in the $\hat{X}_i\hat{Z}_i$ plane.

Because the cornea is modeled as a sphere, $r$ is the radius of the corneal sphere, and each glint $\hat{g}_i$ is a point on the first, or external, surface of the sphere, so that each glint lies at a distance $r$ from the corneal center. In the above example, the glint $\hat{g}_i$ is modeled as a point on the external (first) surface of the cornea. In such a model, the light of the illuminator travels to the cornea, and the light of the glint reflected back toward the camera sensor travels from it, in the same medium (air) with the same index of refraction.

As shown in fig. 16, a line or ray 506 normal to the corneal surface at the glint $\hat{g}_i$ can be extended from the glint in the direction of the corneal center, and it also intersects the $\hat{X}_i$ axis of the coordinate system. As also shown in fig. 16, the incident ray 502 and the reflected ray 504 make a right triangle with the line of length $l_i$ between the illuminator position $\hat{L}_i$ and the camera pupil center $\hat{o}$. Thus, angle A and angle D can each be expressed in terms of the angles $\hat{\alpha}_i$ and $\hat{\beta}_i$ and the geometry of this triangle.

According to Hennessey, the corneal center $\hat{c}$ can be defined in the coordinate system 500 in terms of unknown parameters, for example the in-plane coordinates of the glint $\hat{g}_i$ and of the corneal center, resulting in three equations in these four unknowns.

A second plane is also formed, containing the corneal center $\hat{c}$, another glint $\hat{g}_j$, the camera pupil center $\hat{o}$ of the camera, and the position $\hat{L}_j$ of another illuminator. The camera pupil center $\hat{o}$ and the corneal center are the same in each plane, but only the position of the camera pupil center $\hat{o}$ is known. This results in 6 equations with 8 unknowns. In Hennessey, the gaze-detection coordinate system is treated as an auxiliary coordinate system, and a rotation matrix $\hat{R}_i$ allows the points of each plane to be translated between the auxiliary coordinate system and a single world coordinate system, such as the third coordinate system which relates the position of the detection area 139 to the illuminators 153. There is a constraint that the corneal center defined for each glint is the same point in the world coordinate system, e.g. $\hat{c}_1 = \hat{c}_2$, which yields 3 further equations for the different axis components, e.g. $\hat{c}_{1x} = \hat{c}_{2x}$, $\hat{c}_{1y} = \hat{c}_{2y}$ and $\hat{c}_{1z} = \hat{c}_{2z}$, thus providing 9 equations with 8 unknowns. Hennessey (page 90) states that a gradient descent algorithm is used to solve numerically for the corneal center. The position of the center 164 of the cornea 168 is thereby defined relative to the positions of the illuminators and the image plane or detection area 139.
Fig. 17 illustrates an embodiment of a method for determining the pupil center from sensor-generated image data. At step 642, the one or more processors identify the black pupil area in a number of image data samples of the respective eye and, at step 644, average the black pupil areas across the image data samples to adjust for movement between samples. An assumption can be made that the pupil is circular, and that it appears as an ellipse when viewed from an angle. One axis of the ellipse, the major axis, remains constant because it represents the pupil diameter, which does not change as long as the lighting does not change, since pupil size changes with lighting.
When the pupil is looking straight ahead through the display, the pupil appears circular in image format, such as an image frame of a camera whose detection area is centered on the optical axis of the display. As the pupil changes its gaze and moves away from the center of the image frame, the pupil appears elliptical because from one angle the circle appears elliptical. The width of the minor axis of the ellipse changes as the gaze changes. The narrow ellipse to the left of the center of the image frame indicates that the user is looking far to the right. A wider ellipse at a smaller distance to the right of the center of the image frame indicates that the user is looking to the left but not far to the left.
The center of the pupil is the center of the ellipse. An ellipse is fitted from the detected edge points in the image. Because such edge points are noisy and not all of them are on an ellipse, the ellipse fitting process is repeated multiple times on a randomly selected subset of all edge points. The subset that is most consistent with all edge points is used to obtain the final ellipse. At step 646, the processor performs an ellipse fitting algorithm on the average black pupil region to determine an ellipse representative of the pupil, and at step 648, determines the pupil center by determining the center of the ellipse representative of the pupil.
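A sketch of this repeated subset fitting is given below, using OpenCV's ellipse fit on random subsets of the edge points and keeping the fit with the most points near its boundary; the iteration count, subset size, and inlier tolerance are illustrative, and at least `sample_size` edge points are assumed to be available.

```python
import numpy as np
import cv2

def ransac_pupil_ellipse(edge_points, iterations=50, sample_size=10, tol=0.1, seed=0):
    """Fit an ellipse to noisy pupil edge points by fitting random subsets and
    keeping the fit most consistent with all points; the ellipse center is then
    taken as the pupil center (steps 646 and 648 sketch)."""
    pts = np.asarray(edge_points, dtype=np.float32)
    rng = np.random.default_rng(seed)
    best_ellipse, best_inliers = None, -1
    for _ in range(iterations):
        subset = pts[rng.choice(len(pts), size=sample_size, replace=False)]
        try:
            (cx, cy), (w, h), angle = cv2.fitEllipse(subset)
        except cv2.error:
            continue                       # degenerate subset, try another one
        if w < 1e-6 or h < 1e-6:
            continue
        # count edge points lying near the fitted ellipse boundary
        t = np.radians(angle)
        dx, dy = pts[:, 0] - cx, pts[:, 1] - cy
        xr = dx * np.cos(t) + dy * np.sin(t)
        yr = -dx * np.sin(t) + dy * np.cos(t)
        r = np.sqrt((xr / (w / 2.0)) ** 2 + (yr / (h / 2.0)) ** 2)
        inliers = int(np.sum(np.abs(r - 1.0) < tol))
        if inliers > best_inliers:
            best_inliers, best_ellipse = inliers, ((cx, cy), (w, h), angle)
    return best_ellipse   # best_ellipse[0] is the estimated pupil center
```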
Where the center of rotation, the center of the cornea, and the center of the pupil are identified, the optical axis of the eye can be obtained by extending rays from the center of rotation through the cornea and the center of the pupil. However, as mentioned above, the human gaze vector is the visual axis or line of sight from the fovea through the center of the pupil. Photoreceptors in the foveal region of the human retina are more densely packed than those in the rest of the retina. This region provides the highest visual acuity or visual clarity and also provides stereo vision of neighboring objects. After determining the optical axis, a default gaze offset angle may be applied such that the optical axis approximates the visual axis and is selected as the gaze vector.
Fig. 18 shows an embodiment of a method for determining a gaze vector based on the determined pupil center, corneal center, and eyeball center of rotation, and which may be used to implement step 604. At step 652, the one or more processors model the optical axis 178 of the eye as a ray extending from the fixed center of eyeball rotation through the determined corneal center and pupil center, and at step 654, apply a correction to the modeled optical axis to estimate the visual axis. At step 656, the one or more processors extend the estimated visual axis from the pupil through display optics of the see-through, near-eye display into the user's field of view.
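As a sketch of step 654, the modeled optical axis can be rotated by small horizontal and vertical offset angles to approximate the visual axis; the default angles below are illustrative placeholders, not values from the disclosure, standing in for a per-user calibrated gaze offset angle.

```python
import numpy as np

def apply_gaze_offset(optical_axis, horizontal_deg=5.0, vertical_deg=1.5):
    """Rotate the modeled optical axis by a gaze offset to estimate the visual
    axis (step 654 sketch); the angle values are illustrative defaults."""
    v = np.asarray(optical_axis, dtype=float)
    v = v / np.linalg.norm(v)
    h, p = np.radians(horizontal_deg), np.radians(vertical_deg)
    rot_y = np.array([[np.cos(h), 0.0, np.sin(h)],     # about the vertical axis
                      [0.0, 1.0, 0.0],
                      [-np.sin(h), 0.0, np.cos(h)]])
    rot_x = np.array([[1.0, 0.0, 0.0],                 # about the horizontal axis
                      [0.0, np.cos(p), -np.sin(p)],
                      [0.0, np.sin(p), np.cos(p)]])
    return rot_x @ rot_y @ v

# An optical axis pointing straight ahead (+z) is tilted slightly toward the
# line of sight from the fovea before being extended into the field of view.
print(apply_gaze_offset([0.0, 0.0, 1.0]))
```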
In one embodiment, with the fixed positioning of the illuminators as a basis, the effect of the different eye regions on reflectivity, and hence on the amount or intensity of the reflected light, is used as a basis for gaze detection. Intensity data from either IR or visible light sensors may be used to determine gaze, so the reflectivity data may be based on IR reflectivity or visible-light reflectivity. By way of illustration, the sclera is more reflective than other regions of the eye such as the pupil and the iris. If the user looks far to the user's left, an illuminator 153 located on the frame 115 toward the user's right side causes a glint reflection on the right sclera of the user's right eye. The PSD 134r, or, as shown in fig. 6B, the photodetector 152 on the inner right frame near the nose bridge 104, receives more reflected light, represented in a data reading, while the light reflected to the other photodetectors 152 or other locations on the PSD falls into the lower range associated with the black pupil when the illuminator 153 closest to the nose bridge is turned on. The reflectivity of the iris may also be captured by the camera 134 and stored for the user by the processor 210, the processing unit 4, or the mobile device 5 including the processing unit 4.
The accuracy may not be as high as that of techniques based on full eye images, but it is sufficient for many applications. Additionally, such gaze detection may serve as an auxiliary or backup gaze detection technique. For example, such glint-based techniques relieve some processor overhead during computationally intensive periods in which complex virtual images are being generated. Furthermore, such glint-based techniques can be executed many more times in a given period than image-based techniques, which process more data, or than other computationally intensive but more accurate techniques that may be run at a lower rate to periodically recalibrate the accuracy of gaze detection. An example of a gaze detection technique that is both image-based and more computationally intensive is one that determines a gaze vector with respect to inner parts of the eye based on glint data and pupil image data, such as the embodiments described for fig. 12-18, which may be run at a lower rate to periodically recalibrate the accuracy of glint-based gaze detection. For example, an embodiment of the more computationally intensive technique, based in part on image data, may run at a rate of ten (10) times per second, while a glint-based gaze detection technique may run at a faster rate of one hundred (100) times per second, or in some cases even five hundred (500) times per second.
Fig. 19 is a flow diagram illustrating an embodiment of a method for determining gaze based on glint data. At step 673, data representing the intensity value of each glint is captured. At step 674, based on the specular reflectivities of the different eye portions and on the positions of the illuminators, the eyeball portion at each glint position in the geometric relationship of the glints is identified from the detected intensity values. At step 675, a gaze angle is estimated based on the eyeball portions associated with the glint positions. As described in the previous examples, an eyeball portion may be the iris, the pupil, or the sclera of the eyeball. The positions of the illuminators define the geometry formed by the glints, e.g. a box, circle, or rectangle, that frames the pupil or at least borders it on two sides. A gaze vector is determined based on the gaze angle at step 676, and a point of gaze in the 3D user field of view is determined at step 677 based on the intersection of the gaze vectors determined for the two eyes.
As described above, methods of differing accuracy may be run at different periodic rates, trading accuracy for speed. Method embodiments based on glint intensity values, such as the method embodiment described for fig. 19, are examples of low computational intensity techniques that may be used in this way.
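The rate trade-off described above can be sketched as a simple interleaving loop; `fast_glint_gaze`, `slow_image_gaze`, and `recalibrate` are hypothetical callables standing in for the two techniques and the recalibration step, and the 100 Hz / 10 Hz ratio mirrors the example rates in the text.

```python
import itertools

def gaze_loop(fast_glint_gaze, slow_image_gaze, recalibrate, fast_hz=100, slow_hz=10):
    """Run a cheap glint-intensity gaze estimate every tick, and an accurate but
    computationally intensive image-based estimate periodically to recalibrate it."""
    ratio = max(1, fast_hz // slow_hz)        # one image-based pass per N glint passes
    for tick in itertools.count():
        gaze = fast_glint_gaze()              # low computational intensity
        if tick % ratio == 0:
            reference = slow_image_gaze()     # image plus glint data, higher accuracy
            recalibrate(gaze, reference)      # correct drift in the fast estimate
        yield gaze
```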
Other tests for movement may be performed based on facial features with fixed characteristics in the image data. In one embodiment, the eye camera may capture an area of about 5 to 10 mm around the visible eyeball portion of the corneal prominence, sclera, iris, and pupil, so that parts of the eyelids and eyelashes are also captured. A positionally fixed facial feature, such as a mole or freckle on the skin of an eyelid or on the skin bordering the lower eye, may also appear in the image data of the eye. In the image samples, the position of the mole or freckle can be monitored for a change in position. If the facial feature moves up, down, right, or left, a vertical or horizontal shift can be detected. If the facial feature appears larger or smaller, a depth change in the spatial relationship between the eye and the display device 2 can be determined. Due to factors such as camera resolution, there may be a criteria range for the position change before a recalibration against the training images is triggered.
In another example, although lighting is a factor that changes pupil size, and hence the ratio of the pupil area to the visible iris area within the iris perimeter, the size of the iris perimeter itself does not change with gaze or lighting changes; the perimeter is therefore a fixed characteristic of the iris as a facial feature. By fitting an ellipse to the iris, the processor 210, or a processor of the processing unit 4, 5 of the display device 2, can determine according to criteria whether the iris has become larger or smaller in the image data. If larger, the display device 2 with its illuminators 153 and at least one sensor 134 has moved closer in depth to the user's eye; if smaller, the display device 2 has moved farther away. A change in such a fixed characteristic can trigger an IPD alignment check.
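A sketch of this check follows; the 5% relative-change criterion is an illustrative assumption, and the major-axis length of the fitted iris ellipse is assumed to be measured in pixels.

```python
def check_depth_movement(current_iris_major_px, baseline_iris_major_px,
                         ratio_criterion=0.05):
    """Compare the fitted iris ellipse's major axis to a stored baseline; a
    relative change beyond the criterion suggests the display has moved in
    depth and an IPD alignment check should be triggered."""
    change = (current_iris_major_px - baseline_iris_major_px) / baseline_iris_major_px
    if change > ratio_criterion:
        return "closer"    # iris appears larger: display moved toward the eye
    if change < -ratio_criterion:
        return "farther"   # iris appears smaller: display moved away
    return "no_change"
```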
In addition to depth changes, vertical and horizontal changes in pupil alignment may also be determined by periodic checks in which a virtual object is displayed at a predetermined distance for the user to look at straight ahead, and the image data or a predetermined glint position is examined to see whether the pupil is centered on the optical axis. Vertical and horizontal changes may also trigger readjustment. As shown in the examples above, in some embodiments the display adjustment mechanism provides movement in any of three dimensions.
FIG. 20 is a block diagram of an exemplary mobile device that may operate in embodiments of the present technology. Exemplary electronic circuitry of a typical mobile phone is depicted. The phone 900 includes one or more microprocessors 912 and memory 1010 (e.g., non-volatile memory such as ROM and volatile memory such as RAM) storing processor-readable code that is executed by the one or more microprocessors 912 to implement the functionality described herein.
The mobile device 900 may include, for example, a processor 912, memory 1010 including applications and non-volatile storage. The processor 912 may implement communications as well as any number of applications, including the interactive applications described herein. The memory 1010 can be any variety of memory storage media types including non-volatile and volatile memory. The device operating system handles the different operations of the mobile device 900 and may contain user interfaces for operations such as making and receiving phone calls, text messaging, checking voicemail, and the like. The application 1030 may be any kind of program, such as a camera application for photos and/or videos, an address book, a calendar application, a media player, an internet browser, games, other multimedia applications, an alarm application, other third party applications, the interaction applications discussed herein, and so forth. The non-volatile storage component 1040 in memory 1010 contains data such as web caches, music, photos, contact data, scheduling data, and other files.
The processor 912 also communicates with RF transmit/receive circuitry 906, which in turn is coupled to an antenna 902; with an infrared transmitter/receiver 908; with any additional communication channels 1060 such as Wi-Fi or Bluetooth; and with a movement/orientation sensor 914 such as an accelerometer. Accelerometers have been incorporated into mobile devices to enable applications such as intelligent user interfaces that let the user input commands through gestures, indoor GPS functionality that calculates the movement and direction of the device after contact with GPS satellites is broken, and detection of the device's orientation so that the display automatically changes from portrait to landscape when the phone is rotated. An accelerometer can be provided, for example, by a micro-electromechanical system (MEMS), a tiny mechanical device (on the micrometer scale) built onto a semiconductor chip. Acceleration direction, as well as orientation, vibration, and shock, can be sensed. The processor 912 further communicates with a ringer/vibrator 916, a user interface keypad/screen, a biometric sensor system 918, a speaker 1020, a microphone 922, a camera 924, a light sensor 926, and a temperature sensor 928.
The processor 912 controls the transmission and reception of wireless signals. During a transmit mode, processor 912 provides a voice signal or other data signal from microphone 922 to RF transmit/receive circuitry 906. Transmit/receive circuitry 906 transmits the signal to a remote station (e.g., a fixed station, carrier, other cellular telephone, etc.) for communication via antenna 902. The ringer/vibrator 916 is used to signal an incoming call, text message, calendar reminder, alarm clock reminder, or other notification to the user. During a receive mode, the transmit/receive circuitry 906 receives voice or other data signals from a remote station via the antenna 902. The received voice signals are provided to the speaker 1020, while other received data signals are also processed appropriately.
In addition, a physical connector 988 may be used to connect the mobile device 900 to an external power source, such as an AC adapter or powered docking station. The physical connector 988 may also be used as a data connection to a computing device. The data connection allows operations such as synchronizing mobile data with computing data on another device.
A GPS transceiver 965, which uses satellite-based radio navigation to relay the position of user applications, is enabled for such services.
The example computer systems illustrated in the figures include examples of computer-readable storage media. Computer-readable storage media are also processor-readable storage media. Such media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, cache, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, memory sticks or cards, magnetic cassettes, magnetic tape, a media drive, a hard disk, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
FIG. 21 is a block diagram depicting one embodiment of a computing system that may be used to implement a hub computing system like that of FIGS. 1A and 1B. In this embodiment, the computing system is a multimedia console 800, such as a gaming console. As shown in FIG. 21, the multimedia console 800 has a central processing unit (CPU) 801 and a memory controller 802 that facilitates processor access to various types of memory, including a flash read only memory (ROM) 803, a random access memory (RAM) 806, a hard disk drive 808, and a portable media drive 806. In one implementation, the CPU 801 includes a level 1 cache 810 and a level 2 cache 812 which temporarily store data and thus reduce the number of memory access cycles made to the hard disk drive 808, thereby improving processing speed and throughput.
The CPU801, the memory controller 802, and various memory devices are interconnected together via one or more buses (not shown). The details of the bus used in this implementation are not particularly relevant to understanding the subject matter of interest discussed herein. It should be understood, however, that such a bus may include one or more of serial and parallel buses, a memory bus, a peripheral bus, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures can include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an enhanced ISA (eisa) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus also known as a mezzanine bus.
In one embodiment, the CPU801, memory controller 802, ROM803, and RAM806 are integrated onto a common module 814. In this embodiment, ROM803 is configured as a flash ROM that is connected to memory controller 802 via a PCI bus and a ROM bus (neither of which are shown). RAM806 is configured as multiple Double Data Rate Synchronous Dynamic RAM (DDRSDRAM) modules that are independently controlled by memory controller 802 via separate buses (not shown). Hard disk drive 808 and portable media drive 805 are shown connected to memory controller 802 by a PCI bus and an AT attachment (ATA) bus 816. However, in other implementations, different types of dedicated data bus structures may alternatively be applied.
A graphics processing unit (GPU) 820 and a video encoder 822 form a video processing pipeline for high-speed, high-resolution (e.g., high definition) graphics processing. Data is carried from the graphics processing unit 820 to the video encoder 822 via a digital video bus (not shown). Lightweight messages generated by system applications (e.g., pop-ups) are displayed by using a GPU 820 interrupt to schedule code to render the pop-up into an overlay. The amount of memory used for the overlay depends on the overlay area size, and the overlay preferably scales with the screen resolution. Where a full user interface is used by a concurrent system application, it is preferable to use a resolution that is independent of the application resolution. A scaler may be used to set this resolution, eliminating the need to change the frequency and cause a TV resynch.
An audio processing unit 824 and an audio codec (coder/decoder) 826 form a corresponding audio processing pipeline for multi-channel audio processing of various digital audio formats. Audio data is carried between the audio processing unit 824 and the audio codec 826 via a communication link (not shown). The video and audio processing pipelines output data to an A/V (audio/video) port 828 for transmission to a television or other display. In the illustrated implementation, the video and audio processing components 820-828 are mounted on the module 814.
FIG. 21 shows the module 814 including a USB host controller 830 and a network interface 832. The USB host controller 830 is shown in communication with the CPU 801 and the memory controller 802 via a bus (e.g., a PCI bus) and serves as host for peripheral controllers 804(1)-804(4). The network interface 832 provides access to a network (e.g., the Internet, a home network, etc.) and may be any of a wide variety of wired or wireless interface components, including an Ethernet card, a modem, a wireless access card, a Bluetooth module, a cable modem, and the like.
In the implementation depicted in fig. 21, the console 800 includes a controller support subassembly 840 for supporting four controllers 804(1) -804 (4). The controller support subassembly 840 includes any hardware and software components necessary to support wired and wireless operation with external control devices such as, for example, media and game controllers. The front panel I/O subassembly 842 supports the multiple functions of the power button 812, the eject button 813, as well as any LEDs (light emitting diodes) or other indicators exposed on the outer surface of the console 802. Subassemblies 840 and 842 are in communication with module 814 via one or more cable assemblies 844. In other implementations, the console 800 may include additional controller subcomponents. The illustrated implementation also shows an optical I/O interface 835 configured to send and receive signals that may be passed to module 814.
MUs 840(1) and 840(2) are shown as being connectable to MU ports "a" 830(1) and "B" 830(2), respectively. Additional MUs (e.g., MUs 840(3) -840 (6)) are shown as connectable to controllers 804(1) and 804(3), i.e., two MUs per controller. Controllers 804(2) and 804(4) may also be configured to receive MUs (not shown). Each MU840 provides additional storage on which games, game parameters, and other data may be stored. In some implementations, the other data can include any of a digital game component, an executable gaming application, an instruction set for expanding a gaming application, and a media file. When inserted into console 800 or a controller, MU840 may be accessed by memory controller 802. The system power supply module 850 supplies power to the components of the gaming system 800. A fan 852 cools the circuitry within console 800. A microcontroller unit 854 is also provided.
An application 860 comprising machine instructions is stored on the hard disk drive 808. When the console 800 is powered on, various portions of the application 860 are loaded into RAM 806 and/or caches 810 and 812 for execution on the CPU 801. Various applications may be stored on the hard disk drive 808 for execution on the CPU 801, the application 860 being one such example.
Gaming and media system 800 may be used as a standalone system by simply connecting the system to monitor 16 (FIG. 1A), a television, a video projector, or other display device. In this standalone mode, gaming and media system 800 allows one or more players to play games or enjoy digital media, such as watching movies or listening to music. However, with the integration of broadband connectivity made possible through network interface 832, gaming and media system 800 may also be operated as a participant in a larger network gaming community.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
Claims (10)
1. A system for adjusting a see-through, near-eye, mixed reality display to align with an interpupillary distance (IPD), comprising:
a see-through, near-eye, mixed reality display device (2) comprising a display optical system (14) for each eye, each display optical system having an optical axis (142) and being positioned to be seen through by a respective eye (160);
the display device comprises a respective movable support structure (115, 117) for supporting each display optical system;
at least one sensor (134, 152) attached to each display optical system of the display device, the sensor having a detection area (139, 152) at a location for capturing data of the respective eye;
a memory (214, 330, 1010, 803, 806, 840) for storing software and data, the data comprising the captured data for each eye;
one or more processors (210, 320, 324, 801, 820, 912) having access to the memory to determine one or more position adjustment values for each display optical system for alignment with the IPD based on the captured data and a position of a respective detection area; and
at least one display adjustment mechanism (203, 205) attached to the display device and communicatively coupled to the one or more processors to move at least one movable support structure to adjust a position of a respective display optical system according to the one or more position adjustment values.
2. The system of claim 1, wherein the respective movable support structure further supports an image generation unit (120) aligned with the display optics.
3. The system of claim 1,
the one or more processors determining one or more position adjustment values for each display optical system based on the captured data and the position of the respective detection region further comprise the one or more processors determining a pupil position of each eye relative to an optical axis of its respective display optical system.
4. The system of claim 1, wherein the at least one display adjustment mechanism is capable of moving at least one movable support structure (203, 205) in any one of three dimensions under automatic control of the one or more processors according to the one or more position adjustment values.
5. The system of claim 1,
the at least one display adjustment mechanism is automatically controlled by the one or more processors to adjust (408) a position of the respective display optical system according to the one or more position adjustment values.
6. The system of claim 1, further comprising:
the one or more processors cause the display device to electronically provide (333) instructions for a user to activate the at least one display adjustment mechanism to move the at least one movable support structure according to the one or more position adjustment values;
the at least one display adjustment mechanism comprises a mechanical controller (203b, 207, 210, 211, 213, 203a, 223, 225, 227, 221, 204, 127, 123) having a calibration for user activation of the controller to correspond to a predetermined distance and direction of movement of the at least one display optical system; and
the one or more processors determine content of the instruction based on the calibration.
7. The system of claim 4, wherein:
the at least one display adjustment mechanism under control of the one or more processors moves an image generation unit (120) aligned with the display optical system in any of three dimensions so as to maintain an optical path between the image generation unit and the respective display optical system.
8. In a see-through, near-eye, mixed reality display system (2, 4, 5) including a display optical system (14) for each eye, each display optical system having an optical axis (142) and positioned to be seen through by a respective eye (160), a method for aligning a see-through, near-eye, mixed reality display device with an interpupillary distance (IPD) of a user, the method comprising:
automatically determining whether the see-through, near-eye, mixed reality display is aligned with a user IPD according to an alignment criterion (301); and
in response to the display device not being aligned with the user IPD according to the alignment criteria,
determining one or more adjustment values (407) of at least one display optical system for satisfying the alignment criterion, and
adjusting the at least one display optical system by moving at least one movable support structure supporting the at least one display optical system based on the one or more adjustment values (408).
9. The method of claim 8, wherein adjusting the at least one display optical system based on the one or more adjustment values further comprises adjusting the at least one display optical system (453, 418) according to a depth adjustment value.
10. The method of claim 8, wherein the one or more adjustment values are determined based on a three-dimensional position difference vector representing a position difference between a current gaze vector to an object in a three-dimensional user field of view and a reference gaze vector.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/221,707 US9025252B2 (en) | 2011-08-30 | 2011-08-30 | Adjustment of a mixed reality display for inter-pupillary distance alignment |
| US13/221,707 | 2011-08-30 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1180041A1 HK1180041A1 (en) | 2013-10-11 |
| HK1180041B true HK1180041B (en) | 2016-12-02 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CA2750287C (en) | Gaze detection in a see-through, near-eye, mixed reality display | |
| US9025252B2 (en) | Adjustment of a mixed reality display for inter-pupillary distance alignment | |
| US9213163B2 (en) | Aligning inter-pupillary distance in a near-eye display system | |
| US8752963B2 (en) | See-through display brightness control | |
| US9323325B2 (en) | Enhancing an object of interest in a see-through, mixed reality display device | |
| EP2638693B1 (en) | Automatic variable virtual focus for augmented reality displays | |
| US10055889B2 (en) | Automatic focus improvement for augmented reality displays | |
| EP2751609B1 (en) | Head mounted display with iris scan profiling | |
| US20150003819A1 (en) | Camera auto-focus based on eye gaze | |
| HK1180041B (en) | Adjustment of a mixed reality display for inter-pupillary distance alignment | |
| HK1183105B (en) | See-through display brightness control | |
| HK1172098B (en) | Automatic variable virtual focus for augmented reality displays | |
| HK1171515B (en) | Optimized focal area for augmented reality display | |
| HK1171515A1 (en) | Optimized focal area for augmented reality display | |
| HK1181465A (en) | Head mounted display with iris scan profiling |