HK1173239B - Camera navigation for presentations
- Publication number: HK1173239B
- Authority: HK (Hong Kong)
Description
Background
Many computing applications, such as computer games, multimedia applications, office applications, and the like, use controls to allow a user to manipulate game characters or other aspects of the application. Such controls are typically entered using, for example, a controller, remote control, keyboard, mouse, etc. For example, managing a presentation typically involves user interaction with a controller/clicker and/or directly with the computing device driving the presentation. These control methods have various disadvantages. For example, these controls can be difficult to learn, thereby creating an obstacle between the user and these games and applications. Controllers typically have a limited number of buttons, so the available navigation is limited, and the available commands can vary widely between clickers of different types and capabilities. The use of a controller while presenting may distract the audience. Also, in order to share control of a presentation among multiple users, each user must have access to a controller and/or controllers must be distributed among them.
SUMMARY
Techniques are disclosed herein for managing presentation of information in a gesture-based system, wherein gestures are derived from a pose or motion of a user's body in physical space. The user may use gestures to control the manner in which information is presented or otherwise interact with the gesture-based system. For example, a capture device may capture data representing user gestures, and gesture recognition techniques may be employed to recognize gestures suitable for controlling aspects of the gesture-based system. The presenter of information may be incorporated into the presentation via an avatar or other visual representation. Thus, the user is immersed in the system in the form of a visual representation that can interact with the information presentation. Viewers of the presentation may similarly be immersed in the system. Thus, the immersion of the users in the system, including both the presenter and the viewers, provides a virtual relationship between the users that is more interactive than a simple display of information.
A user may present information to an audience using gestures that control aspects of the system, or multiple users may work together using gestures to share control of the system. Thus, in an example embodiment, a single user is able to control the presentation of information to an audience via gestures. In another example embodiment, multiple participants may share control of the presentation via gestures captured by the capture device, or otherwise interact with the system to control aspects of the presentation. Gestures may be applicable to various presentation formats. In an example embodiment, gestures control aspects of a presentation application that presents information in a non-sequential format.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Brief Description of Drawings
The foregoing summary, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the embodiments, there is shown in the drawings exemplary constructions of the embodiments; however, the embodiments are not limited to the specific methods and instrumentalities disclosed. In the drawings:
FIG. 1 illustrates an example embodiment of a target recognition, analysis, and tracking system and a user playing a game.
FIGS. 2A and 2B each illustrate an example embodiment of a shared presentation experience.
FIG. 2C depicts an example embodiment of an example target recognition, analysis, and tracking system and display of multiple users and a visual representation of each user in a physical space.
FIGS. 2D and 2E depict example target recognition, analysis, and tracking systems, and example embodiments of user interaction via a fixed point focus function, from a side view and a top view, respectively.
FIG. 3 illustrates shared control of a presentation over a network.
FIG. 4 illustrates an example embodiment of a capture device that may be used in a target recognition, analysis, and tracking system.
FIG. 5A illustrates a skeletal mapping of a user generated from a target recognition, analysis, and tracking system such as that shown in FIG. 3.
FIG. 5B illustrates additional details of a gesture recognizer architecture such as that shown in FIG. 4.
FIG. 6 depicts an example flow diagram of a method of establishing a shared presentation experience and generating a visual representation to represent multiple users in a physical space.
FIG. 7 illustrates an example embodiment of a computing environment in which techniques described herein may be implemented.
FIG. 8 illustrates another example embodiment of a computing environment in which techniques described herein may be implemented.
Detailed Description of Illustrative Embodiments
Techniques for managing presentation of information to an audience via gestures are disclosed herein. The subject matter of the disclosed embodiments is described with specificity to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the claimed subject matter might be embodied in other ways to include similar elements to the elements described herein in conjunction with other present or future technologies.
Gestures may be derived from the position or motion of the user in physical space and may include any dynamic or static user motion, such as running, moving fingers, or static gestures. According to an example embodiment, a capture device, such as a camera, may capture user image data, including data representing a gesture of a user. The computer environment may be used to recognize and analyze gestures made by a user in the user's three-dimensional physical space, such that the user's gestures may be interpreted to control various aspects of the system or application space. The computer environment may display user feedback by mapping the user's gestures to an on-screen avatar.
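The capture-recognize-feedback flow described above can be sketched as a simple processing loop. The following is a minimal, hypothetical Python sketch; the names (Frame, recognize_gesture, update_avatar) and the threshold used are illustrative assumptions, not part of any actual product API.

```python
# Hypothetical sketch of the capture -> recognize -> feedback loop described above.

from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Joint = Tuple[float, float, float]  # x, y, z position of a body joint in physical space

@dataclass
class Frame:
    joints: Dict[str, Joint]  # e.g. {"right_hand": (0.4, 1.2, 2.0)}

def recognize_gesture(history: List[Frame]) -> Optional[str]:
    """Toy rule: a quick upward motion of the right hand reads as 'raise_hand'."""
    if len(history) < 2:
        return None
    prev_y = history[-2].joints["right_hand"][1]
    curr_y = history[-1].joints["right_hand"][1]
    return "raise_hand" if curr_y - prev_y > 0.3 else None  # > 30 cm rise between frames

def update_avatar(frame: Frame) -> None:
    """Map the user's joint positions onto an on-screen avatar (stub)."""
    print("avatar pose:", frame.joints)

def main_loop(frames: List[Frame]) -> None:
    history: List[Frame] = []
    for frame in frames:
        history.append(frame)
        update_avatar(frame)                  # user feedback: avatar mirrors the user
        gesture = recognize_gesture(history)  # interpret motion as a possible control
        if gesture:
            print("recognized gesture:", gesture)

main_loop([
    Frame({"right_hand": (0.4, 1.0, 2.0)}),
    Frame({"right_hand": (0.4, 1.4, 2.0)}),  # hand raised -> gesture fires
])
```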
Gesture-based systems may employ techniques for managing or controlling presentation of information using gestures. The user may use gestures to control the manner in which information is presented or otherwise interact with the gesture-based system. In an example embodiment, a single user is able to control presentation of information to an audience via gestures. In another example embodiment, multiple participants may be able to share control of the presentation via gestures captured by the capture device or otherwise interact with the system to control aspects of the presentation.
The systems, methods, techniques, and components of presentation management may be implemented in a multimedia console, such as a gaming console, or any other computing environment in which it is desirable to display a visual representation of an object, including by way of example and not limitation, satellite receivers, set-top boxes, electronic games, Personal Computers (PCs), portable telephones, Personal Digital Assistants (PDAs), and other handheld devices.
FIG. 1 illustrates an example embodiment of a configuration of a target recognition, analysis, and tracking system 10, which target recognition, analysis, and tracking system 10 may employ the disclosed techniques for immersing a user in a gesture-based system that allows interaction via gestures. In this example embodiment, the user 18 is playing a bowling game. In an exemplary embodiment, the system 10 may recognize, analyze, and/or track a human target such as the user 18. The system 10 may collect information related to the user's movements, facial expressions, body language, emotions, etc. in physical space. For example, the system may identify and scan human target 18. System 10 may use body gesture recognition techniques to identify the body type of human target 18. System 10 may identify body parts of user 18 and how they move.
As shown in FIG. 1, the target recognition, analysis, and tracking system 10 may include a computing environment 212. The computing environment 212 may be a multimedia console, Personal Computer (PC), gaming system or console, handheld computing device, PDA, mobile phone, cloud computer, capture device, and the like. According to an example embodiment, the computing environment 212 may include hardware components and/or software components to make the computing environment 212 available to execute applications. The application may be any program that operates or is executed by a computing environment, including gaming and non-gaming applications, such as word processing programs, spreadsheets, media players, database applications, computer games, video games, chat, forums, communities, instant messaging, and so forth.
As shown in FIG. 1, the target recognition, analysis, and tracking system 10 may also include a capture device 202. The capture device 202 may be, for example, a camera that may be used to visually monitor one or more users, such as the user 18, such that gestures performed by the one or more users may be captured, analyzed, and tracked to perform one or more controls or actions in an application. In the example embodiment shown in FIG. 1, the virtual object is a bowling ball, and the user moves in the three-dimensional physical space as if actually holding a bowling ball. The user's posture in the physical space may control the bowling ball 17 displayed on the screen 14. In example embodiments, a human target, such as the user 18, may actually hold a physical object. In these embodiments, a user of the gesture-based system may hold the object so that the motions of the player and the object may be used to adjust and/or control parameters of the game. For example, the motion of a player holding a racquet may be tracked and utilized to control an on-screen racquet in an electronic sports game. In another example embodiment, the motion of a player holding an object may be tracked and utilized to control an on-screen weapon in an electronic combat game.
According to one embodiment, the target recognition, analysis, and tracking system 10 may be connected to an audiovisual device 16, such as a television, a monitor, a high-definition television (HDTV), or the like, that may provide game or application visuals and/or audio to a user, such as the user 18. For example, the computing environment 212 may include a video adapter such as a graphics card and/or an audio adapter such as a sound card that may provide audiovisual signals associated with the game application, non-game application, or the like. The audiovisual device 16 may receive the audiovisual signals from the computing environment 212 and may then output the game or application visuals and/or audio associated with the audiovisual signals to the user 18. According to one embodiment, the audiovisual device 16 may be connected to the computing environment 212 via, for example, an S-video cable, a coaxial cable, an HDMI cable, a DVI cable, a VGA cable, or the like.
As used herein, a computing environment may refer to a single computing device or computing system. The computing environment may include non-computing components. As used herein, the terms computing system, computing device, computing environment, computer, processor, and other computing components may be used interchangeably. For example, the computing environment may include the entire target recognition, analysis, and tracking system 10 shown in FIG. 1. The computing environment may include the audiovisual device 16 and/or the capture device 202. Either or both of the exemplary audiovisual device 16 and capture device 202 may be separate entities coupled to the computing environment, or may be part of a computing device that, for example, both processes and displays. Thus, the computing environment may be a standalone capture device that includes a processor that can process the captured data. Thus, the capture device 202 may be equipped not only to process captured data, but also to analyze and store data, output data to a screen, and so forth.
As shown in FIG. 1, the target recognition, analysis, and tracking system 10 may be used to recognize, analyze, and/or track a human target such as a user 18. For example, the user 18 may be tracked using the capture device 202 such that gestures of the user 18 may be interpreted as controls that may be used to affect an application executed by the computer environment 212. Thus, according to one embodiment, user 18 may move his or her body to control the application. The system 10 may track the user's body and the movements made by the user's body, including gestures to control aspects of the system such as applications, operating systems, and so forth.
The system 10 may translate the input to the capture device 202 into an animation that represents the user's motion such that the animation is driven by the input. Thus, the user's motion may be mapped to a visual representation, such as an avatar, such that the user's motion in physical space is simulated by the avatar. The rate at which frames of image data are captured and displayed may determine a level of continuity of the displayed motion of the visual representation.
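The relationship between the capture rate and the continuity of the displayed motion can be illustrated with a simple interpolation sketch. The frame rates and function below are assumptions for illustration only, not a description of the actual system.

```python
# Illustrative only: if poses are captured at 30 Hz but displayed at 60 Hz,
# intermediate avatar poses can be interpolated between captured joint positions.

def lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t

def interpolate_pose(prev: dict, curr: dict, t: float) -> dict:
    """Blend two captured poses (joint name -> (x, y, z)) at fraction t in [0, 1]."""
    return {
        name: tuple(lerp(p, c, t) for p, c in zip(prev[name], curr[name]))
        for name in curr
    }

prev_pose = {"right_hand": (0.40, 1.00, 2.00)}
curr_pose = {"right_hand": (0.40, 1.40, 2.00)}

# One extra display frame halfway between two captured frames (30 Hz capture, 60 Hz display).
print(interpolate_pose(prev_pose, curr_pose, 0.5))  # {'right_hand': (0.4, 1.2, 2.0)}
```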
FIG. 1 depicts an example embodiment of an application executing on computing environment 212 that may be a bowling game that user 18 may be playing. In this example, computing environment 212 may use audiovisual device 16 to provide a visual representation of a bowling gym and bowling lane to user 18. The computing environment 212 may also use the audiovisual device 16 to provide a visual representation of a player avatar 19 that the user 18 may control with his or her movements. The computer environment 212 and capture device 202 of the target recognition, analysis, and tracking system 10 may be used to recognize and analyze gestures made by the user 18 in their three-dimensional physical space so that the user's gestures may be interpreted as controls for the player avatar 19 in the game space. For example, as shown in FIG. 1, the user 18 may make a bowling motion in physical space to cause the player avatar 19 to make a bowling motion in game space. Other movements of the user 18 may also be interpreted as controls or actions, such as controls for walking, selecting a ball, positioning an avatar on a bowling lane, throwing a ball, and so forth.
Multiple users may interact with each other from remote locations. The computing environment 212 may use the audiovisual device 16 to provide a visual representation of a player avatar that another user may control with his or her movements. For example, a visual representation of another bowler on the audiovisual device 16 may represent another user, such as a second user in the physical space with the user or a networked user in a second physical space. For multiple users participating in the system, it may be beneficial for each to wear an immersive display or to view a display that captures the real-world environment. For example, a user can view a wide area of the environment or can focus on an object or event of interest in the environment by adjusting a personal immersive display. An example immersive display is a head-mounted unit (wearable head piece) that includes a capture device and a display component.
Gestures may be used in a video game specific context, such as the bowling game example shown in fig. 1. In another game example, such as a driving game, various movements of the hands and feet may correspond to maneuvering the vehicle in a direction, shifting gears, accelerating, and braking. The player's gestures may be interpreted as controls corresponding to actions other than controlling avatar 19, such as gestures for input in a general purpose computing context. For example, various movements of the user's 18 hands or other body parts may be used to end, pause or save a game, select a level, view a high score, communicate with a friend, and so forth.
While FIG. 1 depicts a user in a video game-specific context, it is contemplated that the target recognition, analysis, and tracking system 10 may interpret target movements for controlling aspects of operating systems and/or applications that are outside the scope of the game. Substantially any controllable aspect of an operating system and/or application may be controlled by movement of a target, such as user 18. For example, user gestures may correspond to common system-level tasks, such as navigating up or down in a hierarchical list, opening a file, closing a file, and saving a file. The user's gestures may be controls that are applicable to the operating system, non-gaming aspects of a game, or non-gaming applications. For example, a user's gesture may be interpreted as an object manipulation, such as controlling a user interface. For example, consider a user interface with blades or a tabbed interface arranged from left to right, where selection of each blade or tab opens up options for various controls within the application or system. The system may identify a hand gesture for moving a tab, in which the user's hand in physical space is virtually aligned with a tab in the application space. A gesture sequence including a pause, a grab motion, and then a leftward swipe of the hand may be interpreted as selecting a tab and then moving it aside to open the next tab.
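The pause-grab-swipe sequence for selecting and opening a tab can be thought of as a small state machine over recognized gesture events. The event names and Tab model below are hypothetical, intended only to illustrate how such a sequence might map to a user-interface control.

```python
# Hypothetical sketch: interpret a pause -> grab -> left-swipe sequence as
# "select the current tab and slide it away to reveal the next one".

TABS = ["Home", "Documents", "Settings"]

class TabNavigator:
    def __init__(self) -> None:
        self.index = 0
        self.sequence: list[str] = []

    def on_gesture(self, event: str) -> None:
        self.sequence.append(event)
        # The full selection gesture is: pause, grab, swipe_left.
        if self.sequence[-3:] == ["pause", "grab", "swipe_left"]:
            self.index = (self.index + 1) % len(TABS)
            print("opened tab:", TABS[self.index])
            self.sequence.clear()

nav = TabNavigator()
for event in ["pause", "grab", "swipe_left"]:
    nav.on_gesture(event)
```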
Fig. 2A and 2B illustrate an example embodiment of a gesture-based system 200 that may capture one or more users shown in a physical space 201. The system 200 may recognize gestures from captured data corresponding to control of the system 200. A user may use gestures to manage the presentation of information, or multiple users may work together via gestures to collaborate with the presented information.
The system 200 may include any number of computing environments, capture devices, displays, or any combination thereof. The system 200 shown in FIG. 2A includes a capture device 202, a computing environment 212, and a display device 222. The system 200 shown in FIG. 2B represents a gesture-based system 200 having multiple computing environments 212,213,214, capture devices 202,203,204, and displays 222, 223, 224. The computing environment may be a multimedia console, a Personal Computer (PC), a gaming system or console, a handheld computing device, a PDA, a mobile phone, a cloud computer, etc., and may include or otherwise be connected to a capture device or display. The capture device, computing device, and display device may include any suitable devices that perform the desired functions, such as the devices described with reference to FIG. 1 above or FIGS. 3-8 below.
Both FIGS. 2A and 2B depict examples of productivity scenarios for computer management that may involve multiple users in a physical space, such as users 260, 261, 262, 263. For example, the physical space may be a conference room, and the users 260, 261, 262, 263 may be attending a meeting or collaborating on ideas. It is also contemplated that a remote user may interact with the system 200 from a second location, such as another room down the hall, or from a much more remote location, such as the user's home or office in another state. FIG. 3, described below, depicts an example of a networked environment that allows both local and remote users to interact via a gesture-based system for computer management in this scenario.
The productivity scenario in this context may be in the form of any information presentation that may be managed via a gesture-based system 200, such as the example systems shown in fig. 2A and 2B. In an example scenario, users 260, 261, 262, 263 may be in a meeting, where the output of a presentation application is displayed to display 222. The information presentation may include animations, graphics, text, etc., and is presented in a word processing format, video and/or audio presentation, slide show, database application, media player, chart (e.g., flow chart, pie chart), chat, forum, community, instant messaging, or any other form of work product.
The capture device 202 may capture data representing user gestures, such as gestures of each of the users 260, 261, 262, 263. As shown in FIGS. 2A and 2B, a user may be at different distances from the system 200 or from particular components in the system and still have control over various aspects of the system. Gestures may be recognized as controls on the presentation of information. Gestures may control aspects of the gesture-based system. Gestures may control anything displayed on the screen, such as adding words to a document, scrolling down or paging through a document, moving across columns in a spreadsheet, pivoting or rotating a three-dimensional map, zooming in or out, and so forth. For example, consider display 222 displaying a virtual chalkboard or dry erase board; a user's gesture may be recognized to draw or write letters on the screen, switch between slides in a slide show, pause a slide show, and so forth. The user may make a gesture to add a bullet point to a word processing document.
Gestures may incorporate audio commands, or audio commands may supplement user gestures. For example, the user may make a gesture to add a bullet point to the document and then speak words that are subsequently added to the document after the bullet point. The system may recognize the combination of the add-bullet gesture and the audio as a control that adds the bullet point and then writes the spoken words after it. Because the user can make gestures to control the presentation, the user has great flexibility to move around the room.
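The combination of a recognized gesture and a speech transcript can be sketched as follows. Both recognizers are stubbed out here; the function and gesture names are illustrative assumptions rather than an actual API.

```python
# Sketch of combining a recognized gesture with a speech-to-text result, as in
# "make the add-bullet gesture, then speak the text to place after the bullet".

document: list[str] = []

def on_gesture_with_audio(gesture: str, transcript: str) -> None:
    # In a real system, 'gesture' would come from the capture device and
    # 'transcript' from an audio pipeline; here both are supplied directly.
    if gesture == "add_bullet":
        document.append("\u2022 " + transcript)

on_gesture_with_audio("add_bullet", "Review budget before Friday")
print(document)  # ['• Review budget before Friday']
```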
The user is not limited to using a controller or having to press buttons directly on the computing environment. However, it is contemplated that the use of gestures may be combined with the use of controllers/buttons to accommodate both types of control. Some gestures may use information input via a controller in addition to the user's gestures, which include, for example, body movements.
For exemplary purposes, the snapshot of the example information presentation shown in FIG. 2A includes a diagram 225 displayed on the screen. Consider a user, such as user 261, presenting information to other users 260, 262, 263 and/or remote users (not shown) in a room. Gestures may replace or supplement the need for a controller or similar device for managing the presentation. An aspect of an application that might otherwise be handled with a mouse, clicker, laser pointer, microphone, or other peripheral device can be handled by a gesture. A gesture may control any aspect of the gesture-based system that previously might have required the user to have a controller. For example, a gesture may control a virtual laser that displays light on the screen. Rather than using a laser pointer to highlight something of interest by projecting a small bright spot on the display, the user may make gestures with a finger. Movement of the user's finger may simulate movement of a laser pointer, and the system may recognize the gesture and display a spot of light on the screen corresponding to the finger movement; no controller is actually required. Gestures may also control aspects of the system that are different from, or otherwise not controllable with, a controller. For example, controlling the motion of the visual representation to correspond directly to the user's motion in physical space is available via gestures, whereas when a controller is required for input the animation of the visual representation is not a direct translation of the user's motion.
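The virtual laser pointer amounts to mapping a tracked fingertip position in physical space to a point on the display. The capture-space bounds and screen resolution below are assumptions chosen for illustration.

```python
# Illustrative mapping of a tracked fingertip to a highlight spot on the display.

SCREEN_W, SCREEN_H = 1920, 1080
# Assume the fingertip x/y is reported in metres within a 2.0 m x 1.5 m capture window.
CAPTURE_W, CAPTURE_H = 2.0, 1.5

def fingertip_to_screen(x_m: float, y_m: float) -> tuple[int, int]:
    """Convert a fingertip position in physical space to pixel coordinates."""
    px = int(max(0.0, min(1.0, x_m / CAPTURE_W)) * (SCREEN_W - 1))
    # Physical y grows upward, screen y grows downward, so invert.
    py = int((1.0 - max(0.0, min(1.0, y_m / CAPTURE_H))) * (SCREEN_H - 1))
    return px, py

print(fingertip_to_screen(1.0, 0.75))  # roughly the centre of the screen
```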
FIG. 2B illustrates another example embodiment of a system 200 that may capture a user in a physical space 201 and map the captured data to a visual representation in a virtual environment. Similar to FIG. 2A, system 200 includes capture device 202, computing environment 212, and display device 222. However, in this example, the system 200 also includes additional components, such as another capture device 205, computing environments 213 and 214 associated with the users 260 and 263, respectively, and located in close proximity to those users, and capture devices 203 and 204 associated with the computing environments 213 and 214, respectively.
Various devices or components in a gesture-based system may exchange information with each other via a network and communicate to share information. For example, the capture devices 202,203,204,205 and each computing environment 212,213,214 may communicate through a wired or wireless connection, such as via a cable connection, a Wi-Fi connection, or a home wireless network. The various capture devices 202,203,204,205 may share the captured data with each other, or the central computing environment may aggregate the data for processing and interpretation. For example, the computing environment 212 may be a computing environment that stores and executes a presentation application that is visible to multiple users 260, 261, 262, 263 in a physical space. The computing environment 212 may receive the captured data from other components in the system, such as directly from, for example, the capture device 204 or from, for example, the computing environment 214.
The display 222 may be the main display at the front of the room, selected to be a size visible to most if not all users in the room. The display may be a head-mounted display, such as an immersive device. For example, the head-mounted display may replace the user's field of view with an artificial visual environment. In another example, only a portion of the user's field of view is replaced by the artificial environment. For example, head-mounted displays are capable of head tracking to superimpose information on the display over the user's field of view. The display may adjust the displayed image depending on the direction in which the user's head is pointed. For example, if the users are viewing a map on the display, each user's head-mounted display may include an enlarged portion indicating where that user is looking. Alternatively, a user may navigate through the presentation differently from other users in the room by interacting with his or her personal display. Thus, the realism of the virtual environment may make the simulated environment more appealing, with the user having more realistic and personalized interactions with the virtual reality.
The computing environment 212 attached to the primary display 222 may aggregate data from the capture device 202,203,204,205 and the computing environments 213,214 to analyze a compilation of data captured in physical space. Gestures may be recognized from the aggregated data and result in control of an aspect of the gesture-based system. An example network arrangement that provides for such communication between components in a gesture-based system is described in more detail below with reference to FIG. 3.
As reflected by the example shown in FIG. 2B, it is contemplated that multiple users 260, 261, 262, 263 can interact via networked system components. The system enables multiple components to combine their inputs to apply a gesture. Thus, multiple users 260, 261, 262, 263 may control aspects of the system. For example, multiple capture devices with different capabilities for capturing touch, depth, audio, video, etc. may capture data about multiple users 260, 261, 262, 263 from a physical space. The captured data can be aggregated and processed to control aspects of the system on behalf of multiple users. This enables multiple users to interact with each other via gestures and via the system, thereby enhancing collaborative characteristics in a productivity environment via gestures.
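One way to picture the aggregation of captured data from several networked capture devices is as a central merge of per-device gesture reports. The device names, report format, and "highest confidence wins" rule below are illustrative assumptions, not a description of the actual system.

```python
# Sketch of a central computing environment aggregating per-device captures
# before acting on recognized gestures.

from typing import Dict, Tuple

Report = Dict[str, Tuple[str, float]]  # user -> (gesture, confidence)

device_reports: Dict[str, Report] = {
    "capture_202": {"user_261": ("next_slide", 0.9)},
    "capture_204": {"user_263": ("raise_hand", 0.8), "user_261": ("next_slide", 0.6)},
}

def aggregate(reports: Dict[str, Report]) -> Dict[str, str]:
    """Merge reports from all devices, keeping the highest-confidence gesture per user."""
    best: Report = {}
    for per_user in reports.values():
        for user, (gesture, conf) in per_user.items():
            if user not in best or conf > best[user][1]:
                best[user] = (gesture, conf)
    return {user: gesture for user, (gesture, _) in best.items()}

print(aggregate(device_reports))
# {'user_261': 'next_slide', 'user_263': 'raise_hand'}
```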
Referring to FIG. 2B, consider that users 260, 261, 262, and 263 are in a meeting and captured by capture device 205, where user 260 is associated with another capture device 203 and user 263 is associated with another capture device 204. The physical space may be a large space, such as a large conference room, and a user, such as user 263, may be seated away from the capture device 205. However, the user may be closer to the capture device 204 in the physical space and may also view the shared presentation on a closer, smaller screen, such as via the display 224 or a cellular device (not shown). Computing environment 212 may execute the presentation application and output to multiple displays, such as displays 222, 223, and 224.
Captured data from any of the capture devices may be analyzed for gesture recognition, and thus any user within the field of view of the networked capture devices may control the presentation displayed on the primary display 222 via gestures. The user may control aspects of the shared presentation by making gestures with a second computing environment or capture device in the physical space. For example, the user may make small-scale gestures to the computing environment 214 that is closer to the user 263. The small-scale gesture may not be captured by the capture device 205 if other users or objects in the room, such as the display 224, obstruct the field of view of the capture device, for example. The gestures of the user 263 may be captured by the capture device 204 and processed by the local computing environment 214 or shared with the computing environments 212,213, e.g., via a network. Any computing environment networked in the same gesture-based system may thus process captured data from any number of capture devices also networked in the gesture-based system. Gestures of user 263 may be recognized as controls for a presentation shared between computing environments. Thus, gestures of a first user associated with a first capture device/computing environment may control aspects of a second computing environment.
For example, the users 260, 263 may interact with the close-range computing environments 213,214, respectively, via gestures, and the corresponding controls may be transferred to other computing environments in the system, such as the host computing environment 212 or other personal computing environments networked with the system. The user's gestures may control other computing environments in the session and/or the results of the gestures may be rendered or otherwise indicated on the display of everyone else. For example, the user 263 may make a gesture at a close distance from the computing environment 214. Gestures by user 263 may help manage the presentation displayed on main display 222. The close range capture device 204 may capture data representing gestures of the user 263, and the computing environment 214 may employ gesture recognition techniques to identify the gestures from the captured data. Gestures may control particular aspects of the computing environment 214 associated with the user 263. The gestures may translate into control of the host computing environment 212. Thus, the user is able to make gestures with respect to the close range device and/or the capture device, and have the same effect as if they made gestures with respect to the host computing system.
The use of additional capture devices and/or computing environments may provide additional flexibility for controlling the system 200. For example, consider a large conference room in which the user sits at the back of the room. In some cases, a capture device at the front of a room may not be able to capture a user sitting at the back of the room. For example, there may be a view occlusion between the user and the capture device, the capture device may not be able to register the user, or the capture device may not have resolution to identify the user or to identify, for example, the access level of the user. The user can make a gesture to a second capture device in the physical space that is networked to the system, such as a capture device that is closer to the user or a capture device associated with the user's personal computing environment. Thus, the gesture-based system can integrate multiple users into the system in various ways to manage aspects of the presentation.
In an example embodiment, such as in the example scenarios shown in FIGS. 2A and 2B, there may be a single user or primary user managing the presentation of information. The collaboration system allows and encourages audience participation via gestures so that audience members can control aspects of the system. A single capture device or a combination of capture devices, such as capture device 202 attached to the top of display 222 or capture device 205 hung on a wall, may capture data representing multiple people in the physical space. Multiple persons may gesture to control a single application, such as an application executing on computing environment 212. For example, if the application is a mapping program and the users are working together to generate a travel plan, the users can interact with the single mapping program to zoom in, pan, and so forth. Because multiple users can provide input via gestures to control aspects of the application, multiple controllers need not be passed around the room. Other users may join the room, and they may also use gestures without having to find, for example, additional controllers.
The system can passively evaluate captured data representing multiple users and their gestures for purposes other than controlling the system. For example, a user's gesture may not be intentionally performed by the user to control an aspect of the system, but may still be identified for data-collection purposes. Thus, gestures or information about gestures may be collected and recorded. Consider a plurality of users 260, 261, 262, and 263 in the physical space. The system may present a question on the display 222 or audibly communicate the question via, for example, a speaker system. The question may be prompted by a user, such as user 261, who may be presenting information to the other users. The system may identify motions made by the plurality of users in response to the question, including those motions corresponding to gestures. This information may be collected and provided in real time to an output, such as a display or a file stored in memory.
For example, the system may poll the audience on questions posed by a user or by the system. A question may be answered by a simple yes or no, or the question may include a plurality of possible answers, each of which may correspond to a gesture. For example, the question may be "How many people vote for option A?", and a "yes" answer may be represented by a gesture that includes the user raising his hand straight up in the air. The system can detect from the captured data every user who lifts his or her hand and every user who does not. The system may thus count the number of "yes" responses based on the gesture identified for each user. Thus, the system can provide immediate polling results.
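The instant-poll idea reduces to counting the users whose recognized gesture matches the answer gesture. A minimal sketch, assuming per-user gesture labels already produced by a recognizer like the one sketched earlier:

```python
# Count users whose recognized gesture is a raised hand ("yes" vote).
# The gesture labels here are assumed inputs from the recognizer.

audience_gestures = {
    "user_260": "raise_hand",
    "user_262": "none",
    "user_263": "raise_hand",
}

yes_votes = sum(1 for g in audience_gestures.values() if g == "raise_hand")
print(f"{yes_votes} of {len(audience_gestures)} voted yes for option A")
```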
Gestures recognized by the system during the presentation may trigger subsequent actions by the system. For example, the system may recognize a handshake between two users, and the handshake may be recognized as an "introduction" gesture between the two users in the room, where the "introduction" gesture comprises the two persons shaking hands. The system may identify each user, identify the "introduction" gesture, and automatically record contact information about each user in a log that is provided to each user at the end of the meeting. Thus, through access to the compiled list of contact information, the user may have contact information for each person with whom the user shook hands. Another example of information recorded during a meeting may be the identification of a point in the display that is of interest to a user, where the recording of the information may be triggered by a gesture. For example, during a meeting, a user may make a gesture to indicate that the user is interested in a particular screenshot or point in the presentation, and the system may include identifying information in a log to enable the user to easily find the information for review at a later time. The gesture may be a subtle gesture, such as tapping a finger on the table or lifting a finger. Thus, the system may record data about each user based on their various gestures without interrupting the flow of the presentation. Each user may then access an information log specific to that user.
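Gesture-triggered logging can be sketched as handlers that append to per-user logs when particular gestures are recognized. The contact data, user identifiers, and handler names below are illustrative assumptions.

```python
# Sketch of gesture-triggered logging: a recognized "introduction" (handshake)
# records contact details for both participants, and a finger-tap gesture
# bookmarks the current slide for that user.

contacts = {"user_260": "a@example.com", "user_263": "b@example.com"}
meeting_log: dict[str, list[str]] = {u: [] for u in contacts}

def on_introduction(user_a: str, user_b: str) -> None:
    meeting_log[user_a].append(f"met {user_b}: {contacts[user_b]}")
    meeting_log[user_b].append(f"met {user_a}: {contacts[user_a]}")

def on_bookmark(user: str, slide: int) -> None:
    meeting_log[user].append(f"flagged slide {slide} for review")

on_introduction("user_260", "user_263")
on_bookmark("user_263", 12)
print(meeting_log)  # each user gets a log specific to that user
```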
Other actions may be triggered if the system recognizes a gesture. For example, in addition to the events discussed above, consider other common events that occur during a conference, such as delegating duties to conference participants, dialing in remote users, identifying users who want a copy of a particular document, and so forth. Part of the conference may include delegating responsibilities to individual users. The system or a user may request or designate a volunteer, and the recognized gesture for designating the user may include the presenter pointing at the volunteer or the volunteer lifting his or her hand. The system may recognize the gesture as accepting an action item and associate the user with the action item, such as in a project list or spreadsheet. In another example, a user may be presented with a question, such as "Do you want a copy of the document?", and based on the "yes" gestures in the room, the system may automatically generate an email that includes an email address associated with each user making the "yes" gesture and send the email with the document attached to each such user. The action can occur in real time so that the users can gain access to the document while still in the meeting, or the action can be placed in a queue so that the system takes the action at the end of the meeting.
A gesture may trigger an action by a hardware component in the system. For example, if a user lifts his or her hand, a first capture device may receive and process the captured data and recognize the lifted hand as a gesture indicating that the user desires to interact with the system. The first capture device may provide instructions to a second capture device to steer toward the user so that the user is within the field of view of the second capture device or is better focused on. Another gesture may be an indication to save a portion of the presented information to memory, or to a user's local memory, such as a local memory in a personal computing device, such as computing environment 214 associated with user 263. A user's gesture may be an indication to turn on a light, lower a screen, turn on a display device, and the like.
The action triggered as a result of the gesture may be determined by a set of instructions represented in a shorthand format such as a macro. Thus, the actions that the system will take due to a particular gesture may be defined in advance, such as before the meeting begins. The system may have any number of macros implemented due to particular gestures, resulting in actions that may be seamless to the user, such as adding information to a log or providing results to a presenter, or the actions may be known to the user, such as upon receiving an email with information requested via the user's gestures. The macros may be pre-packaged with the system or application, or the macros may be defined by the user.
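A macro, in this sense, is simply a pre-defined binding from a recognized gesture to a list of actions. The gesture labels and action functions below are hypothetical placeholders for illustration.

```python
# Sketch of pre-defined "macros": each recognized gesture is bound, before the
# meeting, to the sequence of actions the system should take.

from typing import Callable, Dict, List

def log_interest(user: str) -> None:
    print(f"logged interest for {user}")

def queue_email(user: str) -> None:
    print(f"queued document email to {user}")

macros: Dict[str, List[Callable[[str], None]]] = {
    "finger_tap": [log_interest],
    "raise_hand": [log_interest, queue_email],
}

def on_gesture(user: str, gesture: str) -> None:
    # Run every action bound to this gesture; unbound gestures do nothing.
    for action in macros.get(gesture, []):
        action(user)

on_gesture("user_263", "raise_hand")
```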
The system may also take actions as a result of gestures in a dynamic manner. For example, a user may use a combination of gestures and/or spoken commands, requesting that users interested in receiving the document by email hold their hands in the air, and then stating a command to send the email to the selected users. The system may respond to the combination of the command and the gestures and email the document in question to each user holding a hand in the air.
In another example, the system may collect data about user behavior in the physical space. For example, a presenter may desire feedback regarding the effectiveness of his or her presentation methods. For example, during a presentation, the system may employ facial or body recognition techniques to recognize facial features/movements or body postures/movements of the individual users present in the physical space. The facial features/movements, body postures/movements, or a combination thereof may correspond to gestures that indicate a particular expression or emotion of the user. The system may provide information to the presenter indicating the level of attention, distraction, or related behavior of audience members. The presenter may use this information to identify effective presentation methods to use throughout the presentation. For example, if the system detects gestures indicating that a proportion of the users appear bored or uninterested, and correlates that data with the times when the presenter displays large amounts of text on the display, the presenter may use the information to conclude that displaying large amounts of text is not effective for that group of users. However, for a different audience the text may be appropriate, such as when audience members are evaluating the wording of a document. In another example, the presenter may use animation and interactive techniques, and the system may identify excitement or active user engagement. Thus, the presenter may customize the presentation for the audience based on information provided in real time, or prepare for a meeting based on previously collected data.
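The correlation step described here can be sketched as joining time-stamped audience observations against what was on screen at the time. The detection labels, slide metadata, and timings below are illustrative assumptions.

```python
# Sketch of correlating detected audience states with the slide content that
# was displayed at the time, to report which slide styles lost the audience.

observations = [
    {"time": 120, "user": "user_260", "state": "engaged"},
    {"time": 300, "user": "user_262", "state": "bored"},
    {"time": 305, "user": "user_263", "state": "bored"},
]
slides = [
    {"start": 0,   "end": 240, "style": "diagram"},
    {"start": 240, "end": 480, "style": "dense_text"},
]

def slide_at(t: int) -> dict:
    return next(s for s in slides if s["start"] <= t < s["end"])

bored_by_style: dict[str, int] = {}
for obs in observations:
    if obs["state"] == "bored":
        style = slide_at(obs["time"])["style"]
        bored_by_style[style] = bored_by_style.get(style, 0) + 1

print(bored_by_style)  # {'dense_text': 2} -> heavy text correlates with boredom
```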
The system may provide dynamic suggestions to the presenter during the presentation based on its detection of audience behavior related to the presentation. For example, during a series of text-heavy slides, the system may identify gestures indicating boredom and suggest that the presenter skip ahead. The system may identify that the presenter gets better audience participation when walking around at the front of the room rather than sitting off to the side. The system may provide this information to the presenter during the presentation, or the system may monitor audience participation over time and generate reports that provide the results to the presenter.
Each user engaged in the session, whether presenting information, interacting with the system via gestures, viewing the presentation, or merely present within the field of view of the capture device, may be represented by a visual representation. FIG. 2C depicts another example gesture-based system 200 that may include a capture device 292, a computing device 294, and a display device 291. For example, capture device 292, computing device 294, and display device 291 may each comprise any suitable device that performs the desired functions, such as the devices described herein.
In this example, the depth camera 292 captures a scene in a physical space 293 where multiple users 260, 262, 263 are present. The depth camera 292 processes depth information and/or provides depth information to a computer, such as computer 294. The depth information may be interpreted for displaying a visual representation of the user 260, 262, 263. For example, the depth camera 292 or, as shown, the computing device 294 to which it is coupled, may output to the display 291.
The visual representation is a computer representation, typically in the form of a two-dimensional (2D) or three-dimensional (3D) model. The visual representation of a user in physical space 293 may take any form, such as an animation, a character, an avatar, and so forth. For example, the visual representation may be an avatar, such as the avatars 295 or 296 shown in FIG. 2C representing users 260 and 263, respectively. The visual representation may be a pointer, arrow, or other symbol, such as hand symbol 297 representing user 262. In this example, the monkey character 295 represents the user 260 and displays a physical pose similar to the physical pose of the user 260 captured by the capture device 292. The avatar 296 representing the user 263 has physical characteristics similar to those of the user 263, and the capture device may even analyze the clothing of the user 263 and apply it to the user's visual representation 296.
Introducing a visual representation into a shared presentation experience may add another dimension to the experience by giving the user a sense of identity within the virtual space. The visual representation of the user may be presented on the display, and the avatar of each person in the session may be presented on every other person's television or monitor, resulting in a set of avatars that appear to be interacting in the same virtual space even though the respective users may be in different physical spaces. Because the users in the session may include both remote and local users, the display may present visual representations of both the remote and local users so that all users can identify the other participants based on the presence of each visual representation.
Each user may be represented by its own customized visual representation, such as a customized avatar or character. The visual representation of the user may reflect characteristics of the user as captured by the capture device. For example, the capture device may employ body/face recognition techniques and transform the captured data into a visual representation of the imaging user. Thus, an avatar on the screen may look and act like the user. The visual representation of the user detected in the physical space 293 may also take alternative forms, such as an animation, a character, an avatar, and so forth. The user may select from various inventory models provided by the system or application for the user's on-screen representation. For example, an inventory model for visually representing a user may include any character representation, such as a representation of a famous character, a piece of taffy, or an elephant. The inventory model may be a fantasy character (e.g., dragon, monster) or a symbol (e.g., pointer or hand symbol).
The visual representation may include a combination of user features and animations. For example, the visual representation of the user may be a monkey character such as character 295, but wearing the user's clothing as captured in the physical space or wearing glasses like the user detected in the physical space. The user's facial expression, body posture, spoken words, or any other detectable characteristic may be applied to the visual representation. The inventory model may be application specific, such as packaged with the program, or the inventory model may be available across applications or system wide.
A user gesture may result in a multimedia response corresponding to the control associated with the gesture. For example, an animation of a visual representation of a user may be presented on a display as a result of a gesture. The multimedia response may be any one or combination of animation, such as movement of a visual representation, a text message appearing on a screen, audio, still images, video, and the like. For example, if the user gesture recognized by the system executing the word processing application comprises a next page gesture, the control of the word processing document may be to flip to the next page in the document. The visual representation associated with the user performing the gesture may be, for example, a large cartoon hand having glasses similar to the user's glasses in the physical space as captured and identified by the gesture-based system. In response to the user's next page gesture, the large cartoon hand may move across the screen and appear to grab a corner of the document and then pull down to reveal the next page in the document. Thus, not only can other users know that a particular user has performed the gesture based on observing an animation of the visual representation unique to that user, but the animation increases the experience via the gesture control system. The user can visualize the results of his or her gestures, and other users can visualize the results. The user can more easily confirm that the gesture was properly recognized and applied by the system.
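The multimedia response described here pairs an application control with an animation of the gesturing user's visual representation. A minimal sketch, with hypothetical class and function names:

```python
# Sketch of a multimedia response: one recognized gesture produces both the
# application control (turn the page) and the animation of the gesturing
# user's visual representation.

class Document:
    def __init__(self, pages: int) -> None:
        self.page, self.pages = 1, pages

    def next_page(self) -> None:
        self.page = min(self.page + 1, self.pages)

def animate(avatar: str, clip: str) -> None:
    print(f"{avatar} plays animation '{clip}'")

def on_next_page_gesture(user_avatar: str, doc: Document) -> None:
    animate(user_avatar, "grab_corner_and_pull")  # visible to all viewers
    doc.next_page()                               # the actual control
    print("now on page", doc.page)

doc = Document(pages=10)
on_next_page_gesture("cartoon_hand_with_glasses", doc)
```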
The multimedia response may be system wide, such as an animation appearing on each of the other users' televisions or monitors. When a user causes his or her avatar to gesture, the gesture may be rendered at all client locations simultaneously. Similarly, when a user speaks or otherwise generates an audio event (e.g., through voice chat) or a text event (e.g., through text chat), the audio or text may be presented at all client locations simultaneously.
A particular visual representation may be associated with a particular user. Thus, each user may have an avatar that is a visual representation of himself or herself and that may be animated in accordance with gestures performed by that user. When the system displays an animation of a particular visual representation, the user and other users may discern which user is associated with that visual representation. Thus, when a user gestures and the visual representation is animated to correspond to the control associated with the gesture, it is possible to discern which user performed the gesture based on which visual representation is animated. In another example embodiment, the visual representations of the users may be generic or common among the users, without distinguishing features. Thus, it may not always be possible to tell from the displayed animation which user corresponds to a particular visual representation.
In an example embodiment, the system may store information representing the tracked motion of the user in a motion capture file. The system may apply user-specific motion from a particular motion capture file to an avatar or game character so that the avatar or game character may be animated to simulate motion in a manner similar to the motion of the user. For example, the motion capture file may include an indication of the manner in which the user jumps. When a jump animation is applied to the visual representation of the user, the system may apply the jump from the motion capture file such that the visual representation simulates the actual motion of the user captured by the system. In another example embodiment, the capture file includes a generic motion that is common among users. For example, a jump from a motion capture file may be generic or pre-packaged with the system such that the visual representation performs a generic jump when it is applied by the system. The multimedia response may be a predetermined response, such as a stored animation or audio clip implemented in response to the identification of a particular gesture.
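Replaying a stored motion capture file on a visual representation can be sketched as stepping through a list of timed poses. The file format (a list of timestamped root heights) and the avatar names below are assumptions for illustration only.

```python
# Sketch of replaying a stored motion on a visual representation: either the
# user's own recorded jump or a generic pre-packaged one.

user_jump = [                      # user-specific motion capture "file"
    {"t": 0.00, "root_y": 0.0},
    {"t": 0.15, "root_y": 0.35},   # peak of this user's jump
    {"t": 0.30, "root_y": 0.0},
]
generic_jump = [                   # generic, pre-packaged jump
    {"t": 0.00, "root_y": 0.0},
    {"t": 0.15, "root_y": 0.25},
    {"t": 0.30, "root_y": 0.0},
]

def apply_motion(avatar: str, motion: list[dict]) -> None:
    for pose in motion:
        print(f"{avatar} @ t={pose['t']:.2f}s root height {pose['root_y']:.2f} m")

apply_motion("monkey_character", user_jump)  # mimics the captured user
apply_motion("stock_avatar", generic_jump)   # pre-packaged fallback
```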
The multimedia response library may include options representing user gestures as recognized by the system. For example, the multimedia response library may include animation data that is predetermined or pre-recorded and associated with a particular gesture. Alternatively, the multimedia response library may include various options that are implemented when a particular gesture is recognized, and may be selected based on various circumstances. For example, a particular animation may be selected because the animation may apply to a particular application, a particular user, a particular gesture, a particular motion, a skill level, and so forth.
In an exemplary embodiment, the animation applied to the avatar may be, for example, an animation selected from a library of pre-packaged animations, such as animations that accompany a program, application, or system. The selected animation may be an animation corresponding to the system learned user input. In another exemplary embodiment, the animations in the library may be animations entered and recorded by the user into the avatar's animation vocabulary.
Some animations may be applied to a visual representation of a user, even though the user is not performing a gesture. For example, after inferring the user's body language, the system may determine appropriate animations to apply to the user's visual representation to reflect the user's temperament. In another example, the system may detect that a particular user is idle or not making a gesture for a particular period of time. The system may animate the visual representation of the user to reflect the user's idleness. For example, if the user's visual representation is a monkey character, the system may apply an animation that includes the monkey character grabbing a virtual pillow and displaying "zzzzzzz" near the monkey to suggest that the monkey is sleeping.
In an example embodiment, the multimedia response may include a predetermined avatar animation displayed in combination with the sound. For example, a light footstep sound may be played when the avatar is caused to walk. In another example, a heavy footstep sound may be played when the avatar is caused to run. In another example, a scraping or sliding footstep sound may be played when the avatar is caused to move sideways. In another example, a collision sound may be played when the avatar is caused to collide with an object. In another example, a heavy collision sound may be played when the avatar causes an object to fall or collide with another object.
The level or amount of animation may vary depending on, for example, the application currently executing or the demographics of the audience. For example, an application presenting legal documents during a court trial may include far fewer animations than a presentation used to teach children about safety on a playground. Thus, there may be various animations and animation levels defined by the system and/or changed by the user to correspond to the user's preferences.
In an example, a presentation paradigm may be a sequential slide presentation, in which a set of slides is presented in a predetermined order. In another example, the application may present the information in a non-sequential manner. For example, a presentation may include a compilation of slides, pictures, images, text, etc., and the presentation may be organized into chapters. During the presentation, the presenter may zoom in or out on individual slides and slide chapters in the slide show. A predefined order need not be defined.
The display of such non-sequentially presented presentation materials may take many forms. For example, the home screen may include a map or a large flowchart. The user may zoom in on a different portion of the map or flowchart and may expand one or more slides corresponding to that portion on the screen. The user may move around the map or flowchart without having to follow any predefined order. Consider a presentation format that includes a hallway or house having a number of doors displayed on the screen. The user may progressively jump back and forth between doors, enter a door, and so forth. A user may present slides using a zoomable canvas that allows the user, or a participant from the audience, to jump from one part of the presentation to another, zoom in and out, and define chapters. Audience members may navigate to different portions of the presentation via gestures, and as described above, the visual representations may interact with various forms of presentation styles. Thus, a presentation may be a compilation of content without requiring a particular order of presentation. The user is able, in essence, to enter the virtual world via the visual representation and control aspects of the presentation via gestures. Thus, the gesture-based system can display a canvas of assets for discussion, and the user can navigate through this information in real time. The canvas may include a large amount of information, and the presenter may select portions of the canvas depending on the audience. The canvas or presentation may include more information than is presented, but because of the flexible format, the user and/or presenter may select the sections of information appropriate for discussion.
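A non-sequential, zoomable canvas can be modeled as chapters of content plus a view state that gestures jump and zoom. The chapter names, content items, and zoom limits below are illustrative assumptions.

```python
# Sketch of a non-sequential, zoomable presentation canvas: content is grouped
# into chapters, and the presenter (or an audience member) jumps and zooms by
# gesture rather than following a fixed slide order.

canvas = {
    "budget":   ["q1_overview", "q2_forecast"],
    "roadmap":  ["timeline", "milestones"],
    "appendix": ["raw_data"],
}

class CanvasView:
    def __init__(self) -> None:
        self.chapter, self.zoom = "budget", 1.0

    def jump(self, chapter: str) -> None:
        if chapter in canvas:
            self.chapter = chapter
            print("showing chapter:", chapter, canvas[chapter])

    def zoom_by(self, factor: float) -> None:
        self.zoom = max(0.25, min(4.0, self.zoom * factor))
        print(f"zoom level now {self.zoom:.2f}")

view = CanvasView()
view.jump("roadmap")  # e.g. triggered by a pointing gesture at the map
view.zoom_by(2.0)     # e.g. triggered by a spreading-hands gesture
```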
In an example embodiment, such as in the example scenarios shown in FIGS. 2A and 2B, there may be a single user or primary user managing the presentation of information. For example, it may be desirable for a single user to manage the presentation without interactive controls from audience members. Consider user 260 leading a meeting or teaching a class of students; he or she may want sole control of the presentation. Thus, a single user may host an information presentation and control it via gestures, while gestures of other users in the room are not able to control the presentation. In another example embodiment, multiple users can control aspects of the system. For example, it may be desirable for multiple presenters to control various aspects of the system, or it may be desirable to give at least one audience member control. Thus, a group of users may share control of a presentation via a single capture device or multiple capture devices to better enable collaboration.
Users may be given shared access to control aspects of the presentation, limited access, or no access at all. For example, a user may be designated as, for example, a primary user, a secondary user, or an observing user. The primary user is any user with the option of controlling aspects of the system. The primary user may have the option of distributing control to other users. For example, the primary user may designate a second user in the room as another primary user, a secondary user, or an observing user. The observing user may be a user who is able to interact with other users, but not to control the system via gestures.
The secondary user may be granted the option of controlling aspects of the system. The secondary user may be given temporary control, which may continue indefinitely once granted or may continue only until it is taken away. For example, the primary user may be presenting information but initially may not want interaction from the audience. However, at the end of the presentation, the primary user may request comments and feedback from audience members. In an example, the primary user may select a particular user and give that user temporary access to control the system while that secondary user is providing feedback. The secondary user may make gestures to control the displayed information, such as by returning to a previous page in a word processing document, highlighting a chapter, or drawing virtual lines (e.g., circles and arrows) on the display. As such, the secondary user is able to easily interact with information via gestures, which may make the feedback more efficient and easier to understand. When complete, the primary user may remove the secondary user's access to the controls, or the secondary user may relinquish the controls.
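As a minimal illustrative sketch of how such access levels might be represented and transferred, assuming hypothetical names (AccessLevel, User, grant_control, revoke_control) that are not part of this disclosure:

```python
from enum import Enum

class AccessLevel(Enum):
    PRIMARY = "primary"      # may control the system and distribute control
    SECONDARY = "secondary"  # may control aspects of the system when granted
    OBSERVING = "observing"  # interacts with other users, but not via gesture control

class User:
    def __init__(self, name, level=AccessLevel.OBSERVING):
        self.name = name
        self.level = level

def grant_control(granting_user, target_user, level):
    """Only a primary user may designate another user's access level."""
    if granting_user.level is not AccessLevel.PRIMARY:
        raise PermissionError("only a primary user may distribute control")
    target_user.level = level

def revoke_control(granting_user, target_user):
    """Temporary control may later be taken away by the primary user."""
    grant_control(granting_user, target_user, AccessLevel.OBSERVING)

# Example: the presenter grants an audience member temporary control for feedback.
presenter = User("presenter", AccessLevel.PRIMARY)
audience_member = User("audience member")
grant_control(presenter, audience_member, AccessLevel.SECONDARY)
revoke_control(presenter, audience_member)
```

In this sketch only a primary user may redistribute control, mirroring the description above; as noted, the settings could also be modified so that a secondary user may grant access.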
Control of the system may be transferred between users or shared among multiple users. For example, transferring or sharing control via gestures provides an easier way for users to manage information, without the need to pass a controller around or to stand close to the computing environment to control aspects of the presentation. Users can interact with the system from their seats via gestures. More than one user may be a primary user, such as where there are multiple presenters. Each presenter may control certain aspects of the presentation, so that particular portions of the presented content can remain under the control of, and accessible to, particular users even while the presenters work together.
In another example, the primary user may open the floor for feedback from any user, thereby granting secondary-user access to any user in the room. The primary user may, via a gesture, give any participant access to the information controls. For example, at the end of a presentation, the primary user may issue a command to the system to cause the system to process and recognize the gestures of any user in the room and translate those gestures into controls. In another example, the system may identify a request for control and determine whether to grant the request. For example, a gesture requesting control may comprise the user holding an arm straight up in the air. The system may identify the gesture performed by an audience member and indicate that the user has requested control. The system may provide the user with access to control aspects of the system upon recognizing the gesture requesting control, or the system may wait until the primary user authorizes giving the secondary user control.
In another example, the system may be instructed, such as via settings set by the presenter, to wait until a particular point in the presentation before providing any access to secondary users. For example, the system may be instructed to wait for the primary user to request feedback, which may be indicated by a gesture recognizable by the system, before allowing other users to have control. In the meantime, however, the system may build a queue of user requests by identifying each user who makes the gesture requesting control. Then, when the primary user is ready for feedback from the audience, the system may display the queue to indicate the order in which the users requested control.
The system may also identify the context of a requesting user's gesture, such as the point in the information presentation at which the gesture was made. For example, while the primary user is presenting, an audience member may make the gesture requesting control to indicate that the user desires to give feedback and/or take control of the system. As described above, rather than giving the audience member control at that moment, the system may recognize the user's gesture and place the user in the queue. When the primary user requests feedback, the system may not only indicate the user's request, but may also return to the point in the presentation at which the user made the gesture. Thus, the system can automatically return to what was displayed on the screen when the user made the gesture.
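A minimal sketch of such a request queue, recording where in the presentation each request was made so the display can jump back to that point; the names ControlRequest and RequestQueue are illustrative assumptions, not part of this disclosure:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class ControlRequest:
    user_id: str
    slide_index: int  # point in the presentation when the request gesture was made

class RequestQueue:
    """Holds control requests until the primary user asks for feedback."""

    def __init__(self):
        self._pending = deque()

    def add(self, user_id, slide_index):
        self._pending.append(ControlRequest(user_id, slide_index))

    def next_request(self):
        """Return the earliest request, including the slide to jump back to."""
        return self._pending.popleft() if self._pending else None

# Example: two audience members raise their hands during slides 4 and 9.
queue = RequestQueue()
queue.add("user_262", slide_index=4)
queue.add("user_263", slide_index=9)
first = queue.next_request()  # ControlRequest(user_id='user_262', slide_index=4)
```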
Note that data may be captured by any capture device whose field of view the user is within, and any computing environment networked with that capture device may employ gesture recognition techniques on the captured data. For example, the host computing environment 210 may employ gesture recognition on data captured by the capture device 205. In another example, a computing environment associated with a particular user may recognize the gesture. For example, the user 263 may interact with the computing environment 214, and the capture device 204 may capture data representing gestures of the user 263.
The gestures used for the same system control may vary. In the above example, the user makes the gesture requesting control by raising a hand on a fully extended arm. However, the gesture used to request control may instead be a small-scale gesture, such as a finger or hand motion. Gestures may be defined so as to limit interruptions in the flow of the presentation. Gestures may also vary depending on the computing environment with which the user interacts. For example, user 263 in FIG. 2B may interact with computing environment 214 on a small scale, and user- or computing-environment-specific gestures may apply. A user may have a defined set of gesture data available on, for example, the user's personal computing environment or a cellular telephone. The user may thus gesture in a unique manner depending on the user's personal preferences.
The designation of a user's level of access may be made by any of a number of available sources, such as another user or an administrator of an application in the physical space, the system itself, and so forth. For example, the system may capture data in the physical space. From the captured data, the system can identify all four users 260, 261, 262, 263 in the physical space. The system may simply identify that a human target is in the physical space, or it may identify a particular user based on, for example, body/face recognition techniques. The system may specify a particular level of access for each user.
The system may specify a user's access level based solely on presence, or based on the user's location in the physical space. For example, the system may analyze data captured by the capture device 202 shown in FIGS. 2A and 2B and identify a human from the captured data. Each user detected in the physical space may be designated by default as having a particular level of access. For example, an access level based only on presence may default to secondary user access. The system may also base the access level on the user's location in the room. For example, one or more users seated in a particular portion of the room, or, for example, a child, may be designated as observing users by default. The designation of a user's level of access may be modified. The designation may also be dynamic, changing to reflect the user's activities. For example, when a user is speaking, the system may focus on the speaking user for gesture control.
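As an illustrative sketch of a presence- and location-based default designation, assuming a simple rectangular "observer zone" on the room floor (the zone, coordinates, and defaults are assumptions made here for illustration, not part of this disclosure):

```python
def default_access_for(position, observer_zone):
    """Assign a default access level from presence and seat location.

    position: (x, z) floor position of a detected person, in meters.
    observer_zone: (x_min, x_max, z_min, z_max) region whose occupants are
    treated as observing users by default; anyone else detected in the
    space defaults to secondary user access.
    """
    x, z = position
    x_min, x_max, z_min, z_max = observer_zone
    if x_min <= x <= x_max and z_min <= z <= z_max:
        return "observing"
    return "secondary"

# Example: seats more than 3 m from the display are treated as the audience area.
level = default_access_for((1.2, 4.5), observer_zone=(-3.0, 3.0, 3.0, 8.0))
```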
Various scenarios with multiple participants may each have different requirements, and thus various features described herein may be unique to a scenario or may be combined in any varying combination. For example, a first method from the example methods described herein, or a similar method, for identifying a primary user in the physical space may be more appropriate for a first scenario, while a second method for selecting a primary user may be more appropriate in another scenario. Similarly, the level of access may vary depending on any number of characteristics, such as the application, the conference type, the user type, the type of information being presented, and even the time of day or the lighting in the room.
In another example, a user may be logged into the system, and the user's identity may be associated with a particular level of access. For example, the creator of an information presentation may be associated with the primary user role, and upon identifying the creator in the physical space, the system may designate the creator as the primary user. Other users detected by the system may be given secondary user designations. Alternatively, a user who is to be in the audience for the primary user's presentation may log into the system and be given a designation at the time of logging in. Alternatively, the primary user may define the access levels for the participants of the conference.
A user profile may indicate the appropriate level of access for that particular user. For example, the system may have a database of user profile information (e.g., body or facial feature data) and may correlate a user with a user profile. The system may identify a particular user based on a user profile or based on physical characteristics identified from the captured data. For example, the system may analyze data captured by the capture device 202 and employ body/face recognition techniques to identify a particular user. If a user profile does not exist for a user, the system may create one. The user profile may provide an indication of the level of access that the system is to grant the user.
The designation of a user as a primary user, a secondary user, or an observing user may be a permanent designation. Alternatively, the designation may be changed over time. Further, the settings/limits/constraints for each access level may be modified. For example, a secondary user may have access to control aspects of the system, but may not have authorization to grant control to other users. However, the settings may be modified so that the secondary user may grant access to other users. The system or application may set the access levels based on default information or may define the levels specifically for a scenario.
As described above, a user may perform a gesture to specify the access level of another user. Gestures may also be used to control or manage aspects of the information presentation. For example, if the presentation is executed in the form of a word processing application, the user may gesture to move forward or backward a page in the document. Gestures may also highlight an aspect of a document and/or zoom in on it on the screen for closer analysis by users having access to the displayed presentation.
The visual representation may correspond to the method of sharing control of the system. For example, if control of the system is transferred from one user to the next, the visual representation may remain unchanged, but the user associated with it changes to the user obtaining control. In another example, each user may have a visual representation associated with him or her, as described above. The system may indicate which users currently have control through some indication related to the users' visual representations. For example, the visual representation of a user without control may be grayed out or faded from the display. In another example embodiment, only the visual representation of the user currently making a gesture is displayed and/or animated.
The designation of control may be accomplished via a gesture. For example, the primary user may gesture in a particular manner to give the secondary user the option of controlling aspects of the system. Consider a primary user, such as user 260 in FIG. 2B, presenting to a group of users, such as users 261, 262, 263. The primary user may use gestures to manage information presentation, such as changing a display, moving objects back and forth on a screen, virtually drawing on a screen, highlighting chapters of a document, and so forth.
The system may access or otherwise store profiles that need not be user specific; such profiles may instead include information about different presentation styles. For example, a style profile may include gesture information applicable to a type of presentation, a type of presenter, a type of audience, and so on. Preferences may be set, or default information defining the gestures that apply to each style may be included. The user may select the type of style profile to be implemented for a particular presentation. For example, if the presenter is a composer teaching music to a class of students, the gestures may be expansive, large-amplitude gestures. However, a presenter with a more reserved style may select a profile whose gestures involve a smaller amount of movement. Consider the differences that may exist between a salesperson presenting game information to a group of potential customers and a lawyer presenting information to a panel during a court trial. The salesperson may wish to present in an entertaining manner with a light tone. Thus, a style profile appropriate for that presentation may be one with large-amplitude movements, and the style profile may indicate that a visual representation with a high animation level should be displayed. In the court trial example, however, the style profile may include smaller gestures and minimal animation.
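A sketch of how style profiles might be stored as data, with assumed field names (gesture_scale, animation_level) chosen only to illustrate the large-gesture versus reserved-gesture distinction described above; the profile names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class StyleProfile:
    """Illustrative style profile; the field names are assumptions."""
    name: str
    gesture_scale: float   # 1.0 = expansive full-body gestures, 0.2 = small hand motions
    animation_level: str   # "high", "medium", or "minimal"

STYLE_PROFILES = {
    "classroom_music": StyleProfile("classroom_music", gesture_scale=1.0, animation_level="high"),
    "sales_pitch": StyleProfile("sales_pitch", gesture_scale=0.9, animation_level="high"),
    "court_trial": StyleProfile("court_trial", gesture_scale=0.3, animation_level="minimal"),
}

def select_style(profile_name, default="sales_pitch"):
    """Return the named style profile, falling back to a default if unknown."""
    return STYLE_PROFILES.get(profile_name, STYLE_PROFILES[default])
```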
As described above, the remote computing environments may share resources over the network, including applications and control of the system via input from users in the respective physical spaces. Thus, a user who is remote from the physical space 201 shown in fig. 2A and 2B may control an aspect of the system 200 by making gestures in the user's local physical space. Via a network, a computing environment local to a remote user may share information with other remote or local computing environments via the network, thereby enabling the remote user to control an aspect of the other computing environments networked with the system via gestures. Thus, if, for example, an application is executing on a remote system, user gestures may be processed and analyzed to control aspects of the remote system.
FIGS. 2D and 2E depict a system 200 that may include a capture device 202, a computing device 212, and a display device 222. For example, capture device 202, computing device 212, and display device 222 may each include any suitable device that performs the desired functions, such as the devices described herein.
Fig. 2D and 2E depict example target recognition, analysis, and tracking systems from a side view and a top view, respectively, and example embodiments in which user 265 interacts via a point-and-focus function. This example depicts control that is tied to the user's point of focus. The user may interact with the system via a pointing gesture, where the system has the capability of both pointing and focusing functions. In other words, the system's analysis of the user's pointing gesture may be multidimensional, taking into account both the user's gaze and hand movement to determine the user's point of focus on the screen. The capture device 202 may, for example, track the user's head position to determine the direction in which the user's head is aimed. The capture device may have sufficient fidelity in tracking the actual eye movements of the user to detect a line of sight from the user's eyes to a corresponding location on the screen. The system may also track hand, arm, and finger movements of the user, such as tracking changes in the position of the user's hand, arm, or finger in the physical space. The system may analyze the motion and determine a line of sight between the user's body part and an intended point of interest on the screen. By tracking the head/eye gaze together with hand/arm/finger coordinates, the system allows the user to point with more natural and instinctive movements: the user points to a particular location on the screen, and the system uses the additional input from the capture device about the user to determine the location of interest on the screen. The system may display a pointer or some other symbol on the screen that represents the system's interpretation of where the user is pointing.
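The pointing analysis described above can be illustrated by intersecting a ray, cast from the tracked eye or head position through the tracked fingertip, with the plane of the screen. The following sketch assumes all positions are given in a common camera coordinate frame in meters; the function name and the example coordinates are illustrative only.

```python
import numpy as np

def pointing_target(eye, fingertip, screen_origin, screen_normal):
    """Estimate the on-screen point a user is pointing at.

    A ray is cast from the eye (or head) position through the fingertip and
    intersected with the screen plane defined by a point and a normal vector.
    """
    eye = np.asarray(eye, dtype=float)
    fingertip = np.asarray(fingertip, dtype=float)
    screen_origin = np.asarray(screen_origin, dtype=float)
    n = np.asarray(screen_normal, dtype=float)

    direction = fingertip - eye
    denom = np.dot(n, direction)
    if abs(denom) < 1e-9:
        return None  # pointing parallel to the screen; no intersection
    t = np.dot(n, screen_origin - eye) / denom
    if t < 0:
        return None  # pointing away from the screen
    return eye + t * direction

# Example: eye at (0, 1.6, 2.0), fingertip 40 cm ahead of the eye,
# screen plane at z = 0 facing the user; the result lies on that plane.
point = pointing_target((0.0, 1.6, 2.0), (0.1, 1.5, 1.6),
                        screen_origin=(0.0, 0.0, 0.0),
                        screen_normal=(0.0, 0.0, 1.0))
```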
Fig. 3 illustrates an exemplary networked or distributed system 300 that can incorporate the disclosed techniques for enhancing productivity scenarios. Of course, actual network and database environments may be arranged in a variety of configurations; however, the example environment illustrated here provides a framework for understanding the type of environment in which embodiments may operate.
The system 300 may include a network of computing environments 302, 304, 306, 308 and capture devices 312, 314, 316. Each of these entities 302, 304, 306, 308, 312, 314, and 316 may include or utilize programs, methods, data stores, programmable logic, and the like. Users 322, 324, 326, and 328 are shown as being locally associated with computing environments 302, 306, 304, respectively. In embodiments disclosed herein, a group of users may replicate the real-world experience of meetings with other users for collaboration in a conference-type setting. Experiences may be replicated in a virtual world in which users are at different physical locations and communicate via a network. A display at the user location may present avatars representing the group of users.
Computing environments 302, 304, 306, and 308 and capture devices 312, 314, and 316 may communicate over network 250. The network 250 may represent any number or type of networks, such that computing environments 302, 304, 306, 308 in the same or different locations may be networked via any type, number, or combination thereof. Network 250 may be any network arranged such that messages may be communicated from one portion of the network to another portion of the network through any number of links or nodes. For example, in accordance with an aspect of the presently disclosed subject matter, each of the physical computing environment, capture device, and display device can contain discrete functional program modules that can use an API, or other object, software, firmware, and/or hardware, to request services from one or more of the other computing environment, capture device, and display device. Any number of users associated with any number of corresponding local computing environments may access the same application via network 250. Thus, by communicating via network 250, any number of users may interact with a plurality of other users via gestures. For example, a gesture performed at a first location may be translated and mapped to a display at a plurality of locations including the first location.
The network 250 may itself comprise other computing entities that provide services to the gesture-based system described herein, and may itself represent multiple interconnected networks. Network 250 may include, for example, an intranet, an internetwork, the Internet, a Personal Area Network (PAN), a Campus Area Network (CAN), a Local Area Network (LAN), a Wide Area Network (WAN), a computer network, a gaming network, etc. Network 250 may also represent technologies that connect various devices in a network, such as fiber optic, Public Switched Telephone Network (PSTN), cellular telephone network, global Telex network, wireless LAN, ethernet, power line communication, and so forth. Computing environments may be connected together by wired or wireless systems, by local networks or widely distributed networks. Any suitable wireless interface may be used for network communications. For example, the wireless link may be according to the following protocol: GSM, CDMA, UMTS, LTE, WIMAX, WIFI, ZIGBEE, or a combination thereof. The network may include a cloud or cloud computing. For example, a cloud infrastructure may include a plurality of services delivered through a data center and built on servers. These services may be accessed anywhere that provides access to the networking infrastructure. The cloud may appear to the user as a single access point, while the infrastructure may not be visible to the client.
As mentioned above, the computing environment may be any suitable device for processing data received by a capture device, such as a dedicated video game console or a more general computing device, such as a cellular telephone or personal computer. For exemplary purposes, computing environment 308 is a server, computing environment 304 is a mobile handheld computing device, and computing environments 302 and 306 represent any type of computing environment.
In these examples, capture devices, such as capture devices 312, 314, and 316, may capture a scene in the physical space in which a user is present. Users, such as users 322, 324, 326, and 328, may be within the capture field of view of capture devices 312, 314, and 316 at locations #1, #3, and #4, respectively. Each capture device may capture data representing the gestures of the user at that location. The capture device at each location may be connected to the local computing environment via a wired connection, a wireless connection, and/or a network connection. For example, in location #1, capture device 312 is shown connected to computing environment 302 via cable 305. The wired connection may include a cable that couples the capture device to the computing environment, such as an S-video cable, coaxial cable, HDMI cable, DVI cable, VGA cable, or the like. The capture device may be adapted to plug directly into the computing environment or may otherwise be incorporated into a computing environment capable of processing the captured data.
The capture device may provide the captured data to the computing environment for processing. The computing environment may employ gesture recognition techniques on the data in which a user may make gestures to control aspects of the gesture-based system, including aspects of the computing environment or application. In an example embodiment, the capture device itself may be a computing environment capable of processing the captured data. For example, any of the capture devices 312, 314, 316 may have capabilities, such as a processor, for processing the captured data and employing gesture recognition techniques. For example, in location #2, capture device 314 is shown incorporated into handheld computing device 304.
Each computing environment 302, 304, 306, 308 and its corresponding capture device is shown at a respective location, namely location #1, location #2, location #3, and location #4. As used herein, location is a broad term that includes any place at which the various portions of the system may be located. For example, locations #1, #2, #3, and #4 may be very close to each other, such as different rooms in a house, or very far from each other, such as in different states, or any combination thereof. A location may also refer to a particular spot within the same general, local location. For example, each of locations #1, #2, #3, and #4 may refer to a particular spot for a component of the system within a general location such as a conference room. Various portions of system 300 may communicate locally or remotely, such as via network 250.
For example, consider a networked system adapted to a work environment. For example, location #1 and location #3 may represent the same conference room, such that the computing environments 302, 306 and the capture devices 312, 314 are in the same room. The local devices may be connected via a local connection, such as via network 250. A remote location may hold the remote server 308, which is maintained there. And computing environment 304 may be located at a residence, where the user works from home and logs into network 250 via a home network connection.
Thus, it is contemplated that location refers to any location in which a device for capturing or processing gesture data may be located. In another example, consider a gaming network in which a remote user connects to a gaming service hosted at server 308. Remote users at each of locations #1, #3, and #4 may be connected via network 250 and may play the same game with each other. In another example, the locations may be local, where local users may work on respective computing environments in the same room and interact with each other through the local area network 250.
In location #1, capture device 312 is shown connected to computing environment 302 via cable 305, but may also communicate with the local computing environment via connection 251 to network 250. For example, the capture device 312 and the computing environment 302 at the same location #1 may be part of the home wireless network 250. The capture device 312 may capture data representing the gestures of the user 322 and provide the data to the computing environment 302 over the home wireless network 250. Thus, the capture device 312 and the computing environment 302 may be in different rooms, such as in a more general location, location # 1. For example, the computing environment 302 may be a central computing environment 302 in the home wireless network 250 and may be located in one room, such as an office. The capture device 312 and display 303 may be located in another room, such as a media or game room in a home. The capture device 312 may be networked with the computing environment 302 to enable it to capture data of the user 322, provide the data to the computing environment 302 for processing, and receive output from the computing environment to the display 303.
Similarly, the computing environment or the capture device may output to a display that is local or remote to either or both of them. For example, in location #1, computing environment 302 is shown with a display component 303. However, it is contemplated that, similar to the capture device 312, the display 303 may be connected to the computing environment 302 via a wired connection, a wireless connection, and/or a network connection. Thus, the display component 303 may, for example, be part of the home wireless network 250 and receive output for display from the computing environment 302 via a cable or via the network 250.
Components of the networked system may share information locally within a location or remotely across locations. In an example embodiment, the local computing environment 302 receives data representing the user 322 from the capture device 312. The computing environment 302 may output to a local display, such as its own display component 303 or another display device otherwise connected to the computing environment 302. The computing environment 302 may alternatively, or additionally, provide the data to a remote computing environment or a remote display component for display. For example, the computing environment 302 may communicate with the computing environment 306 over the network 250. The computing environment 306 may receive the data from the computing environment 302 and map gestures of the user 322 to a display component 307 that is local to the computing environment 306.
In another example embodiment, the capture device 312 may provide data over the network 250 for analysis or display by a remote computing environment, such as the computing environment 304, 306, or 308. Thus, a computing environment that is remote to the user may process data captured by the capture device 312 that is local to the user 322, but display a visual representation of the user at the remote computing environment.
There are a variety of systems, components, and network configurations that support distributed computing environments. For example, computing systems may be connected together by wired or wireless systems, local networks, or widely distributed networks. Currently, many networks are coupled to the internet, which thereby provides an infrastructure for widely distributed computing and encompasses multiple different networks. Any such infrastructure, whether coupled to the internet or not, may be used for the provided systems and methods.
The network infrastructure may host a variety of network topologies, such as client/server, peer-to-peer, or hybrid architectures. A "client" is a member of a class or group that uses the services of another class or group to which it is not related. In computing, a client is a process, that is, roughly a set of instructions or tasks, that requests a service provided by another program. The client process uses the requested service without needing to know the working details of the other program or of the service itself. In a client/server architecture, particularly a networked system, a client is typically a computer that accesses shared network resources provided by another computer (e.g., a server). Any of the entities 302, 304, 306, 308, 312, 314, and 316 may be considered a client, a server, or both, depending on the circumstances. Further, an entertainment console, for example, may be a client to a server.
A server is typically, but not necessarily, a remote computer system accessible over a remote or local network, such as the Internet. A client process may be active in a first computer system and a server process may be active in a second computer system, communicating with each other over a communications medium, thereby providing distributed functionality and allowing multiple clients to take advantage of the information-gathering capabilities of the server. Any software object may be distributed across multiple computing devices or objects.
The client and server communicate with each other using the functionality provided by the protocol layers. For example, the hypertext transfer protocol (HTTP) is a common protocol used in conjunction with the World Wide Web (WWW), the "web". Typically, a computer network address such as an Internet Protocol (IP) address or other reference such as a Uniform Resource Locator (URL) may be used to identify the server or client computers to each other. The network address may be referred to as a URL address. Communication may be provided over a communication medium, for example a client and server may be coupled to each other for high volume communication over a TCP/IP connection.
As used herein, a reference to the system may refer to any single portion of the system 200 shown in FIGS. 2A-3, any combination thereof, or any additional component or computing environment capable of performing similar functions. For example, computing environment 302 may provide the functionality of the computing device 212 shown with reference to FIG. 1 or of the computer described below with reference to FIG. 8. It is contemplated that any of the computing environments described herein, such as 210, 212, 213, 214 from FIGS. 1-2C or the computing environments 302, 304, 306, 308 from FIG. 3, may be configured as a target recognition, analysis, and tracking system, such as the target recognition, analysis, and tracking system 10 described with reference to FIG. 1, and that any of the computing environments may employ techniques for gesture recognition, scaling, or translation. As shown in FIGS. 1, 2A-2E, and 3, the computing environments 210, 212, 213, 214 may include a display device or may otherwise be connected to a display device. A computing environment may include its own camera component, be connected to a separate capture device, or be connected to a device having a camera component, such as capture device 202. For example, the computing environment 212 may be coupled to the capture device 202, which may capture data from the physical space, or may otherwise receive gesture information of the user 204 from the capture device 202.
In view of the wide variety of computing environments that may be built in accordance with the general architecture provided herein, and the further changes that may occur in computing in a network environment such as that of FIG. 3, the systems and methods provided herein should not be construed as limited in any way to a particular computing architecture or operating system. Rather, the presently disclosed subject matter should not be limited to any single embodiment, but rather should be construed in breadth and scope in accordance with the appended claims.
Finally, it should be noted that the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination of both. Thus, the methods, computer-readable media, and systems of the presently disclosed subject matter, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the subject matter.
In the case of program code execution on programmable computers, the computing environment will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs that may utilize the creation and/or implementation of the domain-specific programming model aspects of the present invention, e.g., through the use of a data processing API or the like, are preferably implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language, and combined with hardware implementations.
FIG. 4 illustrates an exemplary embodiment of a capture device 202 that may be used for target recognition, analysis, and tracking, where the target may be a user or an object. According to an example embodiment, the capture device 202 may be configured to capture video with depth information including a depth image, which may include depth values, via any suitable technique, including for example, time-of-flight, structured light, stereo image, or the like. According to one embodiment, the capture device 202 may organize the calculated depth information into "Z layers," or layers perpendicular to a Z axis extending from the depth camera along its line of sight.
As shown in FIG. 4, the capture device 202 may include an image camera component 22. According to one exemplary embodiment, the image camera component 22 may be a depth camera that may capture a depth image of a scene. The depth image may include a two-dimensional (2-D) pixel area of the captured scene where each pixel in the 2-D pixel area may represent a depth value, such as, for example, a length or distance in centimeters, millimeters, or the like of an object in the captured scene from the camera.
As shown in FIG. 4, according to an exemplary embodiment, the image camera component 22 may include an IR light component 24, a three-dimensional (3-D) camera 26, and an RGB camera 28 that may be used to capture a depth image of a scene. For example, in time-of-flight analysis, the IR light component 24 of the capture device 202 may emit infrared light onto the scene and may then use sensors (not shown) to detect the light backscattered from the surface of one or more targets and objects in the scene with, for example, the 3-D camera 26 and/or the RGB camera 28. In some embodiments, pulsed infrared light may be used such that the time between an outgoing light pulse and a corresponding incoming light pulse may be measured and used to determine a physical distance from the capture device 202 to a particular location on a target or object in the scene. Additionally, in other exemplary embodiments, the phase of the outgoing light wave may be compared to the phase of the incoming light wave to determine a phase shift. The phase shift may then be used to determine a physical distance from the capture device 202 to a particular location on the targets or objects.
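For reference, the two time-of-flight measurements mentioned above reduce to simple relationships between distance and either the pulse round-trip time or the phase shift of a modulated wave. The sketch below is illustrative arithmetic under those textbook relationships, not the device's actual processing:

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def distance_from_round_trip(delta_t_seconds):
    """Pulsed time of flight: light travels out and back, so halve the path."""
    return SPEED_OF_LIGHT * delta_t_seconds / 2.0

def distance_from_phase_shift(phase_shift_radians, modulation_hz):
    """Continuous-wave time of flight: the phase shift between the outgoing
    and incoming wave encodes distance, modulo half a modulation wavelength."""
    wavelength = SPEED_OF_LIGHT / modulation_hz
    return (phase_shift_radians / (2.0 * math.pi)) * wavelength / 2.0

# A 20 ns round trip corresponds to roughly 3 m.
d_pulse = distance_from_round_trip(20e-9)
# A quarter-cycle phase shift at 30 MHz modulation corresponds to about 1.25 m.
d_phase = distance_from_phase_shift(math.pi / 2, 30e6)
```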
According to another exemplary embodiment, time-of-flight analysis may be used to indirectly determine a physical distance from the capture device 202 to a particular location on the targets or objects by analyzing the intensity of the reflected beam of light over time via various techniques including, for example, shuttered light pulse imaging.
In another exemplary embodiment, the capture device 202 may use structured light to capture depth information. In such an analysis, patterned light (i.e., light displayed as a known pattern such as a grid pattern or a stripe pattern) may be projected onto the scene via, for example, the IR light component 24. Upon striking the surface of one or more targets or objects in the scene, the pattern may become deformed in response. Such a deformation of the pattern may be captured by, for example, the 3-D camera 26 and/or the RGB camera 28 and may then be analyzed to determine a physical distance from the capture device 202 to a particular location on the targets or objects.
According to another embodiment, the capture device 202 may include two or more physically separated cameras that may view a scene from different angles to obtain visual stereo data that may be resolved to generate depth information. In another example embodiment, the capture device 202 may use point cloud data and target digitization techniques to detect features of the user.
The capture device 202 may also include a microphone 30 or microphone array. Microphone 30 may include a transducer or sensor that may receive sound and convert it into an electrical signal. According to one embodiment, the microphone 30 may be used to reduce feedback between the capture device 202 and the computing environment 212 in the target recognition, analysis, and tracking system 10. Additionally, the microphone 30 may be used to receive audio signals that may also be provided by the user to control applications such as gaming applications, non-gaming applications, etc., that may be executed by the computing environment 212.
In an exemplary embodiment, the capture device 202 may also include a processor 32 that may be in operable communication with the image camera component 22. Processor 32 may include a standard processor, a special purpose processor, a microprocessor, etc. that may execute instructions, which may include instructions for receiving a depth image, determining whether a suitable target may be included in a depth image, converting a suitable target into a skeletal representation or model of the target, or any other suitable instructions. For example, a computer-readable medium may include computer-executable instructions for receiving scene data, wherein the data includes data representing a target in a physical space. The instructions include instructions for gesture profile personalization and gesture profile roaming as described herein.
The capture device 202 may also include a memory component 34, where the memory component 34 may store instructions executable by the processor 32, images or frames of images captured by the 3-D camera 26 or the RGB camera 28, or any other suitable information, images, or the like. According to an example embodiment, the memory component 34 may include Random Access Memory (RAM), Read Only Memory (ROM), cache, flash memory, a hard disk, or any other suitable storage component. As shown in FIG. 4, in one embodiment, the memory component 34 may be a separate component in communication with the image capture component 22 and the processor 32. According to another embodiment, the memory component 34 may be integrated into the processor 32 and/or the image capture component 22.
As shown in FIG. 4, the capture device 202 may be in communication with the computing environment 212 via a communication link 36. The communication link 36 may be a wired connection including, for example, a USB connection, a firewire connection, an ethernet cable connection, etc., and/or a wireless connection such as a wireless 802.11b, 802.11g, 802.11a, or 802.11n connection, etc. According to one embodiment, the computing environment 212 may provide a clock to the capture device 202 via the communication link 36 that may be used to determine when to capture, for example, a scene.
Additionally, the capture device 202 may provide the depth information and images captured by, for example, the 3-D camera 26 and/or the RGB camera 28, as well as a skeletal model that may be generated by the capture device 202, to the computing environment 212 via the communication link 36. The computing environment 212 may then use the skeletal model, depth information, and captured images to, for example, control an application such as a game or word processor. For example, as shown in FIG. 4, computing environment 212 may include gesture library 192.
As shown in FIG. 4, the computing environment 212 may include a gesture library 192 and a gesture recognition engine 190. The gesture recognition engine 190 may include a set of gesture filters 191. A filter may include code and associated data that can recognize gestures or otherwise process depth, RGB, or skeletal data. Each filter 191 may include information defining a gesture along with parameters, or metadata, for that gesture. For example, a throw, which comprises the motion of one hand passing from behind the body to in front of the body, may be implemented as a gesture filter 191 comprising information representing the movement of one of the user's hands from behind the body to in front of the body, as that movement would be captured by the depth camera. Parameters may then be set for the gesture. Where the gesture is a throw, the parameters may be a threshold velocity that the hand must reach, a distance the hand must travel (either absolute, or relative to the size of the user as a whole), and a confidence rating by the recognizer engine that the gesture occurred. These parameters for the gesture may vary between applications, between contexts of a single application, or within one context of one application over time.
Although it is contemplated that the gesture recognition engine 190 may include a collection of gesture filters, where each filter may comprise code or otherwise represent a component for processing depth, RGB, or skeletal data, the use of filters is not intended to limit the analysis to filters. A filter is one example representation of a component or section of code that analyzes data of a scene received by the system and compares that data to base information representing a gesture. As a result of the analysis, the system may produce an output corresponding to whether the input data corresponds to the gesture. The base information representing the gesture may be adjusted to correspond to recurring features in the history of data representing the user's captured motion. The base information, for example, may be part of a gesture filter as described above. Any suitable manner of analyzing the input data against the gesture data is contemplated.
In an example embodiment, a gesture may be recognized as a trigger for entering a modification mode in which a user may modify gesture parameters in the user's gesture profile. For example, a gesture filter 191 may include information for recognizing a modification trigger gesture. If the modification trigger gesture is recognized, the application may enter a modification mode. The modification trigger gesture may vary between applications, between systems, between users, and so forth. For example, the modification trigger gesture in a tennis gaming application may not be the same modification trigger gesture in a bowling game application.
The data captured by the cameras 26, 28 and the capture device 202 in the form of a skeletal model, and the movements associated with it, may be compared to the gesture filters 191 in the gesture library 192 to identify when the user (as represented by the skeletal model) has performed one or more gestures. Thus, inputs to a filter such as filter 191 may comprise things such as joint data about a user's joint position, angles formed by the bones that meet at the joint, RGB color data from the scene, and the rate of change of an aspect of the user. As mentioned, parameters may be set for the gesture. Outputs from the filter 191 may comprise things such as the confidence that a given gesture is being made, the speed at which the gesture motion is made, and the time at which the gesture occurred.
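As a sketch of how a throw filter's parameters (threshold speed, travel distance, required confidence) and outputs (detection, confidence, speed, timing) might fit together; the class and field names are assumptions and the confidence estimate is deliberately crude:

```python
from dataclasses import dataclass

@dataclass
class ThrowFilter:
    """Illustrative 'throw' gesture filter with tunable parameters."""
    min_hand_speed: float = 2.5        # meters per second
    min_travel_distance: float = 0.4   # meters (absolute)
    min_confidence: float = 0.95       # confidence required to report the gesture

    def evaluate(self, hand_positions, timestamps):
        """hand_positions: list of (x, y, z) hand samples; timestamps in seconds."""
        if len(hand_positions) < 2 or timestamps[-1] <= timestamps[0]:
            return {"detected": False, "confidence": 0.0}
        deltas = [b - a for a, b in zip(hand_positions[0], hand_positions[-1])]
        distance = sum(d * d for d in deltas) ** 0.5
        duration = timestamps[-1] - timestamps[0]
        speed = distance / duration
        # Crude confidence: how far the motion exceeds the two thresholds.
        confidence = min(1.0, 0.5 * (speed / self.min_hand_speed)
                              + 0.5 * (distance / self.min_travel_distance))
        detected = (speed >= self.min_hand_speed
                    and distance >= self.min_travel_distance
                    and confidence >= self.min_confidence)
        return {"detected": detected, "confidence": confidence,
                "speed": speed, "time": timestamps[-1]}
```

An application could tune min_hand_speed or min_travel_distance per context, mirroring how the parameters described above may vary between applications and contexts.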
The computing environment 212 may include a processor 195 that can process the depth image to determine what targets are in a scene, such as a user 18 or an object in the room. This can be done, for instance, by grouping together pixels of the depth image that share a similar distance value. The image may also be parsed to produce a skeletal representation of the user, in which features such as joints and the tissue that runs between the joints are identified. There exist skeletal mapping techniques that capture a person with a depth camera and from that determine various spots on the user's skeleton: joints of the hands, wrists, elbows, knees, nose, ankles, shoulders, and where the pelvis meets the spine. Other techniques include transforming the image into a body model representation of the person and transforming the image into a mesh model representation of the person.
In an embodiment, the processing is performed on the capture device 202 itself, and the raw image data of depth and color values (where the capture device 202 comprises a 3-D camera 26) is transmitted to the computing environment 212 via the link 36. In another embodiment, the processing is performed by the processor 32 coupled to the image camera component, and the parsed image data is then sent to the computing environment 212. In yet another embodiment, both the raw image data and the parsed image data are sent to the computing environment 212. The computing environment 212 may receive the parsed image data, yet it may still receive the raw data for executing the current process or application. For example, if an image of the scene is transmitted across a computer network to another user, the computing environment 212 may transmit the raw data for processing by another computing environment.
Computing environment 212 may use gesture library 192 and gesture profile 205, such as shown in FIG. 4, to interpret movements of the skeletal model and control the application based on the movements. The computing environment 212 may model and display a representation of the user, for example, in the form of an avatar or pointer on a display such as the display device 193. The display device 193 may include a computer monitor, a television screen, or any suitable display device. For example, a camera-controlled computer system may capture user image data and display user feedback mapped to the user's gestures on a television screen. The user feedback may be displayed as an avatar on the screen as shown in fig. 1. The motion of the avatar may be controlled directly by mapping the avatar's movements to the user's movements. The user's gestures may be interpreted to control certain aspects of the application.
According to an example embodiment, a target may be a human target in any pose, such as standing or sitting, a human target with an object, two or more human targets, one or more appendages of one or more human targets, etc., which may be scanned, tracked, modeled, and/or evaluated to: generate a virtual screen, compare the user to one or more stored profiles, and/or store gesture profiles 205 associated with the user in a computing environment, such as computing environment 212. Gesture profile 205 may be user, application, or system specific. Gesture profile 205 may be accessed, for example, via an application, or may be available system-wide. Gesture profile 205 may include a lookup table for loading particular user profile information. The virtual screen may interact with an application that may be executed by the computing environment 212 described above with reference to FIG. 1.
Gesture profile 205 may include user identification data such as a scanned or estimated body size of the target, a skeletal model, a body model, a voice sample or password, a target gender, a target age, previous gestures, target limitations, and a target's standard use of the system such as, for example, a tendency to sit, left-handed or right-handed, or a tendency to stand very close to the capture device. This information may be used to determine whether there is a match between a target in the captured scene and one or more users. If there is a match, the user's gesture profile 205 may be loaded and, in one embodiment, may allow the system to adapt gesture recognition techniques to the user or to adapt other elements of the computing or gaming experience according to the gesture profile 205.
One or more gesture profiles 205 may be stored in the computing environment 212 and used in a number of user sessions, or one or more profiles may be created for a single session only. Users may have the option of establishing a profile in which they may provide information to the system, such as a voice or body scan, age, personal preferences, right- or left-handedness, an avatar, a name, and the like. Gesture profiles may also be generated or provided for "guests" who do not provide any information to the system beyond stepping into the capture space. A temporary gesture profile may be established for one or more guests. At the end of a guest session, the guest gesture profile may be stored or deleted.
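A minimal sketch of loading a stored gesture profile for a recognized user, creating a temporary guest profile otherwise, and optionally deleting the guest profile at the end of the session; the dictionary layout and function names are assumptions made for illustration:

```python
def load_or_create_profile(profiles, user_key, body_scan=None):
    """Load the stored gesture profile for a recognized user, or create a
    temporary guest profile for someone who merely entered the capture space."""
    if user_key in profiles:
        return profiles[user_key]
    profile = {
        "user_key": user_key,
        "body_scan": body_scan,      # scanned or estimated body size, if provided
        "handedness": None,          # left- or right-handed, if known
        "tends_to_sit": None,
        "previous_gestures": [],
        "guest": body_scan is None,  # guests provide no information up front
    }
    profiles[user_key] = profile
    return profile

def end_guest_session(profiles, user_key, keep=False):
    """At the end of a guest session the guest profile may be stored or deleted."""
    profile = profiles.get(user_key)
    if profile and profile["guest"] and not keep:
        del profiles[user_key]

# Example usage.
profiles = {}
guest = load_or_create_profile(profiles, "guest_01")
end_guest_session(profiles, "guest_01")  # the temporary profile is deleted
```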
The gesture library 192, gesture recognition engine 190, and gesture profile 205 may be implemented in hardware, software, or a combination of both. For example, gesture library 192 and gesture recognition engine 190 may be implemented as software executing on a processor, such as processor 195, of computing environment 212 shown in FIG. 4, or executing on processing unit 101 of FIG. 7 or processing unit 259 of FIG. 8.
It is emphasized that the block diagrams depicted in fig. 4 and fig. 7 and 8 described below are exemplary and are not intended to imply a particular implementation. Thus, processor 195 or 32 of FIG. 4, processing unit 101 of FIG. 7, and processing unit 259 of FIG. 8 may be implemented as a single processor or multiple processors. The multiple processors may be distributed or centrally located. For example, the gesture library 192 may be implemented as software executing on the processor 32 of the capture device, or it may be implemented as software executing on the processor 195 in the computing environment 212. Any combination of processors suitable for performing the techniques disclosed herein is contemplated. The multiple processors may communicate wirelessly, via hardwiring, or a combination thereof.
The gesture library and filter parameters may be tuned for an application, or for a context of an application, by a gesture tool. A context may be a cultural context, and it may be an environmental context. A cultural context refers to the culture of a user using a system. Different cultures may use similar gestures to impart markedly different meanings. For example, an American user who wishes to tell another user to "look" or "use his eyes" may put his index finger on his head close to the outside of his eye. However, to an Italian user, this gesture may be interpreted as a reference to the Mafia.
Similarly, there may be different contexts among different environments of a single application. Take as an example a first-person shooter game that involves operating a motorcycle. While the user is on foot, making a fist with the fingers toward the ground and extending the fist in front of and away from the body may represent a punch gesture. While the user is in the driving context, that same motion may represent a "gear shift" gesture.
Gestures may be grouped together into style packages of complementary gestures that are likely to be used by an application in that style. Complementary gestures (complementary either in the sense that they are commonly used together, or in the sense that a change in a parameter of one will change a parameter of another) may be grouped together into style packages. These packages may be provided to an application, which may select at least one of them. The application may tune or modify a parameter of a gesture or gesture filter 191 to best fit the unique aspects of the application. When that parameter is tuned, a second, complementary parameter (in the interdependent sense) of the gesture or of a second gesture is also tuned such that the parameters remain complementary. Style packages for video games may include styles such as first-person shooter, action, driving, and sports.
FIG. 5A depicts an example skeletal mapping of a user that may be generated from the capture device 202. In this embodiment, the individual joints and bones are identified: each hand 502, each forearm 504, each elbow 506, each bicep 508, each shoulder 510, each hip 512, each thigh 514, each knee 516, each calf 518, each foot 520, the head 522, the torso 524, the top 526 and bottom 528 of the spine, and the waist 530. Where more points are tracked, additional features may be identified, such as the bones and joints of the fingers or toes, or individual features of the face, such as the nose and eyes.
The user may create gestures by moving his body. A gesture comprises a motion or pose of the user that may be captured as image data and parsed for meaning. A gesture may be dynamic, comprising a motion, such as mimicking a pitch. A gesture may be a static pose, such as holding one's crossed forearms 504 in front of one's torso 524. A gesture may be a single movement (e.g., a jump) or a continuing gesture (e.g., driving), and may be short or long in duration. A gesture may also incorporate props, such as swinging a mock sword. A gesture may comprise more than one body part, such as clapping the hands 502 together, or a subtler motion, such as pursing one's lips.
The user's gestures may be used as input in a general computing context. For example, various movements of the hand 502 or other body parts may correspond to common system-level tasks, such as navigating up or down in a hierarchical list, opening a file, closing a file, and saving a file. For example, the user can hold his hand stationary with the fingers pointing up and the palm facing the capture device 202. He may then gather the fingers towards the palm to form a fist, and this may be a gesture indicating that the focus window in the window-based user interface computing environment should be closed. Gestures may also be used in a video game specific context depending on the game. For example, for a driving game, various movements of the hand 502 and foot 520 may correspond to maneuvering the vehicle in a direction, shifting gears, accelerating, and braking. As such, gestures may indicate a wide variety of motions mapped to displayed user representations in a wide variety of applications such as video games, text editors, word processing, data management, and so forth.
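The mapping from recognized gestures to system-level tasks described above might be organized as a simple lookup table dispatching to command handlers; the gesture and command names below are purely illustrative assumptions:

```python
# Hypothetical mapping from recognized gestures to system-level commands.
GESTURE_COMMANDS = {
    "open_hand_to_fist": "close_focus_window",
    "swipe_up": "navigate_list_up",
    "swipe_down": "navigate_list_down",
    "pinch_out": "open_file",
    "pinch_in": "save_file",
}

def dispatch(gesture_name, command_handlers):
    """Route a recognized gesture to the handler registered for its command."""
    command = GESTURE_COMMANDS.get(gesture_name)
    if command is None:
        return None  # unmapped gesture; ignore it
    handler = command_handlers.get(command)
    return handler() if handler else None

# Example: closing the focused window when the hand closes into a fist.
handlers = {"close_focus_window": lambda: "window closed"}
result = dispatch("open_hand_to_fist", handlers)  # "window closed"
```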
The user may generate a gesture that corresponds to walking or running by walking or running in place. For example, the user may alternately raise and drop each leg 512-520 to mimic walking without moving. The system may parse this gesture by analyzing each hip 512 and each thigh 514. A step may be recognized when one hip-thigh angle (as measured relative to a vertical line, where a standing leg has a hip-thigh angle of 0° and a forward, horizontally extended leg has a hip-thigh angle of 90°) exceeds a certain threshold relative to the other thigh. A walk or run may be recognized after some number of consecutive steps by alternating legs. The time between the two most recent steps may be thought of as a period. After some number of periods in which the threshold angle is not met, the system may determine that the walking or running gesture has stopped.
Given a "walk or run" gesture, an application may set values for parameters associated with the gesture. These parameters may include the threshold angle described above, the number of steps required to initiate a walking or running gesture, the number of cycles of the ending gesture where no steps occur, and a threshold period to determine whether the gesture is walking or running. A fast period may correspond to running, as the user will move his legs quickly, while a slower period may correspond to walking.
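A simplified sketch of such a step detector and its application-settable parameters might look as follows; the default values shown (30°, two steps, three quiet cycles, 0.5 s) are illustrative assumptions rather than values specified by this description.

```python
import math

# Hypothetical sketch of the "walk or run" detector described above.
# Application-settable parameters (defaults are illustrative only).
THRESHOLD_ANGLE_DEG = 30.0   # hip-thigh angle (0 deg standing, 90 deg horizontal) counted as a step
STEPS_TO_START = 2           # alternating steps needed before walk/run is recognized
QUIET_CYCLES_TO_STOP = 3     # periods with no step before the gesture ends
RUN_PERIOD_SEC = 0.5         # step period faster than this is "run", slower is "walk"

def hip_thigh_angle(hip, knee):
    """Angle of the thigh from vertical, given hip and knee (x, y) positions (y grows upward)."""
    dx, dy = knee[0] - hip[0], hip[1] - knee[1]
    return math.degrees(math.atan2(abs(dx), dy))

def classify_steps(step_times):
    """Label the gait from the times (seconds) at which alternating steps occurred."""
    if len(step_times) < STEPS_TO_START:
        return "none"
    period = step_times[-1] - step_times[-2]      # time between two most recent steps
    return "run" if period < RUN_PERIOD_SEC else "walk"

print(classify_steps([0.0, 0.4, 0.8]))  # run
print(classify_steps([0.0, 0.9, 1.8]))  # walk
```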
The gesture may initially be associated with a set of default parameters that an application may override with its own parameters. In this scenario, the application is not forced to provide parameters, but may instead use a set of default parameters that allow the gesture to be recognized in the absence of application-defined parameters. Information related to the gesture may be stored for purposes of pre-recorded gesture animation.
There are a variety of outputs that may be associated with a gesture. There may be a baseline "yes or no" as to whether a gesture is occurring. There may also be a confidence level, which corresponds to the likelihood that the user's tracked movement corresponds to the gesture. This could be a linear scale that ranges over floating point numbers between 0 and 1, inclusive. Where an application receiving this gesture information cannot accept false positives as input, it may use only those recognized gestures that have a high confidence level, such as at least 0.95. Where an application must recognize every instance of the gesture, even at the cost of false positives, it may use gestures that have at least a much lower confidence level, such as those merely greater than 0.2. The gesture may have an output for the time between the two most recent steps, and where only a first step has been registered, this may be set to a reserved value, such as -1 (since the time between any two steps must be positive). The gesture may also have an output for the highest thigh angle reached during the most recent step.
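These outputs could be bundled as in the following sketch; the field and constant names are hypothetical, while the reserved value -1 and the confidence thresholds mirror the examples above.

```python
# Hypothetical sketch of the outputs a gesture filter might report,
# with the reserved value -1 for "only one step seen so far".
from dataclasses import dataclass

STEP_TIME_UNKNOWN = -1.0  # reserved: step period is undefined until a second step occurs

@dataclass
class WalkRunOutput:
    occurring: bool          # baseline yes/no
    confidence: float        # likelihood in [0, 1] that the tracked motion is this gesture
    step_period: float       # seconds between the two most recent steps, or -1
    max_thigh_angle: float   # highest hip-thigh angle reached during the last step

def accept(output: WalkRunOutput, min_confidence: float) -> bool:
    """An application intolerant of false positives might demand e.g. 0.95;
    one that must catch every instance might accept anything above e.g. 0.2."""
    return output.occurring and output.confidence >= min_confidence

sample = WalkRunOutput(True, 0.97, STEP_TIME_UNKNOWN, 42.0)
print(accept(sample, 0.95))  # True
```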
Another exemplary gesture is a "heel lift jump." In this gesture, the user may lift his heels off the ground while keeping his toes planted. Alternatively, the user may jump into the air, where his feet 520 leave the ground entirely. The system may parse the skeleton for this gesture by analyzing the angular relationship of the shoulders 510, hips 512, and knees 516 to see whether they are in a position of alignment equal to standing up straight. Then these points and the upper 526 and lower 528 spine points may be monitored for any upward acceleration. A sufficient combination of acceleration may trigger a jump gesture. A sufficient combination of acceleration with a particular gesture may satisfy the parameters of a transition point.
Given this "heel lift jump" gesture, an application may set values for parameters associated with the gesture. The parameters may include the acceleration threshold described above, which determines how quickly some combination of the user's shoulders 510, hips 512, and knees 516 must move upward to trigger the gesture, as well as a maximum angle of alignment between the shoulders 510, hips 512, and knees 516 at which a jump may still be triggered. The outputs may comprise a confidence level, as well as the user's body angle at the time of the jump.
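A minimal sketch of such a jump check is given below; the alignment and acceleration thresholds are illustrative placeholders for the application-set parameters described above.

```python
# Hypothetical sketch of the "heel lift jump" check: shoulders, hips and knees
# must be roughly vertically aligned, and must all accelerate upward quickly enough.
MAX_ALIGNMENT_ANGLE_DEG = 10.0   # how far from upright the shoulder-hip-knee line may be
MIN_UPWARD_ACCEL = 2.0           # m/s^2, illustrative acceleration threshold

def is_heel_lift_jump(alignment_angle_deg, shoulder_acc, hip_acc, knee_acc):
    aligned = alignment_angle_deg <= MAX_ALIGNMENT_ANGLE_DEG
    accelerating_up = min(shoulder_acc, hip_acc, knee_acc) >= MIN_UPWARD_ACCEL
    return aligned and accelerating_up

print(is_heel_lift_jump(5.0, 3.1, 2.8, 2.5))   # True: upright and pushing upward
print(is_heel_lift_jump(25.0, 3.1, 2.8, 2.5))  # False: body not aligned upright
```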
Setting parameters for gestures based on the details of the application that will receive the gesture is important to accurately identify the gesture. Correctly identifying gestures and the user's intent greatly helps to create a positive user experience.
The application may set values for parameters associated with various transition points to identify points at which to use the pre-recorded animation. The transition point may be defined by various parameters, such as the identity of a particular gesture, the velocity, the angle of a target or object, or any combination thereof. If the transition point is defined, at least in part, by the identification of a particular gesture, then correctly identifying the gesture helps to increase the level of confidence that the parameters of the transition point have been satisfied.
Another parameter for a gesture may be a distance moved. Where a user's gestures control the actions of an avatar in a virtual environment, that avatar may be an arm's length from a ball. If the user wishes to interact with the ball and grab it, this may require the user to extend his arm 502-510 to full length while making a grab gesture. In this situation, a similar grab gesture in which the user only partially extends his arm 502-510 may not achieve the result of interacting with the ball. Likewise, a parameter of a transition point could be the identification of the grab gesture, where, if the user only partially extends his arm 502-510 and thereby does not achieve the result of interacting with the ball, the user's gesture also will not satisfy the parameter of the transition point.
A gesture or a portion thereof may have as a parameter a volume of space in which it must occur. Where the gesture comprises body movement, this volume of space may typically be expressed relative to the body. For instance, an American football throwing gesture for a right-handed user may be recognized only in the volume of space no lower than the right shoulder 510a and on the same side of the head 522 as the throwing arm 502a-510a. It may not be necessary to define all bounds of the volume, as with this throwing gesture, where the bound outward from the body is left undefined and the volume extends out indefinitely, or to the edge of the scene being monitored.
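The body-relative volume test for the throwing example could be sketched roughly as follows; the coordinate conventions and positions are assumptions made for illustration.

```python
# Hypothetical sketch: a throw gesture for a right-handed user is only considered
# if the throwing hand stays in a body-relative volume, i.e. no lower than the right
# shoulder and on the throwing-arm side of the head. The outward boundary is left open.
def in_throw_volume(hand, right_shoulder, head):
    """hand, right_shoulder, head are (x, y, z); x grows toward the user's right, y upward."""
    high_enough = hand[1] >= right_shoulder[1]   # not below the right shoulder
    correct_side = hand[0] >= head[0]            # same side of the head as the throwing arm
    return high_enough and correct_side          # no outward (z) limit: volume extends indefinitely

print(in_throw_volume((0.5, 1.6, 0.3), (0.2, 1.5, 0.0), (0.0, 1.7, 0.0)))  # True
print(in_throw_volume((0.5, 1.2, 0.3), (0.2, 1.5, 0.0), (0.0, 1.7, 0.0)))  # False: below shoulder
```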
FIG. 5B provides further details of one exemplary embodiment of the gesture recognizer engine 190 of FIG. 4. As shown, the gesture recognizer engine 190 may comprise at least one filter 519 to determine a gesture or gestures. A filter 519 comprises information defining a gesture 526 (hereinafter referred to as a "gesture"), and may comprise at least one parameter 528, or metadata, for that gesture 526. For instance, a motion comprising one hand passing from behind the rear of the body to in front of the body may be implemented as a gesture 526 comprising information representing the movement of one of the user's hands from behind the rear of the body to in front of the body, as that movement would be captured by the depth camera. Parameters 528 may then be set for that gesture 526. Where the gesture 526 is a throw, a parameter 528 may be a threshold velocity that the hand has to reach, a distance the hand must travel (either absolute, or relative to the size of the user as a whole), and a confidence rating by the recognizer engine 190 that the gesture 526 occurred. These parameters 528 for the gesture 526 may vary between applications, between contexts of a single application, or within one context of one application over time.
Filters may be modular or interchangeable. In one embodiment, a filter has a number of inputs, each of those inputs having a type, and a number of outputs, each of those outputs having a type. In this situation, a first filter may be replaced with a second filter that has the same number and types of inputs and outputs as the first filter without altering any other aspect of the recognizer engine 190 architecture. For instance, there may be a first filter for driving that takes skeletal data as input and outputs a confidence that the gesture 526 associated with the filter is occurring, along with a steering angle. Where one wishes to substitute this first driving filter with a second driving filter, perhaps because the second driving filter is more efficient and requires fewer processing resources, one may do so by simply replacing the first filter with the second filter, so long as the second filter has the same inputs and outputs: one input of skeletal data type, and two outputs of confidence type and angle type.
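The interchangeability described here amounts to filters sharing a common signature, as in the following sketch; the filter names and steering heuristics are invented for illustration and are not the filters of this description.

```python
# Hypothetical sketch of modular, interchangeable filters: any filter with the same
# input and output signature can be swapped in without touching the engine.
from abc import ABC, abstractmethod

class DrivingFilter(ABC):
    """Input: skeletal data. Outputs: (confidence, steering angle)."""
    @abstractmethod
    def evaluate(self, skeleton: dict) -> tuple[float, float]: ...

class BasicDrivingFilter(DrivingFilter):
    def evaluate(self, skeleton):
        # naive: steering angle from the height difference between the hands
        angle = (skeleton["left_hand_y"] - skeleton["right_hand_y"]) * 90.0
        return 0.8, angle

class EfficientDrivingFilter(DrivingFilter):
    def evaluate(self, skeleton):
        # cheaper approximation with the same inputs and outputs, so it can replace the above
        return 0.75, skeleton.get("wheel_tilt", 0.0) * 90.0

def steer(filter_impl: DrivingFilter, skeleton: dict) -> float:
    confidence, angle = filter_impl.evaluate(skeleton)
    return angle if confidence > 0.5 else 0.0

frame = {"left_hand_y": 1.5, "right_hand_y": 1.0}
print(steer(BasicDrivingFilter(), frame))       # 45.0 degrees
print(steer(EfficientDrivingFilter(), frame))   # 0.0 (no wheel_tilt key in this frame)
```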
A filter need not have parameters 528. For instance, a "user height" filter that returns the user's height may not allow for any parameters that can be tuned. An alternate "user height" filter may have tunable parameters, such as whether to account for the user's footwear, hairstyle, headwear, and posture in determining the user's height.
The input to the filter may include such things as joint data about the user's joint position, like the angle formed by the bones that meet at the joint, RGB color data from the scene, and the rate of change of some aspect of the user. The output from the filter may include such things as the confidence that a given gesture is being made, the speed at which the gesture motion is made, and the time at which the gesture motion is made.
The gesture recognizer engine 190 may have a base recognizer engine 517 that provides functionality to the gesture filters 519. In one embodiment, the functionality implemented by the base recognizer engine 517 includes an input-over-time archive that tracks recognized gestures and other input; a hidden Markov model implementation (where the modeled system is assumed to be a Markov process with unknown parameters, one in which the present state encapsulates any past state information needed to determine a future state, so no other past state information need be maintained for this purpose, and in which hidden parameters are determined from the observable data); and other functionality required to solve particular instances of gesture recognition.
Base recognizer engine 517 may include gesture profile 520. For example, base recognizer engine 517 may temporarily load gesture profile 520 into the gesture recognition engine for the user, store gesture profile 520 with gesture filter information, or otherwise access gesture profile 520 from a remote location. Gesture profile 520 may provide parameters that are adaptive to the information in filter 519 to correspond to a particular user. For example, as described above, the gesture 526 may be a throw with parameters of a threshold speed or distance that the hand must travel. Gesture profile 520 may redefine the threshold speed or distance that the hand must travel for throwing gesture 526. Thus, base recognizer engine 517 may supplement or replace parameters in filter 519 with parameters from gesture profile 520. The filter 519 may be default gesture information and the gesture profile 520 may be loaded specifically for a particular user.
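One way this profile-over-defaults behavior could work is sketched below; the parameter names and values are hypothetical.

```python
from typing import Optional

# Hypothetical sketch: a per-user gesture profile supplements or replaces
# the default parameters of a gesture filter.
DEFAULT_THROW_FILTER = {"min_hand_speed": 2.5, "min_distance": 0.4}  # illustrative defaults

def effective_parameters(default_filter: dict, gesture_profile: Optional[dict]) -> dict:
    """Profile values win over defaults; missing keys fall back to the defaults."""
    params = dict(default_filter)
    if gesture_profile:
        params.update(gesture_profile)
    return params

# A user who throws with less arm extension gets a lower distance threshold.
user_profile = {"min_distance": 0.25}
print(effective_parameters(DEFAULT_THROW_FILTER, user_profile))
# {'min_hand_speed': 2.5, 'min_distance': 0.25}
```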
The filters 519 are loaded and implemented on top of the base recognizer engine 517 and may utilize the services that the engine 517 provides to all filters 519. In one embodiment, the base recognizer engine 517 processes the received data to determine if it meets the requirements of any of the filters 519. Since these services, such as parsing the input, are provided by the base recognizer engine 517 at one time rather than by each filter 519, this service need only be processed once over a period of time rather than once per filter 519 over the period of time, thereby reducing the processing required to determine gestures.
The application may use the filter 519 provided by the recognizer engine 190 or it may provide its own filter 519 which is inserted into the base recognizer engine 517. Similarly, the gesture profile may be inserted into base recognizer engine 517. In one embodiment, all filters 519 have a common interface that enables this plug-in feature. Moreover, all filters 519 may utilize parameters 528, so a single gesture tool, as described below, may be used to debug and fine tune the entire filter system 519.
These parameters 528 may be tuned for an application, or for a context of an application, by a gesture tool 521. In one embodiment, the gesture tool 521 comprises a plurality of sliders 523, each slider 523 corresponding to a parameter 528, as well as a pictorial representation of a body 524. As a parameter 528 is adjusted with its corresponding slider 523, the body 524 may demonstrate both actions that would be recognized as the gesture with those parameters 528 and actions that would not be recognized as the gesture with those parameters 528, identified as such. This visualization of the parameters 528 of the gesture provides an effective means of debugging and fine-tuning the gesture.
FIG. 6 depicts an example flow diagram of a method of establishing a shared presentation experience for multiple users. For example, a system 200, 300 such as that shown in FIGS. 1-3 may perform the operations shown herein.
At 602, the system may present an information presentation. As described above, a presentation of information may include any productivity scenario in which information is presented, where the presentation may take on a variety of formats. At 604, the system captures data from a physical space that includes a target, such as a user or a non-human object. As described above, a capture device may capture data of a scene, such as a depth image of the scene, and scan for targets in the scene. The capture device may determine whether one or more targets in the scene correspond to a human target, such as a user. For example, to determine whether a target or object in a scene corresponds to a human target, each target may be flood filled and compared to the pattern of the human model. Each target or object that matches the human body model may then be scanned to generate a skeletal model associated therewith. For example, a target identified as a human may be scanned to generate a skeletal model associated therewith. The skeletal model may then be provided to a computing environment to track the skeletal model and present a visual representation associated with the skeletal model.
Features of a target in a physical space may be detected using any known technique or techniques disclosed herein that provide the ability to scan known/unknown objects, scan humans, and scan background aspects (e.g., floors, walls) in a scene. The scanned data for each object, including a combination of depth and RGB data, may be used to create a three-dimensional model of the object. The RGB data is applied to the corresponding regions of the model. Temporal tracking between frames can improve confidence and adapt object data in real time. Thus, object characteristics and tracking of changes in object characteristics over time can be used to reliably track objects whose position and orientation change between frames in real time. The capture device captures data at an interactive rate, thereby improving the fidelity of the data and allowing the disclosed techniques to process raw depth data, digitize objects in the scene, extract the surface and texture of the objects, and perform any of these techniques in real-time so that the display can provide a real-time depiction of the scene. Further, multiple capture devices may capture data of the physical space. The data may be merged such that the fidelity of gesture recognition is increased, wherein the recognition is based on the additional data. The capture device may be focused on a single user or may capture data about many users. If there are multiple capture devices that can share data, a second capture device in the physical space can capture the user's data if the first capture device does not have a view or does not have a good view of the user.
At 606, the system may identify each user in the physical space, and at 614 associate each user with a visual representation. At 608, the system may designate a level of control for each user, where control is accomplished via gestures in the physical space. For example, a user may be a primary user, a secondary user, or an observing user. Depending on the gesture and on the user that performs the gesture, gestures may control aspects of the information presentation. At 610, the system may determine whether a user performs a gesture and, at 612, use the gesture to control the program. For example, a gesture may comprise a user's position or motion that may be captured as image data and parsed for meaning. At 610, the parsed image data may be filtered, by a gesture recognition engine for example, to determine whether a gesture was performed. Thus, via the gesture-based system, the presentation of information may be controlled by multiple users. Control may be shared, transferred, or the like among the various participants in the presentation.
At 614, a visual representation may be associated with each user and at 616, the system may animate the visual representation to correspond to the gesture or the control derived from the gesture. The visual representation may be associated with more than one user, or each user may have a unique visual representation. For example, if multiple users are associated with the same visual representation, the system may transfer control between the users.
At 618, if the presentation of the information is non-sequential, as described above, a gesture may control an aspect of the non-sequential information. Thus, the gesture is applicable to user selection of a desired portion of non-sequential information. The display of the selected portion may provide a conversion of a canvas of non-sequential information to a focused portion of such canvas. The user may navigate through the canvas to change the focused portion of the assets available in the presentation.
The computer-executable instructions may include instructions for establishing a shared presentation experience and transferring control between users, as described herein. Any of the methods described herein for sharing a presentation experience via gestures may be implemented as computer-executable instructions.
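A compact sketch of the overall flow of FIG. 6, from a captured frame to gesture-driven control with per-user control levels, might look as follows; the user names, control-level strings, and slide-navigation gestures are assumptions made for illustration.

```python
# Hypothetical end-to-end sketch of the flow in FIG. 6: capture, identify users,
# assign a control level, recognize gestures, and apply them to the presentation.
CONTROL_LEVELS = {"primary", "secondary", "observer"}

class Presentation:
    def __init__(self):
        self.slide = 0
    def apply(self, gesture):
        if gesture == "next":
            self.slide += 1
        elif gesture == "previous":
            self.slide = max(0, self.slide - 1)

def process_frame(frame, users, presentation):
    """frame: list of (user_id, gesture or None) pairs derived from captured data."""
    for user_id, gesture in frame:
        level = users.get(user_id, "observer")
        # observers are immersed via their visual representation but cannot drive the show
        if gesture and level in ("primary", "secondary"):
            presentation.apply(gesture)

users = {"alice": "primary", "bob": "secondary", "carol": "observer"}
deck = Presentation()
process_frame([("alice", "next"), ("carol", "next")], users, deck)
print(deck.slide)  # 1: only the primary user's gesture advanced the slide
```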
FIG. 7 illustrates an example embodiment of a computing environment that may be used to interpret one or more gestures in a target recognition, analysis, and tracking system. The computing environment such as computing environment 212 described above with reference to FIG. 1 may be a multimedia console 100, such as a gaming console. As shown in FIG. 7, the multimedia console 100 has a Central Processing Unit (CPU) 101 having a level 1 cache 102, a level 2 cache 104, and a flash ROM (Read Only Memory) 106. The level 1 cache 102 and level 2 cache 104 temporarily store data and thus reduce the number of memory access cycles, thereby improving processing speed and throughput. The CPU 101 may be provided with more than one core and thus with additional level 1 caches 102 and level 2 caches 104. The flash ROM 106 may store executable code that is loaded during an initial phase of a boot process when the multimedia console 100 is powered ON.
A Graphics Processing Unit (GPU) 108 and a video encoder/video codec (coder/decoder) 114 form a video processing pipeline for high speed and high resolution graphics processing. Data is carried from the graphics processing unit 108 to the video encoder/video codec 114 via a bus. The video processing pipeline outputs data to an a/V (audio/video) port 140 for transmission to a television or other display. A memory controller 110 is connected to the GPU 108 to facilitate processor access to various types of memory 112, such as, but not limited to, RAM (random access memory).
The multimedia console 100 includes an I/O controller 120, a system management controller 122, an audio processing unit 123, a network interface controller 124, a first USB host controller 126, a second USB controller 128, and a front panel I/O subassembly 130 that are preferably implemented on a module 118. The USB controllers 126 and 128 serve as hosts for peripheral controllers 142(1) -142(2), a wireless adapter 148, and an external memory device 146 (e.g., flash memory, external CD/DVD ROM drive, removable media, etc.). The network interface 124 and/or wireless adapter 148 provide access to a network (e.g., the Internet, home network, etc.) and may be any of a wide variety of various wired or wireless adapter components including an Ethernet card, a modem, a Bluetooth module, a cable modem, and the like.
System memory 143 is provided to store application data that is loaded during the boot process. A media drive 144 is provided and may comprise a DVD/CD drive, hard drive, or other removable media drive, among others. The media drive 144 may be internal or external to the multimedia console 100. Application data may be accessed via the media drive 144 for execution, playback, etc. by the multimedia console 100. The media drive 144 is connected to the I/O controller 120 via a bus, such as a Serial ATA bus or other high speed connection (e.g., IEEE 1394).
The system management controller 122 provides various service functions related to ensuring availability of the multimedia console 100. The audio processing unit 123 and the audio codec 132 form a corresponding audio processing pipeline with high fidelity and stereo processing. Audio data is transmitted between the audio processing unit 123 and the audio codec 132 via a communication link. The audio processing pipeline outputs data to the A/V port 140 for reproduction by an external audio player or device having audio capabilities.
The front panel I/O subassembly 130 supports the functionality of the power button 150 and the eject button 152, as well as any LEDs (light emitting diodes) or other indicators exposed on the outer surface of the multimedia console 100. The system power supply module 136 provides power to the components of the multimedia console 100. A fan 138 cools the circuitry within the multimedia console 100.
The CPU 101, GPU 108, memory controller 110, and various other components within the multimedia console 100 are interconnected via one or more buses, including serial and parallel buses, a memory bus, a peripheral bus, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures may include a Peripheral Component Interconnect (PCI) bus, a PCI-Express bus, and the like.
When the multimedia console 100 is powered ON, application data may be loaded from the system memory 143 into memory 112 and/or caches 102, 104, and executed on the CPU 101. The application may present a graphical user interface that provides a consistent user experience when navigating to different media types available on the multimedia console 100. In operation, applications and/or other media contained within the media drive 144 may be launched or played from the media drive 144 to provide additional functionalities to the multimedia console 100.
The multimedia console 100 may be operated as a standalone system by simply connecting the system to a television or other display. In the standalone mode, the multimedia console 100 allows one or more users to interact with the system, watch movies, or listen to music. However, with the integration of broadband connectivity made available through the network interface 124 or the wireless adapter 148, the multimedia console 100 may further be operated as a participant in a larger network community.
When the multimedia console 100 is powered on, a set amount of hardware resources may be reserved for use by the multimedia console operating system as a system. These resources may include a reserve of memory (such as 16 MB), a reserve of CPU and GPU cycles (such as 5%), a reserve of network bandwidth (such as 8 kbs), and so on. Because these resources are reserved at system boot time, the reserved resources are not present from an application perspective.
In particular, the memory reservation is preferably large enough to contain the launch kernel, concurrent system applications and drivers. The CPU reservation is preferably constant such that if the reserved CPU usage is not used by the system applications, the idle thread will consume any unused cycles.
For the GPU reservation, lightweight messages generated by the system applications (e.g., popups) are displayed by using a GPU interrupt to schedule code to render a popup into an overlay. The amount of memory required for an overlay depends on the overlay area size, and the overlay preferably scales with the screen resolution. Where the concurrent system application uses a full user interface, it is preferable to use a resolution that is independent of the application resolution. A scaler may be used to set this resolution so that there is no need to change the frequency and cause a TV resynch.
After the multimedia console 100 boots and system resources are reserved, concurrent system applications execute to provide system functionality. The system functions are encapsulated in a set of system applications that execute within the reserved system resources described above. The operating system kernel identifies threads that are system application threads as opposed to threads that are game application threads. The system applications are preferably scheduled to run on the CPU 101 at predetermined times and intervals in order to provide a consistent view of system resources to the application. The scheduling is to minimize cache disruption for the gaming application running on the console.
When the concurrent system application requires audio, audio processing is asynchronously scheduled to the gaming application due to time sensitivity. A multimedia console application manager (described below) controls the audio level (e.g., mute, attenuate) of the gaming application while system applications are active.
The input devices (e.g., controllers 142(1) and 142(2)) are shared by the gaming application and the system applications. Rather than being reserved resources, the input devices are switched between the system applications and the gaming application so that each has a focus of the device. The application manager preferably controls the switching of the input stream, without the gaming application's knowledge, and a driver maintains state information regarding focus switches. The cameras 26, 28 and capture device 202 may define additional input devices for the console 100.
FIG. 8 illustrates another example embodiment of a computing environment 220 that may be used to interpret one or more gestures in a target recognition, analysis, and tracking system, which may be the computing environment 212 shown in FIG. 1. The computing system environment 220 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the presently disclosed subject matter. Neither should the computing environment 220 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 220. In some embodiments, the various depicted computing elements may include circuitry configured to instantiate specific aspects of the present disclosure. For example, the term "circuitry" as used in this disclosure may include dedicated hardware components configured to perform functions through firmware or switches. In other examples, the term circuitry may include a general purpose processing unit, memory, etc., configured by software instructions that implement logic that may be used to perform functions. In example embodiments where circuitry includes a combination of hardware and software, an implementer may write source code embodying logic and the source code can be compiled into machine readable code that can be processed by the general purpose processing unit. Because those skilled in the art will appreciate that the prior art has evolved to the point where there is little difference between hardware, software, or a combination of hardware/software, the selection of hardware or software to implement a particular function is a design choice left to the implementer. More specifically, those skilled in the art will appreciate that a software process can be transformed into an equivalent hardware structure, and a hardware structure can itself be transformed into an equivalent software process. Thus, the choice of hardware or software implementation is one of design choice and left to the implementer.
In FIG. 8, computing environment 220 includes a computer 241, computer 241 typically including a variety of computer-readable media. Computer readable media can be any available media that can be accessed by computer 241 and includes both volatile and nonvolatile media, removable and non-removable media. The system memory 222 includes computer storage media in the form of volatile and/or nonvolatile memory such as Read Only Memory (ROM) 223 and Random Access Memory (RAM) 261. A basic input/output system 224 (BIOS), containing the basic routines that help to transfer information between elements within computer 241, such as during start-up, is typically stored in ROM 223. RAM 261 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 259. By way of example, and not limitation, FIG. 8 illustrates operating system 225, application programs 226, other program modules 227, and program data 228.
The computer 241 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 8 illustrates a hard disk drive 238 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 239 that reads from or writes to a removable, nonvolatile magnetic disk 254, and an optical disk drive 240 that reads from or writes to a removable, nonvolatile optical disk 253 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 238 is typically connected to the system bus 221 through a non-removable memory interface such as interface 234, and magnetic disk drive 239 and optical disk drive 240 are typically connected to the system bus 221 by a removable memory interface, such as interface 235.
The drives and their associated computer storage media discussed above and illustrated in FIG. 8, provide storage of computer readable instructions, data structures, program modules and other data for the computer 241. In FIG. 8, for example, hard disk drive 238 is illustrated as storing operating system 258, application programs 257, other program modules 256, and program data 255. Note that these components can either be the same as or different from operating system 225, application programs 226, other program modules 227, and program data 228. Operating system 258, application programs 257, other program modules 256, and program data 255 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 241 through input devices such as a keyboard 251 and pointing device 252, commonly referred to as a mouse, trackball or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 259 through a user input interface 236 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a Universal Serial Bus (USB). The cameras 26, 28 and capture device 202 may define additional input devices for the console 100. A monitor 242 or other type of display device is also connected to the system bus 221 via an interface, such as a video interface 232. In addition to the monitor, computers may also include other peripheral output devices such as speakers 244 and printer 243, which may be connected through an output peripheral interface 233.
The computer 241 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 246. The remote computer 246 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 241, although only a memory storage device 247 has been illustrated in FIG. 8. The logical connections depicted in FIG. 8 include a Local Area Network (LAN) 245 and a Wide Area Network (WAN) 249, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
When used in a LAN networking environment, the computer 241 is connected to the LAN 245 through a network interface or adapter 237. When used in a WAN networking environment, the computer 241 typically includes a modem 250 or other means for establishing communications over the WAN 249, such as the Internet. The modem 250, which may be internal or external, may be connected to the system bus 221 via the user input interface 236, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 241, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 8 illustrates remote application programs 248 as residing on memory device 247. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
It will be appreciated that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated may be performed in the sequence illustrated, in other sequences, in parallel, and so forth. Also, the order of the above-described processes may be changed.
Moreover, while the invention has been described in connection with certain aspects as illustrated in the various figures, it is to be understood that other similar aspects may be used or modifications and additions may be made to the described aspects for performing the same function of the present invention without deviating therefrom. The subject matter of the present disclosure includes all novel and nonobvious combinations and subcombinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof. Thus, the methods and apparatus of the disclosed embodiments, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium. When the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus configured to practice the disclosed embodiments.
In addition to the specific implementations explicitly set forth herein, other aspects and implementations will be apparent to those skilled in the art from consideration of the specification disclosed herein. Accordingly, the present invention should not be limited to any single aspect, but rather should be construed in breadth and scope in accordance with the appended claims. For example, the various processes described herein may be implemented in hardware or software, or a combination of both.
Claims (15)
1. A method for establishing a shared presentation experience, the method comprising:
presenting a presentation of information;
capturing data of a physical space, wherein the captured data represents gestures of a plurality of users;
determining an identity of a first user of the plurality of users based on the captured data of the physical space;
accessing a user profile of the first user based on the identity of the first user, wherein the user profile provides an indication of a level of access to the presentation by the first user; and
recognizing the gestures from the captured data, wherein each gesture is adapted to control an aspect of the presentation such that the plurality of users share control of the presentation via the gestures.
2. The method of claim 1, wherein at least one of the plurality of users is designated as at least one of a primary user, a secondary user, or an observing user.
3. The method of claim 1, wherein the captured data is a union of data in the physical space captured by multiple capture devices.
4. The method of claim 1, wherein at least one of the gestures is a trigger for information collection regarding the gesture, wherein the information collection includes passive collection of information regarding the gesture and recording the information for subsequent access.
5. The method of claim 4, wherein the collected information is provided to at least one of the plurality of users in real-time or as output to a display.
6. The method of claim 1, wherein primary control of the presentation can be transferred between the plurality of users.
7. A method for presenting a visual representation in a shared presentation experience, the method comprising:
presenting a presentation of information;
capturing data of a physical space, wherein the captured data represents a plurality of users in the physical space;
determining an identity of a first user of the plurality of users based on the captured data of the physical space;
accessing a user profile of the first user based on the identity of the first user, wherein the user profile provides an indication of a level of access to the presentation by the first user; and
presenting at least one visual representation corresponding to each of the plurality of users, wherein the at least one visual representation is adapted to interact with the presented presentation of information.
8. The method of claim 7, wherein the at least one visual representation is adapted to interact with the presented presentation of information via an animation that interacts with a portion of the presentation.
9. The method of claim 8, wherein the animation of the at least one visual representation corresponds to a gesture of at least one user of the plurality of users.
10. The method of claim 7, wherein at least one of the plurality of users corresponds to:
a different visual representation, or
the same visual representation.
11. The method of claim 7, wherein the at least one visual representation corresponds to a feature of a user detected by a capture device.
12. The method of claim 7, wherein control of the presentation by at least one of the plurality of users is indicated by a feature corresponding to a visual representation of the at least one of the plurality of users.
13. A method for establishing a non-sequential presentation experience, the method comprising:
assembling a plurality of information for presentation;
presenting a non-sequential presentation of the information, wherein the presentation comprises an information format adapted for progressive navigation of the information;
capturing data of a physical space, wherein the captured data represents gestures of a plurality of users;
determining an identity of a first user of the plurality of users based on the captured data of the physical space;
accessing a user profile of the first user based on the identity of the first user, wherein the user profile provides an indication of a level of access to the presentation by the first user; and
identifying a gesture that controls an aspect of the non-sequential presentation.
14. The method of claim 13, wherein a plurality of users share control of the non-sequential presentation via gestures.
15. The method of claim 13, wherein the display of the non-sequential presentation is adapted to select at least a portion of the non-sequential presentation from an asset canvas.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US12/640,731 US9244533B2 (en) | 2009-12-17 | 2009-12-17 | Camera navigation for presentations |
| US12/640,731 | 2009-12-17 | ||
| PCT/US2010/057260 WO2011084245A2 (en) | 2009-12-17 | 2010-11-18 | Camera navigation for presentations |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1173239A1 (en) | 2013-05-10 |
| HK1173239B (en) | 2014-07-25 |