
CN102047203B - Pose-based control using 3D information extracted within an extended depth of field - Google Patents


Info

Publication number
CN102047203B
Authority
CN
China
Prior art keywords
pose
body
space
data
said body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN200980120542.XA
Other languages
Chinese (zh)
Other versions
CN102047203A (en)
Inventor
Pierre St. Hilaire
John S. Underkoffler
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oblong Industries Inc
Original Assignee
Oblong Industries Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 12/109,263 (US 8,407,725 B2)
Application filed by Oblong Industries Inc
Publication of CN102047203A
Application granted
Publication of CN102047203B


Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G02 - OPTICS
    • G02B - OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00 - Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/0075 - Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00 with means for altering, e.g. increasing, the depth of field or depth of focus
    • G - PHYSICS
    • G02 - OPTICS
    • G02B - OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00 - Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/50 - Optics for phase object visualisation
    • G02B27/52 - Phase contrast optics
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 - Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/0304 - Detection arrangements using opto-electronic means
    • G06F3/0325 - Detection arrangements using opto-electronic means using a plurality of light emitters or reflectors or a plurality of detectors forming a reference frame from which to derive the orientation of the object, e.g. by triangulation or on the basis of reference deformation in the picked up image
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 - Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/033 - Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G06F3/0346 - Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 - Arrangements for executing specific programs
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 - General purpose image data processing
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Optics & Photonics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Processing Or Creating Images (AREA)
  • Position Input By Displaying (AREA)
  • Studio Devices (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

Systems and methods for gesture-based control using three-dimensional information extracted over an extended depth of field are described. The system comprises a plurality of optical detectors coupled to at least one processor. The plurality of optical detectors images a body. At least two of the plurality of optical detectors comprise wavefront coded cameras. The processor automatically detects a gesture of the body, wherein the gesture comprises an instantaneous state of the body. The detecting comprises aggregating gesture data of the gesture at an instant in time and excluding the use of background data in the detecting of the gesture. The gesture data comprises focus-resolved position data of the body, expressed as an absolute position and orientation in space relative to a neutral position, wherein the position data is three-dimensional information. The processor translates the gesture into a gesture signal and uses the gesture signal to control a component coupled to the processor.

Description

Pose-based control using 3D information extracted within an extended depth of field

Related Applications

This application is a continuation-in-part of U.S. Patent Application No. 11/350,697, filed February 8, 2006.

This application claims priority to U.S. Patent Application No. 61/041,892, filed April 2, 2008.

This application is a continuation-in-part of U.S. Patent Application No. 12/109,263, filed April 24, 2008.

This application claims priority to U.S. Patent Application No. 61/105,243, filed October 14, 2008.

This application claims priority to U.S. Patent Application No. 61/105,253, filed October 14, 2008.

Technical Field

The present invention relates generally to the field of computer systems and, more particularly, to systems and methods for gesture-based control through the extraction of three-dimensional information within an extended depth of field.

Background

When three-dimensional information is extracted over an extended depth of field in an imaging system, the distance to a point in a scene can be estimated from its position in two or more simultaneously captured images. When the three-dimensional (3D) relationship between the images is known, the 3D position of the point can be computed from basic geometric relationships. The challenge in computing spatial position from multiple images, commonly referred to as stereo correlation or stereo depth computation, is automatically and accurately associating the mapping of a point in one image with its mapping in another image. This is most commonly done by correlating image features from one image to one or more other images. A fundamental assumption of all stereo matching methods, however, is that some identifiable local contrast or feature must exist in the image in order to match a point with its position in another image. A problem therefore arises when an image lacks local contrast or features because of defocus: stereo matching does not produce accurate results in regions of the image that are out of focus.
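The basic geometric relationship mentioned above can be sketched for an idealized, rectified two-camera rig. The function name and the numbers below are illustrative and are not drawn from the patent.

```python
def stereo_depth(focal_px: float, baseline_m: float,
                 x_left_px: float, x_right_px: float) -> float:
    """Depth of a scene point from its matched positions in a rectified
    stereo pair: Z = f * B / d, where d is the disparity in pixels."""
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        # A matched point shifts leftward between the left and right views;
        # zero disparity would place the point at infinity.
        raise ValueError("non-positive disparity: match invalid or at infinity")
    return focal_px * baseline_m / disparity

# A point imaged at x=420 px in the left camera and x=400 px in the right,
# with an 800 px focal length and a 10 cm baseline, lies 4 m away:
print(stereo_depth(800.0, 0.10, 420.0, 400.0))  # → 4.0
```

Note that this computation presupposes exactly the matching step the passage describes: without identifiable local contrast, the pair (x_left, x_right) cannot be found reliably in the first place.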

A conventional means of extending the depth of focus of an image is to reduce the diameter of the camera lens pupil ("stopping down"). Two side effects, however, limit the usefulness of this technique. First, the sensitivity of the imaging system is reduced by a factor equal to the square of the ratio of the stopped-down and full pupil diameters. Second, the maximum spatial frequency response is reduced by a factor equal to that pupil-diameter ratio, which limits the resolution and contrast of the image. Conventional imaging systems therefore face a trade-off among depth of field, exposure time, and overall contrast. In the case of a multi-camera ranging system, the net effect is a compromise between stereo depth accuracy and working range.

Incorporation by Reference

Each patent, patent application, and/or publication mentioned in this specification is herein incorporated by reference in its entirety, to the same extent as if each individual patent, patent application, and/or publication was specifically and individually indicated to be incorporated by reference.

Brief Description of the Drawings

Figure 1 is a diagram of an embodiment of the system of the present invention;

Figure 2 is a diagram of an embodiment of marker tags of the present invention;

Figure 3 is a diagram of hand poses in a gesture vocabulary of an embodiment of the present invention;

Figure 4 is a diagram of orientations in the gesture vocabulary of an embodiment of the present invention;

Figure 5 is a diagram of two-hand combinations in the gesture vocabulary of an embodiment of the present invention;

Figure 6 is a diagram of orientation blends in the gesture vocabulary of an embodiment of the present invention;

Figure 7 is a flow diagram illustrating operation in an embodiment of the system of the present invention;

Figure 8 is an example of commands in an embodiment of the system;

Figure 9 is a block diagram of a gesture-based control system that extracts three-dimensional information within an extended depth of field, under an embodiment;

Figure 10 is a block diagram of a wavefront coding imaging system used in a gesture-based control system, under an embodiment;

Figure 11 is a block diagram of a gesture-based control system that extracts three-dimensional information within an extended depth of field using a wavefront coding imaging system comprising two wavefront coded cameras, under an embodiment;

Figure 12 is a flow diagram of gesture-based control using three-dimensional information extracted within an extended depth of field, under an embodiment;

Figure 13 is a block diagram of a wavefront coding design process used in a gesture-based control system, under an embodiment.

Summary of the Invention

Systems and methods for gesture-based control using three-dimensional information extracted within an extended depth of field are described below. A system comprises a plurality of optical detectors coupled to at least one processor. The plurality of optical detectors images a body. At least two of the plurality of optical detectors comprise wavefront coded cameras. The processor automatically detects a gesture of the body, wherein the gesture comprises an instantaneous state of the body. The detecting comprises aggregating gesture data of the gesture at an instant in time and excluding the use of background data in the detecting of the gesture. The gesture data comprises focus-resolved position data of the body, expressed as an absolute position and orientation in space relative to a neutral position, wherein the position data is three-dimensional information. The processor translates the gesture into a gesture signal and uses the gesture signal to control a component coupled to the processor.

A method comprises imaging a body with an imaging system, wherein the imaging comprises generating wavefront coded images of the body. The method automatically detects a gesture of the body, wherein the gesture comprises an instantaneous state of the body. The detecting comprises aggregating gesture data of the gesture at an instant in time and excluding the use of background data in the detecting of the gesture. The gesture data comprises focus-resolved position data of the body, expressed as an absolute position and orientation in space relative to a neutral position, wherein the position data is three-dimensional information. The method comprises translating the gesture into a gesture signal and controlling a component coupled to a computer in response to the gesture signal.

In the following description, numerous features are described in detail in order to provide a more thorough understanding of the embodiments described herein. It will be apparent that the present invention may be practiced without these specific details.

Detailed Description

System

A block diagram of an embodiment of the invention is shown in Figure 1. A user places hands 101 and 102 in the viewing area of camera arrays 104A-104D. The cameras detect the location, orientation, and movement of the fingers and hands 101 and 102, and generate output signals to a pre-processor 105. The pre-processor 105 translates the camera output into a gesture signal that is provided to the computer processing unit 107 of the system. The computer 107 uses the input information to generate commands to control one or more on-screen cursors and provides video output to a display 103.

Although the system is shown with the hands of a single user as input, the invention may also be implemented with multiple users. In addition, instead of or in addition to hands, the system may track any one or more parts of a user's body, including the head, feet, legs, arms, elbows, knees, and so on.

In the embodiment shown, four cameras are used to detect the location, orientation, and movement of the user's hands 101 and 102. It should be understood that the invention applies equally to more or fewer cameras without departing from the scope or spirit of the invention. In addition, although the cameras are arranged symmetrically in the example embodiment, such symmetry is not required by the invention. Any number or positioning of cameras that permits detection of the location, orientation, and movement of the user's hands may be used.

In one embodiment of the invention, the cameras used are motion capture cameras capable of capturing grayscale images. In one embodiment, the cameras used are those manufactured by Vicon, such as the Vicon MX40 camera. This camera includes on-camera processing and is capable of image capture at 1,000 frames per second. Motion capture cameras are capable of detecting and locating markers.

In the embodiment described, the cameras are used for optical detection. In other embodiments, cameras or other detectors may be used for electromagnetic, magnetostatic, RFID, or any other suitable type of detection.

The pre-processor 105 is used to generate a three-dimensional space point reconstruction and skeletal point labeling. The gesture translator 106 is used to translate the 3D spatial information and marker motion information into a command language that can be interpreted by a computer processor to update the location, shape, and action of a cursor on the display. In an alternative embodiment of the invention, the pre-processor 105 and the gesture translator 106 may be combined into a single device.

The computer 107 may be any general-purpose computer, such as one manufactured by Apple, Dell, or any other suitable manufacturer. The computer 107 runs applications and provides display output. Cursor information that would otherwise come from a mouse or other prior-art input device instead comes from the gesture system.

Marker Tags

The present invention contemplates the use of marker tags on one or more fingers of the user so that the system can locate the user's hands, identify whether it is viewing a left or a right hand, and determine which fingers are visible. This permits the system to detect the location, orientation, and movement of the user's hands. This information allows a number of gestures to be recognized by the system and used as commands by the user.

The marker tags in one embodiment are physical tags comprising a substrate (appropriate in this embodiment for affixing to various locations on a human hand) and discrete markers arranged on the surface of the substrate in unique identifying patterns.

The markers and the associated external sensing system may operate in any domain (optical, electromagnetic, magnetostatic, etc.) that allows the accurate, precise, rapid, and continuous acquisition of their three-dimensional position. The markers themselves may operate either actively (e.g., by emitting structured electromagnetic pulses) or passively (e.g., by being optically retroreflective, as in the present embodiment).

At each frame of acquisition, the detection system receives an aggregate "cloud" of recovered three-dimensional locations comprising all markers from tags presently in the instrumented workspace volume (within the visible range of the cameras or other detectors). The markers on each tag are of sufficient multiplicity, and are arranged in unique patterns, such that the detection system can perform the following tasks: (1) segmentation, in which each recovered marker position is assigned to one and only one subset of points that form a single tag; (2) labeling, in which each segmented subset of points is identified as a particular tag; (3) location, in which the three-dimensional position of the identified tag is recovered; and (4) orientation, in which the three-dimensional orientation of the identified tag is recovered. Tasks (1) and (2) are made possible through the specific nature of the marker patterns, as described below and as illustrated in one embodiment in Figure 2.

The markers on the tags in one embodiment are affixed at a subset of regular grid locations. This underlying grid may, as in the present embodiment, be of the traditional Cartesian sort, or may instead be some other regular planar tessellation (a triangular/hexagonal tiling arrangement, for example). The scale and spacing of the grid is established with respect to the known spatial resolution of the marker-sensing system, so that adjacent grid locations are not likely to be confused. Selection of marker patterns for all tags should satisfy the following constraint: no tag's pattern may coincide with that of any other tag's pattern through any combination of rotation, translation, or mirroring. The multiplicity and arrangement of markers may further be chosen so that loss (or occlusion) of some specified number of component markers is tolerated: after any arbitrary transformation, it should still be unlikely that the compromised module will be confused with any other module.
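The uniqueness constraint above, that no tag pattern may coincide with another under rotation, translation, or mirroring, can be checked mechanically by reducing each pattern to a canonical form over all eight planar grid symmetries. The following sketch is illustrative and is not the patent's method; the tiny three-marker patterns are hypothetical examples.

```python
def canonical(marks):
    """Canonical form of a marker pattern (a set of (row, col) grid points)
    under translation, the four 90-degree rotations, and mirroring."""
    variants = []
    for mirrored in (False, True):
        pts = [(r, -c) if mirrored else (r, c) for r, c in marks]
        for _ in range(4):
            pts = [(c, -r) for r, c in pts]          # rotate 90 degrees
            r0 = min(r for r, _ in pts)
            c0 = min(c for _, c in pts)               # normalize translation
            variants.append(tuple(sorted((r - r0, c - c0) for r, c in pts)))
    return min(variants)                              # deterministic representative

def confusable(tag_a, tag_b):
    """True if some rotation/translation/mirror of one tag matches the other."""
    return canonical(tag_a) == canonical(tag_b)

ell = {(0, 0), (0, 1), (1, 0)}          # an L-shaped triple of markers
line = {(0, 0), (0, 1), (0, 2)}         # a collinear triple
ell_rotated = {(0, 0), (1, 0), (1, 1)}  # the same L, rotated and translated

print(confusable(ell, ell_rotated))  # → True  (would violate the constraint)
print(confusable(ell, line))         # → False (patterns are safely distinct)
```

A tag set satisfies the constraint when every pair of tags in it is non-confusable; occlusion tolerance would extend the same check to subsets with markers removed.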

Referring now to Figure 2, a number of tags 201A-201E (left hand) and 202A-202E (right hand) are shown. Each tag is rectangular and consists, in this embodiment, of a 5×7 grid array. The rectangular shape is chosen as an aid in determining the orientation of the tag and to reduce the likelihood of mirror duplicates. In the embodiment shown, there are tags for each finger on each hand. In some embodiments, it may be adequate to use one, two, three, or four tags per hand. Each tag has a border of a different grayscale or color shade. Within this border is a 3×5 grid array. Markers (represented by the black dots in Figure 2) are disposed at certain points of the grid array to provide information.

Qualifying information may be encoded in the tags' marker patterns through segmentation of each pattern into "common" and "unique" subpatterns. For example, the present embodiment specifies two possible "border patterns", distributions of markers about a rectangular boundary. A "family" of tags is thus established: the tags intended for the left hand might all use the same border pattern, as shown in tags 201A-201E, while those attached to the fingers of the right hand could be assigned a different pattern, as shown in tags 202A-202E. This subpattern is chosen so that, in all orientations of the tags, the left pattern can be distinguished from the right pattern. In the example shown, the left-hand pattern includes a marker in each corner and a marker in the second grid location from each corner. The right-hand pattern has markers in only two corners and two markers in non-corner grid locations. An inspection of the pattern reveals that as long as any three of the four markers are visible, the left-hand pattern can be positively distinguished from the right-hand pattern. In one embodiment, the color or shade of the border can also be used as an indicator of handedness.

Each tag must of course still employ a unique interior pattern, the markers distributed within its family's common border. In the embodiment shown, it has been found that two markers in the interior grid array are sufficient to uniquely identify each of the ten fingers with no duplication due to rotation or orientation of the fingers. Even if one of the markers is occluded, the combination of the tag's pattern and handedness yields a unique identifier.

In the present embodiment, the grid locations are visually present on the rigid substrate as an aid to the manual task of affixing each retroreflective marker at its intended location. These grids and the intended marker locations are printed exactly onto the substrate by means of a color inkjet printer; the substrate here is a sheet of initially flexible "shrink film". Each module is cut from the sheet and then oven-baked, during which thermal treatment each module undergoes a precise and repeatable shrinkage. For a brief interval following this procedure, the cooling tag may be shaped slightly, to follow, for example, the longitudinal curve of a finger; thereafter, the substrate is suitably rigid, and markers may be affixed at the indicated grid points.

In one embodiment, the markers themselves are three-dimensional, such as small reflective spheres affixed to the substrate by adhesive or some other appropriate means. The three-dimensionality of the markers can be an aid in detection and location over two-dimensional markers. However, either may be used without departing from the spirit and scope of the invention.

At present, tags are affixed by means of Velcro or other appropriate means to a glove worn by the operator, or alternatively are affixed directly to the operator's fingers using double-sided tape. In a third embodiment, the rigid substrate may be eliminated altogether, with the individual markers affixed (or "painted") directly onto the operator's fingers and hands.

Gesture Vocabulary

The present invention contemplates a gesture vocabulary consisting of hand poses, orientations, hand combinations, and orientation blends. A notation language is also implemented for designing and communicating poses and gestures in the gesture vocabulary of the invention. The gesture vocabulary is a system for representing the instantaneous "pose state" of a kinematic linkage in compact textual form. The linkage in question may be biological (a human hand, for example; or an entire human body; or a grasshopper leg; or the articulated spine of a lemur) or may instead be nonbiological (e.g., a robotic arm). In any case, the linkage may be simple (the spine) or branching (the hand). The gesture vocabulary system of the invention establishes, for any specific linkage, a string of constant length; the aggregate of the specific ASCII characters occupying the string's "character locations" is then a unique description of the instantaneous state, or "pose", of the linkage.

Hand Poses

Figure 3 illustrates hand poses in one embodiment using the gesture vocabulary of the invention. The invention supposes that each of the five fingers on a hand is used. The fingers are denoted p (pinky), r (ring finger), m (middle finger), i (index finger), and t (thumb). A number of poses for the fingers and thumb are defined and illustrated in Figure 3. A gesture vocabulary string establishes a single character position for each expressible degree of freedom in the linkage (in this case, a finger). Further, each such degree of freedom is understood to be discretized (or "quantized"), so that its full range of motion can be expressed through assignment of one of a finite number of standard ASCII characters at that string position. These degrees of freedom are expressed relative to a body-specific origin and coordinate system (the back of the hand; the center of the grasshopper's body; the base of the robotic arm; etc.). A small number of additional gesture vocabulary character positions are therefore used to express the position and orientation of the linkage "as a whole" in a more global coordinate system.

Still referring to Figure 3, a number of poses are defined and identified using ASCII characters. Some of the poses are divided between thumb and non-thumb. The invention in this embodiment uses a coding such that the ASCII character itself is suggestive of the pose. However, any character may be used to represent a pose, whether suggestive or not. In addition, there is no requirement in the invention to use ASCII characters for the notation strings. Any suitable symbols, numerals, or other representations may be used without departing from the scope and spirit of the invention. For example, the notation may use two positions per finger, or some other number of positions, if desired.

A curled finger is represented by the character '^', while a curled thumb is represented by '>'. A straight finger or thumb pointing up is indicated by '1', and one pointing at an angle by '\' or '/'. '-' represents a thumb pointing straight sideways, and 'x' represents a thumb pointing into the plane.

Using these individual finger and thumb descriptions, a considerable number of hand poses can be defined and written using the scheme of the invention. Each pose is represented by five characters in the order p-r-m-i-t described above. Figure 3 illustrates a number of poses, and a few are described here by way of illustration and example. A hand held flat and parallel to the ground is represented by '11111'. A fist is represented by '^^^^>'. An 'OK' sign is represented by '111^>'.
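As a brief sketch (not part of the patent text), the five-character p-r-m-i-t encoding described above can be modeled as a simple mapping from per-finger states to a pose string; the helper names are hypothetical:

```python
# Hypothetical sketch of the five-character p-r-m-i-t pose encoding described
# above. Character meanings follow the text: '^' curled finger, '>' curled
# thumb, '1' straight (pointing up).

FINGER_ORDER = ("pinkie", "ring", "middle", "index", "thumb")  # p-r-m-i-t

def encode_pose(states):
    """Build a pose string from per-finger state characters, in p-r-m-i-t order."""
    return "".join(states[f] for f in FINGER_ORDER)

flat_hand = encode_pose({f: "1" for f in FINGER_ORDER})  # all fingers straight
fist      = encode_pose({"pinkie": "^", "ring": "^", "middle": "^",
                         "index": "^", "thumb": ">"})
ok_sign   = encode_pose({"pinkie": "1", "ring": "1", "middle": "1",
                         "index": "^", "thumb": ">"})

print(flat_hand, fist, ok_sign)  # 11111 ^^^^> 111^>
```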

The character strings provide an opportunity for straightforward 'human readability' when suggestive characters are used. The set of possible characters describing each degree of freedom may generally be chosen with an eye toward quick recognition and evident analogy. For example, a vertical bar ('|') would suggest that a linkage element is 'straight', an ell ('L') might mean a ninety-degree bend, and a circumflex ('^') could indicate a sharp bend. As noted above, any characters or codings may be used as desired.

Any system employing gesture vocabulary strings such as those described here enjoys the high computational efficiency of string comparison — identification of, or search for, any specified pose becomes literally a 'string compare' (e.g. UNIX's 'strcmp()' function) between the desired pose string and the instantaneous actual string. Furthermore, the use of 'wildcard characters' provides the programmer or system designer with additional familiar efficiency and efficacy: degrees of freedom whose instantaneous state is irrelevant for a match may be specified as question marks ('?'), and additional wildcard meanings may be assigned.
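A minimal illustration (not from the patent) of the wildcard-augmented string compare described above, with '?' matching any single character:

```python
def pose_match(spec, actual):
    """Compare a specified pose string against the instantaneous actual string.
    A '?' in the specification matches any character, i.e. that degree of
    freedom is irrelevant to the match, as described in the text."""
    if len(spec) != len(actual):
        return False
    return all(s == "?" or s == a for s, a in zip(spec, actual))

print(pose_match("^^^^>", "^^^^>"))  # exact match, like strcmp() == 0
print(pose_match("11?^>", "111^>"))  # middle finger ignored via wildcard
print(pose_match("11111", "^^^^>"))  # no match
```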

Orientation

In addition to the pose of the fingers and thumb, the orientation of the hand can also represent information. Characters describing global-space orientations can likewise be chosen transparently: the characters '<', '>', '^', and 'v' may be used to indicate, when encountered in an orientation character position, the ideas of left, right, up, and down. Figure 4 illustrates a hand orientation descriptor and an example of coding that combines pose and orientation. In an embodiment of the invention, two character positions specify first the direction of the palm and then the direction of the fingers (as if the fingers were straight, irrespective of their actual bends). The possible characters for these two positions express a 'body-centric' notion of orientation: '-', '+', 'x', '*', '^', and 'v' describe medial, lateral, anterior (forward, away from body), posterior (backward, away from body), cranial (upward), and caudal (downward).

In the notation scheme of an embodiment of the invention, the five finger-pose characters are followed by a colon and then two orientation characters to define a complete command pose. In one embodiment, a start position is referred to as the 'xyz' pose, in which the thumb points straight up, the index finger points forward, and the middle finger is perpendicular to the index finger, pointing to the left when the pose is made with the right hand. This is represented by the string '^^x1-:-x'.
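The colon notation above can be sketched as a small parser (an illustration only; the helper name and return structure are hypothetical, not part of the patent):

```python
def parse_command_pose(s):
    """Split a full command pose string into its five-finger pose and its two
    orientation characters (palm direction, then finger direction), per the
    colon notation described above."""
    pose, orientation = s.split(":")
    assert len(pose) == 5 and len(orientation) == 2
    return {"fingers": pose, "palm": orientation[0], "fingers_dir": orientation[1]}

# The 'xyz' start pose from the text:
xyz_start = parse_command_pose("^^x1-:-x")
print(xyz_start)
```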

'XYZ-hand' is a technique for exploiting the geometry of the human hand to allow full six-degree-of-freedom navigation of visually presented three-dimensional structure. Although the technique depends only on the bulk translation and rotation of the operator's hand — so that the fingers may in principle be held in any pose desired — a static configuration is preferred in this embodiment, in which the index finger points away from the body, the thumb points toward the ceiling, and the middle finger points left-right. The three fingers thus describe (roughly, but with clearly evident intent) the three mutually orthogonal axes of a three-dimensional coordinate system: thus 'XYZ-hand'.

XYZ-hand navigation then proceeds with the hand and fingers in the pose described above, held before the operator's body at a predetermined 'neutral location'. Access to the three translational and three rotational degrees of freedom of a three-dimensional object (or camera) is effected in the following natural way: left-right movement of the hand (with respect to the body's natural coordinate system) results in movement along the computational context's x-axis; up-down movement of the hand results in movement along the controlled context's y-axis; and forward-back movement of the hand (toward or away from the operator's body) results in z-axis motion within the context. Similarly, rotation of the operator's hand about the index finger leads to a 'roll' change of the computational context's orientation; 'pitch' and 'yaw' changes are effected analogously, through rotation of the operator's hand about the middle finger and thumb, respectively.

Note that while 'computational context' is used here to refer to the entity being controlled by the XYZ-hand method — and seems to suggest either a synthetic three-space object or a camera — it should be understood that the technique is equally useful for controlling the various degrees of freedom of real-world objects: for example, the pan/tilt/roll controls of a video or motion picture camera equipped with appropriate rotational actuators. Further, the physical degrees of freedom afforded by the XYZ-hand pose may be mapped somewhat less literally even in a virtual domain: in the present embodiment, the XYZ-hand is also used to provide navigational access to large panoramic display images, so that left-right and up-down motions of the operator's hand lead to the expected left-right or up-down 'panning' about the image, while forward-back motion of the operator's hand maps to 'zooming' control.

In every case, the coupling between the motion of the hand and the induced computational translation/rotation may be either direct (i.e. a positional or rotational offset of the operator's hand maps, via some linear or nonlinear function, to a positional or rotational offset of the object or camera in the computational context) or indirect (i.e. a positional or rotational offset of the operator's hand maps, via some linear or nonlinear function, to a first or higher derivative of position/orientation in the computational context; ongoing integration then effects a non-static change in the computational context's actual zero-order position/orientation). This latter means of control is analogous to the use of an automobile's 'gas pedal', in which a constant deflection of the pedal leads, more or less, to a constant vehicle speed.
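The direct versus derivative coupling described above can be contrasted in a short sketch (not from the patent; function names and gains are illustrative):

```python
# Direct coupling: hand offset maps straight to position.
def direct_position(hand_offset, gain=1.0):
    return gain * hand_offset

# Indirect coupling: hand offset sets a rate; position accumulates over time,
# like a gas pedal held at a constant deflection producing a constant speed.
def integrate_indirect(hand_offset, gain, dt, steps, start=0.0):
    pos = start
    for _ in range(steps):
        pos += gain * hand_offset * dt
    return pos

print(direct_position(0.2, gain=5.0))                        # 1.0 immediately
print(integrate_indirect(0.2, gain=5.0, dt=0.1, steps=10))   # ~1.0 after 1 s
```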

The 'neutral location' that serves as the real-world XYZ-hand's local six-degree-of-freedom coordinate origin may be established (1) as an absolute position and orientation in space (relative, say, to the enclosing room); (2) as a fixed position and orientation relative to the operator herself (e.g. eight inches in front of the body, ten inches below the chin, and laterally in line with the plane of the shoulders), irrespective of the operator's overall position and 'heading'; or (3) interactively, through deliberate secondary action of the operator (for example, using a gestural command enacted by the operator's 'other' hand, said command indicating that the XYZ-hand's present position and orientation should henceforth be used as the translational and rotational origin).

It is further convenient to provide a 'detent' region (or 'dead zone') about the XYZ-hand's neutral location, such that movements within this volume do not map to movements in the controlled context.

Other poses may be included:

[|||||:vx] is a flat hand (thumb parallel to fingers) with palm facing down and fingers pointing forward.

[|||||:x^] is a flat hand with palm facing forward and fingers pointing toward the ceiling.

[|||||:-x] is a flat hand with palm facing toward the center of the body (right if it is the left hand, left if it is the right hand) and fingers pointing forward.

[^^^^-:-x] is a single-hand thumbs-up (with the thumb pointing toward the ceiling).

[^^^|-:-x] is a mime gun pointing forward.

Two-Hand Combinations

The invention contemplates single-hand commands and poses as well as two-handed commands and poses. Figure 5 illustrates examples of two-hand combinations and their associated notation in an embodiment of the invention. Reviewing the notation of the first example, 'full stop' reveals that it comprises two closed fists. In the 'snapshot' example, the thumb and index finger of each hand are extended, with the thumbs pointing toward one another, defining a goalpost-shaped frame. The 'rudder and throttle start position' has fingers and thumbs pointing up, with palms facing the screen.

Orientation Blends

Figure 6 illustrates an example of an orientation blend in an embodiment of the invention. In the example shown, the blend is represented by enclosing pairs of orientation notations in parentheses after the finger pose string. For example, the first command shows finger positions all pointing straight. The first pair of orientation commands would result in the palm lying flat toward the display, and the second pair has the hand rotating to a pitch of 45 degrees toward the screen. Although pairs of blends are shown in this example, any number of blends is contemplated in the invention.

Example Commands

Figure 8 shows a number of possible commands that may be used with the invention. Although some of the discussion here is with respect to controlling a cursor on a display, the invention is not limited to that activity. In fact, the invention has broad application in manipulating any and all data, and portions of data, on a screen, as well as the state of the display. For example, the commands may take the place of video controls during playback of video media: to pause, fast-forward, rewind, and the like. In addition, commands may be implemented to zoom in or out of an image, change the orientation of an image, pan in any direction, and so forth. The invention may also be used in lieu of menu commands such as open, close, save, and the like. In other words, any imaginable command or activity can be implemented with gestures.

Operation

Figure 7 is a flow diagram illustrating the operation of the invention in one embodiment. At step 701, the detection system detects the markers and tags. At decision block 702, it is determined whether the tags and markers were detected. If not, the system returns to step 701. If the tags and markers are detected at step 702, the system proceeds to step 703. At step 703, the system identifies the hand, fingers, and pose from the detected tags and markers. At step 704, the system identifies the orientation of the pose. At step 705, the system identifies the three-dimensional spatial location of the hand or hands that are detected. (Note that any or all of steps 703, 704, and 705 may be combined into a single step.)

At step 706, the information is translated into the gesture notation described above. At decision block 707, it is determined whether the pose is valid. This may be accomplished by a simple string comparison using the generated notation string. If the pose is not valid, the system returns to step 701. If the pose is valid, the system sends the notation and position information to the computer at step 708. The computer determines, at step 709, the appropriate action to take in response to the gesture and, at step 710, updates the display accordingly.
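The flow of steps 701-710 might be sketched as the following loop; this is a hypothetical outline only (detection, identification, and display layers are stubbed, and all names are illustrative):

```python
# Hypothetical sketch of the Figure 7 control loop: detect -> identify ->
# translate to notation -> validate by string comparison -> dispatch.

VALID_GESTURES = {"^^^^>:-x", "11111:vx"}  # registered notation strings

def run_one_cycle(detect, identify, update_display):
    frame = detect()                      # steps 701/702: detect tags/markers
    if frame is None:
        return None                       # nothing detected; loop again
    notation, position = identify(frame)  # steps 703-706: pose -> notation
    if notation not in VALID_GESTURES:    # step 707: simple string comparison
        return None
    update_display(notation, position)    # steps 708-710: send and act
    return notation

result = run_one_cycle(
    detect=lambda: "frame",
    identify=lambda f: ("^^^^>:-x", (0.0, 0.1, 0.5)),
    update_display=lambda n, p: None,
)
print(result)  # ^^^^>:-x
```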

In one embodiment of the invention, steps 701-705 are accomplished by an on-camera processor. In other embodiments, the processing can be accomplished by the system computer, if desired.

Parsing and Translation

The system is able to 'parse' and 'translate' a stream of low-level gestures recovered by an underlying system, and to turn those parsed and translated gestures into a stream of command or event data that can be used to control a broad range of computer applications and systems. These techniques and algorithms may be embodied in a system consisting of computer code that provides both an engine implementing the techniques and a platform for building computer applications that make use of the engine's capabilities.

One embodiment is focused on enabling rich gestural use of the human hand in computer interfaces, but it is also able to recognize gestures made by other body parts (including, but not limited to, arms, torso, legs, and head), as well as by a wide variety of non-hand physical tools, both static and articulating, including but not limited to calipers, compasses, flexible curve approximators, and pointing devices of various shapes. Markers and tags may be applied, as desired, to items and tools that may be carried and used by the operator.

The system described here incorporates a number of innovations that make it possible to build gestural systems that are rich in the range of gestures that may be recognized and acted upon, while at the same time providing for simple integration into applications.

The gestural parsing and translation system in one embodiment consists of:

1) A compact and efficient way to specify (encode, for use in computer programs) gestures at several different levels of aggregation:

a. A single hand's 'pose' (the configuration and orientation of the parts of the hand relative to one another).

b. A single hand's orientation and position in three-dimensional space.

c. Two-hand combinations, taking into account pose, position, or both, for either hand.

d. Multi-person combinations; the system can track more than two hands, so more than one person can cooperatively (or competitively, in the case of game applications) control the target system.

e. Sequential gestures, in which poses are combined into a series; we call these 'animating' gestures.

f. 'Semantic graph' gestures, in which the operator traces shapes in space.

2) A programmatic technique for registering the specific gestures, from each of the above categories, that are relevant to a given application context.

3) Algorithms for parsing the gesture stream so that registered gestures may be identified, and events encapsulating those gestures may be delivered to the relevant application contexts.

The specification system (1), with constituent elements (1a) through (1f), provides the basis for making use of the gestural parsing and translation capabilities of the system described here.

A single-hand 'pose' is represented as:

i) a string of the relative orientations between the fingers and the back of the hand,

ii) quantized into a small number of discrete states.

Using relative joint orientations allows the system described here to avoid problems associated with differing hand sizes and geometries. No 'operator calibration' is required with this system. In addition, specifying poses as a string or collection of relative orientations allows more complex gesture specifications to be easily created by combining pose representations with further filters and specifications.

Using a small number of discrete states for pose specification makes it possible both to specify poses compactly and to assure accurate pose recognition using a variety of underlying tracking technologies (for example, passive optical tracking using cameras, active tracking using lighted dots and cameras, electromagnetic field tracking, and so on).

Gestures in every category ((1a) through (1f)) may be partially (or minimally) specified, so that non-critical data is ignored. For example, a gesture in which the position of two fingers is definitive, and the other finger positions are unimportant, may be represented by a single specification in which the operative positions of the two relevant fingers are given and, within the same string, 'wild cards' or generic 'ignore these' indicators are listed for the other fingers.

All of the innovations described here for gesture recognition — including but not limited to the multi-layered specification technique, the use of relative orientations, the quantization of data, and the allowance for partial or minimal specification at every level — generalize beyond the specification of human gestures to the specification of gestures made using other body parts and 'manufactured' tools and objects.

The programmatic technique for 'registering gestures' (2) consists of a defined set of application programming interface calls that allow a programmer to define which gestures the engine should make available to other parts of the running system.

These API routines may be used at application set-up time, creating static interface definitions that are used throughout the lifetime of the running application. They may also be used during the course of the run, allowing interface characteristics to change on the fly. This real-time alteration of the interface makes it possible to:

i) build complex contextual and conditional control states,

ii) dynamically add hysteresis to the control environment, and

iii) create applications in which the user is able to alter or extend the interface vocabulary of the running system itself.

The algorithms for parsing the gesture stream (3) compare gestures specified as in (1) and registered as in (2) against incoming low-level gesture data. When a match for a registered gesture is recognized, event data representing the matched gesture is delivered up the stack to running applications.
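An illustrative-only sketch of registration (2) and stream parsing (3) — the class and method names are hypothetical, not taken from the patent: an application registers pose specifications, incoming low-level pose strings are compared against them, and matches yield events.

```python
class GestureEngine:
    """Hypothetical sketch: register pose specs, then match incoming strings."""

    def __init__(self):
        self.registered = {}  # spec string -> event name

    def register(self, spec, event_name):
        self.registered[spec] = event_name

    def parse(self, actual):
        """Return event names for every registered spec matching the incoming
        pose string ('?' is a wildcard, as described above)."""
        def matches(spec):
            return len(spec) == len(actual) and all(
                s in ("?", a) for s, a in zip(spec, actual))
        return [ev for spec, ev in self.registered.items() if matches(spec)]

engine = GestureEngine()
engine.register("^^^^>", "fist")
engine.register("111??", "open-ish")   # last two fingers ignored
print(engine.parse("111^>"))  # ['open-ish']
```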

Efficient real-time matching is desired in the design of this system, and specified gestures are treated as a tree of possibilities that are processed as quickly as possible.

In addition, the primitive comparison operators used internally to recognize specified gestures are also exposed for the application programmer's use, so that further comparison (for example, flexible state inspection in complex or compound gestures) can happen even from within application contexts.

Recognition 'locking' semantics are an innovation of the system described here. These semantics are implied by the registration API (2) (and, to a lesser extent, embedded within the specification vocabulary (1)). Registration API calls include:

i) 'entry' state notifiers and 'continuation' state notifiers, and

ii) gesture priority specifiers.

If a gesture has been recognized, its 'continuation' state takes precedence over all 'entry' states of gestures of the same or lower priority. This distinction between entry and continuation states adds significantly to perceived system usability.
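One way to sketch this precedence rule (my interpretation, not the patent's implementation; names and the priority tie-breaking are illustrative assumptions):

```python
# Hypothetical sketch of the 'locking' semantics: once a gesture is
# recognized, its 'continuation' state outranks 'entry' states of gestures
# at the same or lower priority.

def select_gesture(active, candidates):
    """active: (name, priority) of the currently recognized gesture, or None.
    candidates: list of (name, priority) gestures whose 'entry' conditions
    currently match. Returns the gesture that wins this cycle."""
    if active is not None:
        _, priority = active
        higher = [c for c in candidates if c[1] > priority]
        if not higher:
            return active                      # continuation wins
        return max(higher, key=lambda c: c[1]) # strictly higher entry wins
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])

print(select_gesture(("pan", 1), [("zoom", 1)]))   # ('pan', 1): continuation
print(select_gesture(("pan", 1), [("reset", 2)]))  # ('reset', 2): outranks it
```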

The system described here includes algorithms for robust operation in the face of real-world data error and uncertainty. Data from low-level tracking systems may be incomplete (for a variety of reasons, including occlusion of markers in optical tracking, network drop-out, processing lag, etc.).

Missing data is marked by the parsing system and interpolated into either 'last known' or 'most likely' states, depending on the amount and context of the missing data.

If data about a particular gesture component (for example, the orientation of a particular joint) is missing, but the 'last known' state of that particular component can be analyzed as physically possible, the system uses this last known state in its real-time matching.

Conversely, if the last known state is analyzed as physically impossible, the system falls back to a 'best guess range' for the component and uses this synthetic data in its real-time matching.
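The fallback policy of the last two paragraphs can be summarized in a few lines; this is a sketch under assumed names, with a toy plausibility predicate standing in for real physical-possibility analysis:

```python
# Illustrative sketch of the missing-data handling described above: a missing
# component is filled with its 'last known' state when that state is still
# physically plausible, and with a 'best guess' value otherwise.

def fill_missing(value, last_known, plausible, best_guess):
    """value: current reading, or None if missing.
    plausible: predicate testing physical possibility of a candidate state."""
    if value is not None:
        return value
    if last_known is not None and plausible(last_known):
        return last_known
    return best_guess

# Toy stand-in: a joint angle anywhere in [0, 120] degrees is deemed possible.
plausible = lambda angle: 0.0 <= angle <= 120.0
print(fill_missing(None, 45.0, plausible, 60.0))   # 45.0: last known reused
print(fill_missing(None, 200.0, plausible, 60.0))  # 60.0: falls back to guess
```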

The specification and parsing systems described here have been carefully designed to support 'handedness agnosticism', so that for multi-hand gestures either hand is permitted to satisfy the pose requirements.

Coincident Virtual/Display and Physical Spaces

The system can provide an environment in which the virtual space depicted on one or more display devices ('screens') is treated as coincident with the physical space inhabited by the operator or operators of the system. An embodiment of such an environment is described here. This current embodiment includes three projector-driven screens at fixed locations, driven by a single desktop computer and controlled using the gestural vocabulary and interface system described here. Note, however, that any number of screens is supported by the techniques being described; that those screens may be mobile (rather than fixed); that the screens may be driven simultaneously by many independent computers; and that the overall system can be controlled by any input device or technique.

The interface system described in this disclosure should have a means of determining the dimensions, orientations, and positions of screens in physical space. Given this information, the system is able to dynamically map the physical space in which these screens are located (and which the operators of the system inhabit) as a projection into the virtual space of computer applications running on the system. As part of this automatic mapping, the system also translates the scale, angles, depth, dimensions, and other spatial characteristics of the two spaces in a variety of ways, according to the needs of the applications that are hosted by the system.

This continuous translation between physical and virtual space makes possible the consistent and pervasive use of a number of interface techniques that are difficult to achieve on existing application platforms, or that must be implemented individually for each application running on such platforms. These techniques include (but are not limited to):

1) Use of 'literal pointing' — using the hands in a gestural interface environment, or using physical pointing tools or devices — as a pervasive and natural interface technique.

2) Automatic compensation for movement or repositioning of screens.

3) Graphics rendering that changes depending on operator position — for example, simulating parallax shifts to enhance depth perception.

4) Inclusion of physical objects in the on-screen display, taking into account real-world position, orientation, state, etc. For example, an operator standing in front of a large, opaque screen could see both application graphics and a representation of the true position of a scale model that is behind the screen (and is, perhaps, moving or changing orientation).

It is important to note that literal pointing is different from the abstract pointing used in mouse-based windowing interfaces and most other contemporary systems. In those systems, the operator must learn to manage a translation between a virtual pointer and a physical pointing device, and must map between the two cognitively.

By contrast, in the system described in this disclosure there is no difference between virtual and physical space (except that virtual space is more amenable to mathematical manipulation), whether from an application or a user perspective, so there is no cognitive translation required of the operator.

The closest analogy to the literal pointing provided by the embodiment described here is the touch-sensitive screen (as found, for example, on many ATM machines). A touch screen provides a one-to-one mapping between the two-dimensional display space on the screen and the two-dimensional input space of the screen surface. In an analogous fashion, the system described here provides a flexible mapping (possibly, but not necessarily, one-to-one) between a virtual space displayed on one or more screens and the physical space inhabited by the operator. Despite the usefulness of the analogy, it is worth understanding that the extension of this 'mapping approach' to three dimensions, arbitrarily large architectural environments, and multiple screens is non-trivial.

In addition to the components described here, the system may also implement algorithms effecting a continuous, system-level mapping (perhaps modified by rotation, translation, scaling, or other geometrical transformations) between the physical space of the environment and the display space on each screen.
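A toy two-dimensional version of such a mapping (a sketch under assumed names, not the patent's algorithm) composes exactly the transformations mentioned above — rotation, scaling, and translation:

```python
import math

def make_mapping(scale, angle_deg, offset):
    """Return a function mapping a physical-space point (x, y) into display
    space: rotate by angle_deg, scale uniformly, then translate by offset
    (offset in display/pixel coordinates)."""
    a = math.radians(angle_deg)
    def to_display(p):
        x, y = p
        rx = x * math.cos(a) - y * math.sin(a)
        ry = x * math.sin(a) + y * math.cos(a)
        return (rx * scale + offset[0], ry * scale + offset[1])
    return to_display

# 100 px per meter, no rotation, physical origin at screen center (640, 360).
mapping = make_mapping(scale=100.0, angle_deg=0.0, offset=(640.0, 360.0))
print(mapping((0.0, 0.0)))  # (640.0, 360.0): physical origin -> screen center
print(mapping((1.0, 0.0)))  # one meter right -> 100 px right of center
```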

A rendering stack takes the computational objects and the mapping, and outputs a graphical representation of the virtual space.

An input events processing stack takes event data from the control system (in the current embodiment, both gestural and pointing data from the system and mouse input) and maps spatial data from input events to coordinates in virtual space. Translated events are then delivered to running applications.

A 'glue layer' allows the system to host applications running across several computers on a local area network.

Pose-Based Control Using 3D Information Extracted Within an Extended Depth of Field

Figure 9 is a block diagram of a gesture-based control system 900 that includes an imaging system extracting three-dimensional information within an extended depth of field, under an embodiment. A user places hands 101 and 102 in the viewing area of an array of cameras 904A-904D. At least two cameras of the array 904A-904D are wavefront coding cameras, each of which comprises a wavefront coding imaging system element that includes a wavefront coding mask (also referred to herein as an 'aspheric optical element' or 'optical element'), as described in detail below. The user's hands and/or fingers may or may not include the marker tags described above.

The cameras 904A-904D detect or capture images of the fingers and hands 101 and 102, including their location, orientation, and movement, and generate output signals to a pre-processor 905. The pre-processor 905 may include or be coupled to wavefront coding digital signal processing 908, as described below. Alternatively, the wavefront coding digital signal processing may be included in, coupled to, or distributed among one or more other components of the system 900. The wavefront coding digital signal processing 908 is configured to greatly extend the depth of field of the imaging system.

The pre-processor 905 translates the camera output into a gesture signal that is provided to the computer processing unit 907 of the system. In doing so, the pre-processor 905 generates three-dimensional space point reconstruction and skeletal point labeling. A gesture translator 906 translates the 3D spatial information and marker motion information into a command language that can be interpreted by a computer processor to update the location, shape, and action of a cursor on a display. The computer 907 uses the input information to generate commands that control one or more on-screen cursors, and provides video output to a display 903.

One or more of the pre-processor 905, the gesture translator 906, and the computer 907 of an alternative embodiment can be combined into a single device. Regardless of system configuration, the functions and/or functionality of each of the pre-processor 905, the gesture translator 906, and the computer 907 are as described above with reference to FIGS. 1-8 and elsewhere herein.

Furthermore, while this example shows four cameras used to detect the location, orientation, and movement of the user's hands 101 and 102, the embodiment is not so limited. The system configuration can include two or more cameras, as appropriate to the system or workstation configuration. Additionally, although the cameras are disposed symmetrically in the example embodiment, such symmetry is not required. Thus, at least two cameras can be used hereunder with any positioning that permits views of the location, orientation, and movement of the user's hands.

Although the system is shown with a single user's hands as input, the system can track the hands of any number of multiple users. Additionally, instead of or in addition to hands, the system can track any one or more parts of a user's body, including head, feet, legs, arms, elbows, knees, and so on. Furthermore, the system can track any number of animate or inanimate objects, and is not limited to tracking parts of a body.

In particular, for a gesture analysis system that positions its optical sensors deliberately or potentially close to the operator's hands (or to an equivalently tracked implement), the elements thereby perceived will typically span, over the full natural course of operator motion, several or many orders of magnitude of relative distance. Continuously recording, with resolved focus, events across this range of distances is beyond the capabilities of conventional optical imaging systems. Yet these close-to-intermediate-range geometries are often desirable in contexts where objects or operators are tracked for purposes of macroscopic device and product design. It is therefore worthwhile to provide techniques that ensure local contrast or salient-feature stability across the expected range of operator activity (conventional optics being insufficient for this purpose).

When three-dimensional information is extracted within an extended depth of field in the system described here, the distance to a point in a scene can be estimated from its location in two or more images captured simultaneously. When the three-dimensional (3D) relationship between the images is known, the 3D location of the point can be computed from basic geometric relationships. The challenge of computing spatial location from multiple images, often referred to as stereo correlation or stereo depth computation, is automatically and accurately associating the mapping of a point in one image with its mapping in another image. This is most commonly done by correlating image features from one image to one or more other images. The fundamental assumption in all stereo matching methods, however, is that some identifiable local contrast or feature must be present in the image in order to match the point with its location in another image. A problem therefore arises when there is no local contrast or feature in the image because of misfocus: stereo matching does not produce accurate results in regions of an image that are out of focus.
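The "basic geometric relationships" invoked above can be made concrete for the simplest case, a rectified two-camera rig under a pinhole model, where depth follows from disparity as Z = f·B/d. This is a minimal sketch under those assumptions; the function name and parameterization are illustrative, not taken from the source:

```python
def stereo_depth(disparity_px, focal_length_px, baseline_m):
    """Depth from disparity for a rectified two-camera rig: Z = f * B / d.

    disparity_px: horizontal offset of the matched point between the two images
    focal_length_px: focal length expressed in pixels
    baseline_m: distance between the two camera centers, in meters
    """
    if disparity_px <= 0:
        raise ValueError("matched point must have positive disparity")
    return focal_length_px * baseline_m / disparity_px

# A feature matched at 40 px disparity, with an 800 px focal length and a
# 10 cm baseline, lies 2 m from the cameras:
print(stereo_depth(40.0, 800.0, 0.10))  # 2.0
```

The formula also makes the text's failure mode visible: if defocus destroys the local contrast needed to measure `disparity_px`, no depth can be recovered for that point.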

A conventional method of extending the depth of focus of an image is to reduce the diameter of the camera lens pupil ("stopping down"). Two side effects, however, limit the usefulness of this technique. First, the sensitivity of the imaging system is reduced by a factor equal to the square of the ratio of the reduced pupil diameter to the full pupil diameter. Second, the maximum spatial frequency response is reduced by a factor equal to that same pupil diameter ratio, which limits the resolution and contrast of the image. Thus, in conventional imaging systems there is a trade-off among depth of field, exposure time, and overall contrast. In the case of a multi-camera ranging system, the net effect is a compromise between stereo depth accuracy and working range.
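The two penalties of stopping down can be stated numerically. A small sketch, assuming the quadratic-sensitivity and linear-cutoff scaling stated in the text (function name hypothetical):

```python
def stop_down_effects(pupil_ratio):
    """Side effects of reducing the pupil diameter by `pupil_ratio`
    (reduced diameter / full diameter, between 0 and 1).

    Returns (sensitivity_factor, cutoff_frequency_factor):
    sensitivity falls as the square of the ratio; the maximum spatial
    frequency response falls linearly with it.
    """
    return pupil_ratio ** 2, pupil_ratio

# Halving the pupil diameter quarters the light sensitivity and halves the
# maximum spatial frequency response:
print(stop_down_effects(0.5))  # (0.25, 0.5)
```

The quadratic loss of light is why stopping down forces longer exposures or brighter illumination, the trade-off wavefront coding is introduced to avoid.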

An alternative method of increasing the depth of field without stopping down the lens is to introduce a prescribed phase mask in the pupil of the camera lens. With an appropriately chosen phase function, the extended depth of field can be recovered by subsequent electronic processing of the image captured at the sensor. This technique, referred to as wavefront coding, generally offers a trade-off among depth of field, camera dynamic range, and signal-to-noise ratio. Wavefront coding makes it possible to optimize the camera parameters for a particular application. Applications that do not require very high dynamic range and in which the illumination is under the user's control, such as the gesture recognition described here, can benefit greatly from wavefront coding to achieve high accuracy within a specified volume of space.

As described above, the system of an embodiment includes techniques that use the processed output of multiple wavefront coding cameras to determine the range and location of selected objects within a scene. The extended depth of field produced by wavefront coding can be applied to gesture recognition and to a large variety of other task-based imaging applications to significantly improve their performance. Although a minimum of two cameras is required, there is no upper limit on the number of cameras that can be used in this embodiment. The scene extraction can include any of a variety of processing techniques (e.g., correlation) for range extraction with two or more cameras. The embodiments described here include all wavefront coding phase functions, along with their corresponding decoding kernels, that yield an extended depth of field after processing.

The wavefront coding used in the wavefront coding imaging systems is a general technique that uses generalized aspheric optics together with digital signal processing to greatly increase the performance and/or reduce the cost of imaging systems. The type of aspheric optics employed produces optical imaging characteristics that are very insensitive to misfocus-related aberrations. Sharp and clear images are not produced directly by the optics; however, digital signal processing applied to the sampled image produces a final image that is sharp and clear, and likewise insensitive to misfocus-related aberrations.

Wavefront coding is used to greatly increase the performance of imaging systems while also reducing their size, weight, and cost. Wavefront coding combines non-rotationally symmetric aspheric optical elements and digital signal processing in a fundamental manner to greatly extend the depth of field of imaging systems. With wavefront coding, the depth of field or depth of focus of an imaging system can be increased tenfold or more over that of a conventional imaging system for a given aperture size or F/#. The wavefront coding optical elements of an embodiment are phase surfaces and therefore do not absorb light or increase exposure or illumination requirements. Such extended-depth-of-field performance is impossible with conventional imaging techniques without a drastic loss of optical power, as necessarily occurs when the aperture is stopped down. The increased depth of field/depth of focus also enables imaging systems that are physically less expensive, smaller, or lighter, by controlling misfocus-related aberrations that are traditionally controlled by adding lens elements or increasing lens complexity. Misfocus-related aberrations that can be controlled with wavefront coding include chromatic aberration, Petzval curvature, astigmatism, spherical aberration, and temperature-related misfocus.

Wavefront coding, as a hybrid imaging approach, combines optics and electronics to increase the depth of field and to reduce the optical component count, the fabrication tolerances, and the overall system cost. FIG. 10 is a block diagram of a wavefront coding imaging system 1000 used in a gesture-based control system, under an embodiment. The optics portion 1001 of the wavefront coding imaging system 1000 is a conventional optical system or camera that has been modified by placing a wavefront coding optical element 1002 near the aperture stop. Adding this coding optical element results in images having a specialized, well-defined blur or point spread function that is insensitive to misfocus. Digital processing 1003 applied to the sampled image produces a sharp and clear image 1004 that is very insensitive to misfocus effects.

FIG. 11 is a block diagram of a gesture-based control system 1100 that employs a wavefront coding imaging system, including two wavefront coding cameras, to extract three-dimensional information within an extended depth of field, under an embodiment. As described above with reference to FIG. 10, the system 1100 includes at least two wavefront coding cameras 1101 and 1102. A processor is coupled to receive the outputs of the wavefront coding cameras 1101 and 1102 and performs data processing on the camera outputs. The data processing includes, for example, deconvolution 1120 and range extraction 1130, and produces an extended-focus range map 1140.

In the wavefront coding system 1100, the optics of the system (e.g., the wavefront coding cameras 1101 and 1102) "code" the resulting images to produce intermediate images 1110. Because the wavefront coding element (e.g., element 1002 of FIG. 10) purposely blurs all points in any image, the intermediate image 1110 appears misfocused. In this intermediate image 1110, nearly all objects within the field of view are blurred, but they are blurred identically. In contrast, conventional optics typically form images with a variable blur function that depends on the distance of each object in the scene.

To produce a sharp and clear image from the intermediate wavefront coded image 1110, electronics (e.g., the wavefront coding digital signal processing) process or "decode" 1120 and 1130 the blurred intermediate image by removing the system-dependent image blur. The digital filtering can be performed in real time by software or by specialized hardware solutions.
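Because the blur of the intermediate image is known and invariant with misfocus, the decoding step can be sketched as a single regularized inverse filter. The source does not specify the filter form; the Wiener-style regularizer below is a common stand-in, shown here only as an illustrative sketch:

```python
import numpy as np

def wiener_decode(blurred, psf, noise_to_signal=1e-2):
    """Decode a wavefront coded intermediate image by inverting its known,
    misfocus-invariant blur with a Wiener-regularized filter.

    `blurred` and `psf` are 2D arrays of the same shape; the PSF is assumed
    centered at index (0, 0), following np.fft conventions.
    """
    H = np.fft.fft2(psf)
    # Regularized inverse filter: conj(H) / (|H|^2 + NSR).
    G = np.conj(H) / (np.abs(H) ** 2 + noise_to_signal)
    return np.real(np.fft.ifft2(np.fft.fft2(blurred) * G))

# Demo: blur an impulse with a known kernel, then decode it.
img = np.zeros((16, 16)); img[8, 8] = 1.0
psf = np.zeros((16, 16)); psf[0, 0] = 0.5; psf[0, 1] = 0.25; psf[1, 0] = 0.25
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(psf)))
decoded = wiener_decode(blurred, psf, noise_to_signal=1e-4)
print(decoded[8, 8] > 0.9)  # True
```

Note that the same filter is applied everywhere in the image, which is exactly the simplification the misfocus-invariant PSF buys; a conventional system would need a spatially varying, misfocus-dependent filter instead.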

As described above with reference to FIG. 10, the system optics of an embodiment comprise conventional components plus at least one additional optical element that performs the wavefront coding function. This element is placed in the optical path, generally near the aperture stop of the system, to minimize vignetting. The signal processing performed on the detected image depends on the first-order properties of the optics, the wavefront coding element, and the digital detector.

The general wavefront coding element is non-rotationally symmetric and smooth, although diffractive surfaces can also be used. The element can be a separate component, or it can be integrated onto a conventional lens element by the addition of a generalized aspheric surface. All coding elements redirect the light so that, apart from the on-axis ray, no rays travel toward the traditional geometric focus. In fact, no two rays cross at the same point along the optical axis. The system forms no sharp image at any image plane.

The main effect of the optics portion of a wavefront coding imaging system is to make the resulting images insensitive to focus-related aberrations such as misfocus, spherical aberration, astigmatism, or field curvature. The intermediate blurred image is insensitive to, or does not change with, changes in the object or the imaging system that involve misfocus aberrations. From a systems-analysis point of view, the modulation transfer function (MTF) and point spread function (PSF) of a wavefront coding system do not change with misfocus.

Although the MTF of the intermediate image from a wavefront coding system exhibits little change with misfocus, it does have reduced power relative to an in-focus conventional system. Because no apodization is used, the total optical power is maintained. Digital filtering, or image reconstruction processing, is employed to form a sharp image. These final MTFs are very insensitive to misfocus; the wavefront coding imaging system therefore has a very large depth of field. Similarly, the intermediate PSFs from a wavefront coding system differ from the PSFs of a conventional system, but they change very little with misfocus.

Referring again to FIG. 10, a specialized aspheric optical element is placed at or near the aperture stop of a conventional imaging system to form a wavefront coding imaging system. This optical element modifies the imaging system in such a way that the resulting PSF and optical transfer function (OTF) are insensitive to a range of misfocus or misfocus-related aberrations. The PSF and OTF, however, are not the same as those obtained from a high-quality in-focus imaging system. The process of making the imaging system insensitive to misfocus aberrations produces images with a specialized, well-defined blur; this blur is removed with the wavefront coding digital signal processing.

The PSF from a conventional imaging system, for example, changes drastically with misfocus, whereas the PSF from a wavefront coding imaging system exhibits little noticeable change with misfocus. Digital processing applied to a misfocused conventional imaging system to remove the misfocus blur would have to be performed according to the amount of misfocus present in different regions of the image. In many cases the amount of misfocus is unknown and difficult to compute. In addition, the MTF of a misfocused conventional imaging system can often contain zeros or nulls, which further increase the difficulty of the digital processing. In contrast, the constancy of the wavefront coding system's PSF with misfocus is exactly what is needed to eliminate the dependence of the digital processing on misfocus. The digital processing applied to the image detected by a charge-coupled device (CCD) or complementary metal-oxide-semiconductor (CMOS) detector is independent of the misfocus and of the actual scene being imaged. Additionally, the MTFs of both in-focus and out-of-focus wavefront coding imaging systems contain no zeros or nulls, allowing high-quality final images.

Wavefront coding for extending the depth of field can add value to imaging applications that typically cannot accept the conventional approach (i.e., stopping down the aperture). Constraints on illumination levels, exposure times, or spatial resolution often limit the applicability of existing optical methods. Through the use of wavefront coding, applications can benefit from a reduction of misfocus-related problems without sacrificing exposure time or requiring large amounts of illumination.

As described above, wavefront coding imaging systems involve non-conventional optical designs and digital signal processing of the resulting images. The signal processing used depends on the particular optical system. The wavefront coding optics depend on the type and amount of signal processing to be used. Because the optics and the signal processing are closely coupled, the best performance is naturally expected of systems in which the optical and digital components are jointly optimized during the design process. The optical components are configured to minimize the change of the optics with, or the sensitivity of the optics to, misfocus effects, and to enable efficient signal processing. The digital components are designed to minimize algorithm complexity, processing time, and the effects of the digital processing on image noise.

FIG. 12 is a flow diagram of gesture-based control using three-dimensional information extracted within an extended depth of field, under an embodiment. The gesture-based control of an embodiment comprises imaging 1202 a body with an imaging system, where the imaging 1202 comprises generating a wavefront coded image of the body. The gesture-based control of an embodiment comprises automatically detecting 1204 a gesture of the body, the gesture comprising an instantaneous state of the body. The detecting 1204 comprises aggregating gesture data of the gesture at an instant in time, the gesture data comprising focus-resolved data of the body within a depth of field of the imaging system. The gesture-based control of an embodiment comprises translating 1206 the gesture to a gesture signal, and controlling 1208 a component coupled to a computer in response to the gesture signal.
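The four-step flow of FIG. 12 can be sketched as a pipeline of pluggable stages. The stage names and stub implementations below are hypothetical placeholders standing in for the pre-processor, gesture translator, and computer of FIG. 9:

```python
def gesture_control_step(cameras, detect, translate, control):
    """One pass of the FIG. 12 flow: image (1202), detect (1204),
    translate (1206), control (1208)."""
    wavefront_coded_images = [camera() for camera in cameras]   # 1202
    gesture_data = detect(wavefront_coded_images)               # 1204
    gesture_signal = translate(gesture_data)                    # 1206
    return control(gesture_signal)                              # 1208

# Stub stages for illustration only:
result = gesture_control_step(
    cameras=[lambda: "img_a", lambda: "img_b"],
    detect=lambda imgs: {"pose": "open-hand", "frames": len(imgs)},
    translate=lambda data: ("POSE_SIGNAL", data["pose"]),
    control=lambda sig: f"dispatched {sig[1]}",
)
print(result)  # dispatched open-hand
```

In the actual system, each stage runs continuously per frame, and the detect stage consumes the focus-resolved data that the wavefront coded imaging makes available throughout the extended depth of field.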

The basic routines used for wavefront coding of an embodiment can include a ray-tracing program that traces rays through typical spherical and aspheric surfaces as well as general wavefront coding surface forms. The ray-tracing program is used to compute exit pupils and to optimize a given set of optical and digital merit functions or operands. FIG. 13 is a block diagram of a wavefront coding design process 1300 used in a gesture-based control system, under an embodiment. The output of the design includes, but is not limited to, traditional optical surfaces, materials, thicknesses, and spacings; parameters of the wavefront coding surfaces; and digital filter coefficients.

The general optics/digital design loop is now described with reference to FIG. 13. The ray-tracing program 1302 traces rays through the optical surfaces to compute the exit pupil optical path difference (OPD) 1304 and to optimize a given set of optical and digital merit functions or operands. Inputs to the ray-tracing program 1302 include, for example, the optical surfaces, thicknesses, and operating conditions (wavelengths, field of view, temperature range, sample object images, etc.). The OTF is computed or generated at 1306, and a pixel OTF related to the detector geometry is added at 1308. The sampled OTF and PSF are computed at 1310. Digital filter coefficients are generated 1312 for the processing algorithm selected on the basis of the sampled PSF. The process then forms figures of merit for the filter (e.g., wavefront coding operands) based on minimizing both of the following: changes of the sampled PSF and MTF due to aliasing, temperature changes, color, field angle, through-focus, and so on; and digital processing parameters such as the amount of processing, the form of processing, processing-related image noise, digital filter noise gain, and the like. The optimization routine combines the wavefront coding operands with traditional optical operands (Seidel wavefront aberrations, RMS wavefront errors, etc.) to modify the optical surfaces. Operation then returns to generating 1302 the exit pupil optical path difference (OPD) by traditional ray tracing.

Theoretically derived wavefront coding surface forms are used as a starting point for the optical optimization. One general family of rectangularly separable surface forms is given, in normalized coordinates, as:

S(x) = |β|·sign(x)·|x|^α

where sign(x) = +1 for x > 0, and sign(x) = -1 for x ≤ 0.

The exponential parameter α controls the height of the MTF over the misfocus range, and the parameter β controls the sensitivity to misfocus. In general, increasing the parameter β decreases the sensitivity to misfocus while lowering the height of the MTF and lengthening the resulting PSF.
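The surface family above is straightforward to evaluate directly. A minimal sketch, transcribing S(x) = |β|·sign(x)·|x|^α with the text's convention that sign(0) = -1 (the function name is illustrative):

```python
def coding_surface(x, alpha, beta):
    """Evaluate the rectangularly separable wavefront coding surface form
    S(x) = |beta| * sign(x) * |x|**alpha, with x in normalized pupil
    coordinates and sign(x) = -1 for x <= 0, per the text."""
    sign = 1.0 if x > 0 else -1.0
    return abs(beta) * sign * abs(x) ** alpha

# With alpha = 3 the family reduces to a cubic profile beta * x**3:
print(coding_surface(0.5, alpha=3.0, beta=8.0))   # 1.0
print(coding_surface(-0.5, alpha=3.0, beta=8.0))  # -1.0
```

The odd symmetry visible here (S(-x) = -S(x)) is what makes the element non-rotationally symmetric, as the surrounding text requires.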

The filtering process used to reconstruct the intermediate image and produce the final image can impose a computational burden. Depending on the depth-of-field enhancement introduced by the coding process and on the optical system, the size of the filter kernel required for image reconstruction can be as large as 70×70 coefficients. In general, the larger the extension of the depth of field, the larger the filter kernel and the greater the deterioration of the noise characteristics, or noise gain. Furthermore, because every pixel in the image is blurred by the wavefront coding, every pixel needs to be filtered; larger images can therefore require more computation than smaller ones. For image sizes approaching ten megapixels, computationally efficient schemes are used for practical and economical systems. Computational implementations, such as rectangularly separable filter approximations, can help reduce kernel dimensions. For example, the wavefront coding element used can have a rectangularly separable cubic phase form described by:

S(x, y) = a(x³ + y³)
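The cubic phase surface above can be sampled on a discrete pupil grid to see its separable, antisymmetric shape. A small sketch (grid size and the parameter value are arbitrary choices for illustration):

```python
import numpy as np

def cubic_phase_mask(n, a):
    """Sample the separable cubic phase S(x, y) = a * (x**3 + y**3) on an
    n x n grid over normalized pupil coordinates [-1, 1]."""
    coords = np.linspace(-1.0, 1.0, n)
    x, y = np.meshgrid(coords, coords)
    return a * (x ** 3 + y ** 3)

mask = cubic_phase_mask(5, a=2.0)
# The mask is a sum of a function of x alone and a function of y alone
# (rectangular separability) and is antisymmetric about the pupil center:
print(mask[0, 0], mask[4, 4])  # -4.0 4.0
```

It is this additive x/y structure that later allows the reconstruction filter to be applied as a row pass followed by a column pass.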

Filtering the blurred image to remove the blur essentially applies an amplification and a phase shift as a function of spatial frequency. The amplification increases both the signal and the noise in the final image. For very large (e.g., greater than tenfold) increases in depth of field, the noise gain of a wavefront coding system can be a factor of four or five. For more modest increases in depth of field of two to four times, the noise gain is typically a factor of two or less.

For uncorrelated Gaussian noise (a good assumption for most images), the noise gain is the RMS value of the filter coefficients. For systems in which the extension of the depth of field is so large that a suitably small noise-gain value cannot be achieved, reducing the resolution or the spatial bandwidth of the digital filter can lower the noise gain. Lowering the contrast of the final image can also reduce the overall effect of the increased noise. Specialized nonlinear filtering is the best approach to removing the noise in wavefront coded images.
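The RMS rule for noise gain is a one-line computation. A minimal sketch, assuming the uncorrelated-Gaussian-noise case stated in the text (the example kernel is hypothetical):

```python
import numpy as np

def noise_gain(kernel):
    """Noise gain of a reconstruction filter under uncorrelated Gaussian
    noise: the RMS value of the filter coefficients."""
    k = np.asarray(kernel, dtype=float)
    return np.sqrt(np.mean(k ** 2))

# A pure pass-through kernel whose only nonzero coefficient makes the
# RMS come out to exactly 1 has unit noise gain:
kernel = np.zeros((3, 3)); kernel[1, 1] = 3.0  # RMS = sqrt(9 / 9) = 1.0
print(noise_gain(kernel))  # 1.0
```

Shrinking the kernel's coefficients (i.e., reducing the filter's bandwidth) lowers this RMS value, which is the mechanism behind the noise-gain reduction described above.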

Because the wavefront coding optics used in an embodiment to form the MTF and PSF are rectangularly separable, the signal processing used can also be rectangularly separable. Rectangularly separable processing can reduce the required computation by one or more orders of magnitude. Because the digital filtering is performed by spatial convolution, the computational method of an embodiment comprises a series of multiplications that scale the data by the filter coefficients, and a summation that adds all the scaled data values over the entire kernel; the fundamental unit of this computation is the multiply-accumulate operation. A typical 2D wavefront coding filter kernel for a large increase in depth of field can be 30×30 coefficients. A rectangularly separable version of this filter comprises a row filter 30 coefficients long and a column filter 30 coefficients high, or 60 total coefficients. Although the wavefront coding elements can be rectangularly separable by design, they are not so limited, and highly aberrated systems can use non-separable filtering.

By combining optical imaging techniques with electronic filtering, wavefront coding technology can improve the performance of a wide variety of imaging systems. The performance gain for high-performance imaging systems can include a very large depth of field without sacrificing light gathering or spatial resolution. The performance gain for lower-cost imaging systems can include good image quality with fewer physical components than traditionally required.

Embodiments described here include a system comprising: a plurality of optical detectors, wherein at least two optical detectors of the plurality of optical detectors comprise wavefront coding cameras, the plurality of optical detectors imaging a body; and a processor coupled to the plurality of optical detectors, the processor automatically detecting a gesture of the body, wherein the gesture comprises an instantaneous state of the body, wherein the detecting comprises aggregating gesture data of the gesture at an instant in time, the gesture data comprising focus-resolved data of the body within a depth of field of the imaging system, the processor translating the gesture to a gesture signal and using the gesture signal to control a component coupled to the processor.

The wavefront coding cameras of an embodiment comprise wavefront coding optical elements.

The imaging of an embodiment comprises generating wavefront coded images of the body.

The wavefront coding cameras of an embodiment comprise a phase mask that increases the depth of focus of the imaging.

The gesture data of an embodiment comprises focus-resolved range data of the body within the depth of field.

一个实施例的景深内的身体的焦点分辨范围数据来自于波前编码相机的输出。The body's focus resolution range data within depth of field of one embodiment comes from the output of a wavefront encoded camera.

一个实施例的姿态数据包括景深内的身体的焦点分辨位置数据。The pose data of one embodiment includes focus-resolved position data of the body within the depth of field.

一个实施例的景深内的身体的焦点分辨位置数据来自于波前编码相机的输出。The focus resolved position data of the body within the depth of field of one embodiment comes from the output of the wavefront encoded camera.

一个实施例的系统包括不随身体与成像系统之间的距离而变化的调制传递函数和点扩散函数。The system of one embodiment includes a modulation transfer function and a point spread function that do not vary with distance between the body and the imaging system.

一个实施例的系统包括不关于散焦而变化的调制传递函数和点扩散函数。The system of one embodiment includes a modulation transfer function and a point spread function that do not vary with respect to defocus.

The processor of an embodiment generates an intermediate image by encoding images gathered by the wavefront-coded cameras.

The intermediate image of an embodiment is blurred.

The intermediate image of an embodiment is insensitive to those changes of the plurality of optical detectors or the body that comprise defocus aberrations.
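The decode step that turns the blurred intermediate image back into a focus-resolved image can be sketched as a single deconvolution, since the point spread function is (approximately) the same at every depth. The toy PSF, image size, and noise-to-signal ratio below are assumptions for illustration, not parameters from this document:

```python
import numpy as np

def wiener_decode(intermediate, psf, nsr=1e-3):
    """Decode a wavefront-coded intermediate image with a Wiener filter
    built from the single, depth-invariant PSF."""
    H = np.fft.fft2(psf, s=intermediate.shape)   # transfer function of the code
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)      # regularized inverse filter
    return np.real(np.fft.ifft2(np.fft.fft2(intermediate) * W))

# Illustrative use: blur a synthetic scene with the shared PSF, then decode it.
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
psf = np.zeros((64, 64))
psf[:3, :3] = 1.0 / 9.0                          # toy depth-invariant PSF
blurred = np.real(np.fft.ifft2(np.fft.fft2(scene) * np.fft.fft2(psf)))
restored = wiener_decode(blurred, psf)
```

One fixed filter suffices precisely because the intermediate image is insensitive to defocus; a conventional camera would need a different inverse filter for each object depth.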

The gesture data of an embodiment is three-dimensional spatial position data representing the gesture.

The detecting of an embodiment comprises at least one of detecting a location of the body and detecting an orientation of the body, and the detecting comprises detecting motion of the body.

The detecting of an embodiment comprises identifying the gesture, wherein the identifying comprises identifying a pose and an orientation of a portion of the body.

The detecting of an embodiment comprises detecting at least one of a first set of appendages and a second set of appendages of the body.

The detecting of an embodiment comprises dynamically detecting a position of at least one tag.

The detecting of an embodiment comprises detecting positions of a set of tags coupled to a portion of the body.

Each tag of the set of tags of an embodiment includes a pattern, wherein each pattern of each tag of the set of tags is different from any pattern of any remaining tag of the set of tags.

The detecting of an embodiment comprises dynamically detecting and locating markers on the body.

The detecting of an embodiment comprises detecting positions of a set of markers coupled to a portion of the body.

The set of markers of an embodiment forms a plurality of patterns on the body.

The detecting of an embodiment comprises detecting positions of a plurality of appendages of the body using a set of markers coupled to each of the appendages.

The translating of an embodiment comprises translating information of the gesture into a gesture notation.

The gesture notation of an embodiment represents a gesture vocabulary, and the gesture signal comprises a communication of the gesture vocabulary.

The gesture vocabulary of an embodiment represents in textual form an instantaneous pose state of kinematic linkages of the body.

The gesture vocabulary of an embodiment represents in textual form an orientation of kinematic linkages of the body.

The gesture vocabulary of an embodiment represents in textual form a combination of orientations of kinematic linkages of the body.

The gesture vocabulary of an embodiment includes a string of characters that represents a state of kinematic linkages of the body.

The kinematic linkage of an embodiment is at least one first appendage of the body.

The system of an embodiment comprises assigning each position in the string to a second appendage, the second appendage connected to the first appendage.

The system of an embodiment comprises assigning a character of a plurality of characters to each of a plurality of positions of the second appendage.

The plurality of positions of an embodiment is established relative to a coordinate origin.

The system of an embodiment comprises establishing the coordinate origin using an absolute position and orientation in space; establishing the coordinate origin using a fixed position and orientation relative to the body, irrespective of the overall position and heading of the body; or establishing the coordinate origin interactively in response to an action of the body.

The system of an embodiment comprises assigning a character of the plurality of characters to each of a plurality of orientations of the first appendage.
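A minimal sketch of the coordinate-origin alternatives, assuming gesture positions arrive as (x, y, z) tuples in world coordinates (the variable names and values are illustrative):

```python
def to_origin_frame(point, origin):
    """Express a world-space point relative to a chosen coordinate origin."""
    return tuple(p - o for p, o in zip(point, origin))

world_origin = (0.0, 0.0, 0.0)      # absolute position and orientation in space
body_origin = (1.0, 0.5, 2.0)       # tracks the body, wherever it stands
interactive_origin = None           # set later, when the user performs an action

hand = (1.2, 0.9, 2.5)
rel_to_body = to_origin_frame(hand, body_origin)
```

With the body-relative choice, the same hand motion yields the same relative coordinates wherever the body stands, matching the "irrespective of the overall position" alternative in the text.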

The detecting of an embodiment comprises detecting when an inferred position of the body intersects virtual space, wherein the virtual space comprises space depicted on a display device coupled to the computer.

Controlling the component of an embodiment comprises controlling a virtual object in the virtual space when the inferred position intersects the virtual object.

Controlling the component of an embodiment comprises controlling a position of the virtual object in the virtual space in response to the inferred position in the virtual space.

Controlling the component of an embodiment comprises controlling a pose of the virtual object in the virtual space in response to the gesture.
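The intersection test above can be sketched by modeling a virtual object as an axis-aligned box in the displayed virtual space (the box model and class names are assumptions of this sketch):

```python
def intersects(point, box_min, box_max):
    """True when a 3-D point lies inside an axis-aligned box."""
    return all(lo <= p <= hi for p, lo, hi in zip(point, box_min, box_max))

class VirtualObject:
    def __init__(self, box_min, box_max):
        self.box_min, self.box_max = box_min, box_max
        self.selected = False

    def update(self, inferred_position):
        # The object is controllable only while the body's inferred
        # position intersects its extent in virtual space.
        self.selected = intersects(inferred_position,
                                   self.box_min, self.box_max)

obj = VirtualObject((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
obj.update((0.5, 0.5, 0.5))   # inside the box, so the object is selected
```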

The system of an embodiment comprises scaling the detecting and controlling to generate coincidence between the virtual space and a physical space, wherein the virtual space comprises space depicted on a display device coupled to the processor, and wherein the physical space comprises space inhabited by the body.
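The coincidence between physical and virtual space amounts to a linear map from the tracked physical extent to display coordinates. The extents and resolution below are illustrative assumptions:

```python
PHYS_MIN, PHYS_MAX = (-1.0, -0.5), (1.0, 0.5)   # tracked extent in meters (x, y)
SCREEN_W, SCREEN_H = 1920, 1080                  # display resolution in pixels

def physical_to_virtual(x, y):
    """Map a physical (x, y) position to pixel coordinates on the display."""
    u = (x - PHYS_MIN[0]) / (PHYS_MAX[0] - PHYS_MIN[0]) * SCREEN_W
    v = (y - PHYS_MIN[1]) / (PHYS_MAX[1] - PHYS_MIN[1]) * SCREEN_H
    return u, v
```

With this map, a hand at the center of the tracked volume lands at the center of the display, so pointing motions in the room correspond one-to-one with motions of the on-screen cursor or object.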

The system of an embodiment comprises controlling at least one virtual object in the virtual space in response to movement of at least one physical object in the physical space.

The controlling of an embodiment comprises at least one of controlling a function of an application hosted on the processor and controlling a component displayed on the processor.

Embodiments described herein include a method comprising: imaging a body with an imaging system, the imaging comprising generating a wavefront-coded image of the body; automatically detecting a gesture of the body, wherein the gesture comprises an instantaneous state of the body, and wherein the detecting comprises gathering gesture data of the gesture at an instant in time, the gesture data comprising focus-resolved data of the body within a depth of field of the imaging system; translating the gesture into a gesture signal; and controlling a component coupled to a computer in response to the gesture signal.
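The four steps of the method can be sketched as one pass of a control loop. Every callable here is a stand-in, since the document does not prescribe an API:

```python
def run_pipeline(capture_frame, detect_gesture, translate, component):
    """One pass: image the body, detect the gesture, translate it into
    a gesture signal, and control the coupled component with it."""
    frame = capture_frame()           # wavefront-coded image of the body
    gesture = detect_gesture(frame)   # instantaneous state, focus-resolved data
    signal = translate(gesture)       # gesture -> gesture signal
    component.control(signal)         # drive the coupled component
    return signal

class EchoComponent:
    """Stand-in component that records the signal it is given."""
    def control(self, signal):
        self.last_signal = signal

component = EchoComponent()
signal = run_pipeline(lambda: "frame",
                      lambda frame: "one-finger-point",
                      lambda gesture: gesture.upper(),
                      component)
```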

The imaging system of an embodiment comprises a plurality of optical detectors, wherein at least two of the optical detectors are wavefront-coded cameras comprising wavefront-coded optics.

The imaging of an embodiment comprises generating a wavefront-coded image of the body.

The imaging system of an embodiment comprises a plurality of optical detectors, wherein at least two of the optical detectors are wavefront-coded cameras comprising a phase mask that increases a depth of focus of the imaging.

The gesture data of an embodiment comprises focus-resolved range data of the body within the depth of field.

The focus-resolved range data of the body within the depth of field of an embodiment is derived from an output of the imaging system.

The gesture data of an embodiment comprises focus-resolved position data of the body within the depth of field.

The focus-resolved position data of the body within the depth of field of an embodiment is derived from an output of the imaging system.

The method of an embodiment comprises generating a modulation transfer function and a point spread function that do not vary with the distance between the body and the imaging system.

The method of an embodiment comprises generating a modulation transfer function and a point spread function that do not vary with respect to defocus.

The method of an embodiment comprises generating an intermediate image by encoding images gathered by the wavefront-coded cameras.

The intermediate image of an embodiment is blurred.

The intermediate image of an embodiment is insensitive to those changes of the plurality of optical detectors of the imaging system or the body that comprise defocus aberrations.

The gesture data of an embodiment is three-dimensional spatial position data representing the gesture.

The detecting of an embodiment comprises detecting a location of the body.

The detecting of an embodiment comprises detecting an orientation of the body.

The detecting of an embodiment comprises detecting motion of the body.

The detecting of an embodiment comprises identifying the gesture, wherein the identifying comprises identifying a pose and an orientation of a portion of the body.

The detecting of an embodiment comprises detecting at least one of a first set of appendages and a second set of appendages of the body.

The detecting of an embodiment comprises dynamically detecting a position of at least one tag.

The detecting of an embodiment comprises detecting positions of a set of tags coupled to a portion of the body.

Each tag of the set of tags of an embodiment includes a pattern, wherein each pattern of each tag of the set of tags is different from any pattern of any remaining tag of the set of tags.

The detecting of an embodiment comprises dynamically detecting and locating markers on the body.

The detecting of an embodiment comprises detecting positions of a set of markers coupled to a portion of the body.

The set of markers of an embodiment forms a plurality of patterns on the body.

The detecting of an embodiment comprises detecting positions of a plurality of appendages of the body using a set of markers coupled to each of the appendages.

The translating of an embodiment comprises translating information of the gesture into a gesture notation.

The gesture notation of an embodiment represents a gesture vocabulary, and the gesture signal comprises a communication of the gesture vocabulary.

The gesture vocabulary of an embodiment represents in textual form an instantaneous pose state of kinematic linkages of the body.

The gesture vocabulary of an embodiment represents in textual form an orientation of kinematic linkages of the body.

The gesture vocabulary of an embodiment represents in textual form a combination of orientations of kinematic linkages of the body.

The gesture vocabulary of an embodiment includes a string of characters that represents a state of kinematic linkages of the body.

The kinematic linkage of an embodiment is at least one first appendage of the body.

The method of an embodiment comprises assigning each position in the string to a second appendage, the second appendage connected to the first appendage.

The method of an embodiment comprises assigning a character of a plurality of characters to each of a plurality of positions of the second appendage.

The plurality of positions of an embodiment is established relative to a coordinate origin.

The method of an embodiment comprises establishing the coordinate origin using an absolute position and orientation in space; establishing the coordinate origin using a fixed position and orientation relative to the body, irrespective of the overall position and heading of the body; or establishing the coordinate origin interactively in response to an action of the body.

The method of an embodiment comprises assigning a character of the plurality of characters to each of a plurality of orientations of the first appendage.

The detecting of an embodiment comprises detecting when an inferred position of the body intersects virtual space, wherein the virtual space comprises space depicted on a display device coupled to the computer.

Controlling the component of an embodiment comprises controlling a virtual object in the virtual space when the inferred position intersects the virtual object.

Controlling the component of an embodiment comprises controlling a position of the virtual object in the virtual space in response to the inferred position in the virtual space.

Controlling the component of an embodiment comprises controlling a pose of the virtual object in the virtual space in response to the gesture.

The method of an embodiment comprises scaling the detecting and controlling to generate coincidence between the virtual space and a physical space, wherein the virtual space comprises space depicted on a display device coupled to the processor, and wherein the physical space comprises space inhabited by the body.

The method of an embodiment comprises translating scale, angle, depth, and dimension between the virtual space and the physical space as appropriate to at least one application coupled to the processor.

The method of an embodiment comprises controlling at least one virtual object in the virtual space in response to movement of at least one physical object in the physical space.

The controlling of an embodiment comprises controlling a function of an application hosted on the processor.

The controlling of an embodiment comprises controlling a component displayed on the processor.

The systems and methods described herein include and/or run under and/or in association with a processing system. The processing system includes any collection of processor-based devices or computing devices operating together, or components of processing systems or devices, as is known in the art. For example, the processing system can include one or more of a portable computer, a portable communication device operating in a communication network, and/or a network server. The portable computer can be any of a number and/or combination of devices selected from among personal computers, cellular telephones, personal digital assistants, portable computing devices, and portable communication devices, but is not so limited. The processing system can include components within a larger computer system.

The processing system of an embodiment includes at least one processor and at least one memory device or subsystem. The processing system can also include or be coupled to at least one database. The term "processor" as generally used herein refers to any logic processing unit, such as one or more central processing units (CPUs), digital signal processors (DSPs), application-specific integrated circuits (ASICs), etc. The processor and memory can be monolithically integrated onto a single chip, distributed among a number of chips or components of a host system, and/or provided by some combination of algorithms. The methods described herein can be implemented in one or more of software algorithm(s), programs, firmware, hardware, components, and circuitry, in any combination.

System components embodying the systems and methods described herein can be located together or in separate locations. Consequently, system components embodying these systems and methods can be components of a single system, multiple systems, and/or geographically separate systems. These components can also be subcomponents or subsystems of a single system, multiple systems, and/or geographically separate systems. These components can be coupled to one or more other components of a host system or of a system coupled with the host system.

Communication paths couple the system components and include any medium for communicating or transferring files among the components. The communication paths include wireless connections, wired connections, and hybrid wireless/wired connections. The communication paths also include couplings or connections to networks, including local area networks (LANs), metropolitan area networks (MANs), wide area networks (WANs), proprietary networks, interoffice or backend networks, and the Internet. Furthermore, the communication paths include removable fixed media like floppy disks, hard disk drives, and CD-ROM disks, as well as flash RAM, Universal Serial Bus (USB) connections, RS-232 connections, telephone lines, buses, and electronic mail messages.

Unless the context clearly requires otherwise, throughout the description the words "comprise," "comprising," and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of "including, but not limited to." Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words "herein," "hereunder," "above," "below," and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word "or" is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list.

The above description of embodiments of the processing environment is not intended to be exhaustive, and the systems and methods described are not limited to the precise forms disclosed. While specific embodiments of, and examples for, the processing environment are described herein for illustrative purposes, various equivalent modifications are possible within the scope of other systems and methods, as those skilled in the relevant art will recognize. The teachings of the processing environment provided herein can be applied to other processing systems and methods, not only to the systems and methods described above.

The elements and acts of the various embodiments described above can be combined to provide further embodiments. These and other changes can be made to the processing environment in light of the above detailed description.

Claims (90)

1. A system for pose-based control using three-dimensional information extracted within an extended depth of field, the system comprising:
a plurality of optical detectors, wherein at least two optical detectors of the plurality of optical detectors comprise wavefront-coded cameras, wherein the plurality of optical detectors image a body; and
a processor coupled to the plurality of optical detectors, the processor automatically detecting a gesture of the body, wherein the gesture comprises an instantaneous state of the body, wherein the detecting comprises gathering gesture data of the gesture only at an instant in time and excluding use of background data in the detecting of the gesture, the gesture data comprising focus-resolved position data of the body relative to a neutral position that serves as an absolute position and orientation in space, wherein the position data is three-dimensional information, the processor translating the gesture into a gesture signal and using the gesture signal to control a component coupled to the processor.
2. The system of claim 1, wherein the wavefront-coded cameras comprise wavefront-coded optics.
3. The system of claim 1, wherein the imaging comprises generating a wavefront-coded image of the body.
4. The system of claim 1, wherein the wavefront-coded cameras comprise a phase mask that increases a depth of focus of the imaging.
5. The system of claim 1, wherein the gesture data comprises focus-resolved range data of the body within a depth of field.
6. The system of claim 5, wherein the focus-resolved range data of the body within the depth of field is derived from an output of the wavefront-coded cameras.
7. The system of claim 1, wherein the gesture data comprises focus-resolved position data of the body within a depth of field.
8. The system of claim 7, wherein the focus-resolved position data of the body within the depth of field is derived from an output of the wavefront-coded cameras.
9. The system of claim 1, comprising a modulation transfer function and a point spread function that do not vary with a distance between the body and the plurality of optical detectors.
10. The system of claim 1, comprising a modulation transfer function and a point spread function that do not vary with respect to defocus.
11. The system of claim 1, wherein the processor generates an intermediate image by encoding images gathered by the wavefront-coded cameras.
12. The system of claim 11, wherein the intermediate image is blurred.
13. The system of claim 11, wherein the intermediate image is insensitive to those changes of the plurality of optical detectors or the body that comprise defocus aberrations.
14. The system of claim 1, wherein the gesture data is three-dimensional spatial position data representing the gesture.
15. The system of claim 1, wherein the detecting comprises at least one of detecting a location of the body and detecting an orientation of the body, and the detecting comprises detecting motion of the body.
16. The system of claim 1, wherein the detecting comprises identifying the gesture, wherein the identifying comprises identifying a pose and an orientation of a portion of the body.
17. The system of claim 1, wherein the detecting comprises detecting at least one of a first set of appendages and a second set of appendages of the body.
18. The system of claim 1, wherein the detecting comprises dynamically detecting a position of at least one tag.
19. The system of claim 18, wherein the detecting comprises detecting positions of a set of tags coupled to a portion of the body.
20. The system of claim 19, wherein each tag of the set of tags includes a pattern, wherein each pattern of each tag of the set of tags is different from any pattern of any remaining tag of the set of tags.
21. The system of claim 1, wherein the detecting comprises dynamically detecting and locating markers on the body.
22. The system of claim 21, wherein the detecting comprises detecting positions of a set of markers coupled to a portion of the body.
23. The system of claim 22, wherein the set of markers forms a plurality of patterns on the body.
24. The system of claim 21, wherein the detecting comprises detecting positions of a plurality of appendages of the body using a set of markers coupled to each of the appendages.
25. The system of claim 1, wherein the translating comprises translating information of the gesture into a gesture notation.
26. The system of claim 25, wherein the gesture notation represents a gesture vocabulary, and the gesture signal comprises a communication of the gesture vocabulary.
27. The system of claim 26, wherein the gesture vocabulary represents in textual form an instantaneous pose state of kinematic linkages of the body.
28. The system of claim 26, wherein the gesture vocabulary represents in textual form an orientation of kinematic linkages of the body.
29. The system of claim 26, wherein the gesture vocabulary represents in textual form a combination of orientations of kinematic linkages of the body.
30. The system of claim 26, wherein the gesture vocabulary includes a string of characters that represents a state of kinematic linkages of the body.
31. The system of claim 30, wherein the kinematic linkage is at least one first appendage of the body.
32. The system of claim 31, comprising assigning each position in the string to a second appendage, the second appendage connected to the first appendage.
33. The system of claim 32, comprising assigning a character of a plurality of characters of the string to each of a plurality of positions of the second appendage.
34. The system of claim 33, wherein the plurality of positions is established relative to a coordinate origin.
35. The system of claim 34, comprising establishing the coordinate origin using an absolute position and orientation in space; establishing the coordinate origin using a fixed position and orientation relative to the body, irrespective of an overall position and heading of the body; or establishing the coordinate origin interactively in response to an action of the body.
36. The system of claim 33, comprising assigning a character of the plurality of characters of the string to each of a plurality of orientations of the first appendage.
37. The system of claim 31, wherein the detecting comprises detecting when an inferred position of the body intersects virtual space, wherein the virtual space comprises space depicted on a display device coupled to the processor.
38. The system of claim 37, wherein controlling the component comprises controlling a virtual object in the virtual space when the inferred position intersects the virtual object.
39. The system of claim 38, wherein controlling the component comprises controlling a position of the virtual object in the virtual space in response to the inferred position in the virtual space.
40. The system of claim 38, wherein controlling the component comprises controlling a pose of the virtual object in the virtual space in response to the gesture.
41. The system of claim 1, comprising scaling the detecting and controlling to generate coincidence between the virtual space and a physical space, wherein the virtual space comprises space depicted on a display device coupled to the processor, and wherein the physical space comprises space inhabited by the body.
42. The system of claim 41, comprising controlling at least one virtual object in the virtual space in response to movement of at least one physical object in the physical space.
43. The system of claim 1, wherein the controlling comprises at least one of controlling a function of an application hosted on the processor and controlling a component displayed on the processor.
44. A method for pose-based control using three-dimensional information extracted within an extended depth of field, the method comprising:
imaging a body with an imaging system, the imaging comprising generating a wavefront-coded image of the body;
automatically detecting a gesture of the body, wherein the gesture comprises an instantaneous state of the body, wherein the detecting comprises gathering gesture data of the gesture only at an instant in time and excluding use of background data in the detecting of the gesture, the gesture data comprising focus-resolved position data of the body relative to a neutral position that serves as an absolute position and orientation in space, wherein the position data is three-dimensional information;
translating the gesture into a gesture signal; and
controlling a component coupled to a computer in response to the gesture signal.
45. The method of claim 44, wherein the imaging system comprises a plurality of optical detectors, wherein at least two of the optical detectors are wavefront coding cameras comprising wavefront coding optical elements.
46. The method of claim 44, wherein the imaging comprises generating wavefront coded images of the body.
47. The method of claim 44, wherein the imaging system comprises a plurality of optical detectors, wherein at least two of the optical detectors are wavefront coding cameras comprising a phase mask that increases a depth of focus of the imaging.
48. The method of claim 44, wherein the gesture data comprises focus-resolved range data of the body within a depth of field.
49. The method of claim 48, wherein the focus-resolved range data of the body within the depth of field is derived from an output of the imaging system.
50. The method of claim 44, wherein the gesture data comprises focus-resolved position data of the body within a depth of field.
51. The method of claim 50, wherein the focus-resolved position data of the body within the depth of field is derived from an output of the imaging system.
52. The method of claim 44, comprising generating modulation transfer functions and point spread functions that are invariant across a distance between the body and the imaging system.
53. The method of claim 44, comprising generating modulation transfer functions and point spread functions that are invariant with respect to defocus.
54. The method of claim 44, comprising generating intermediate images by encoding the images gathered by the wavefront coding cameras.
55. The method of claim 54, wherein the intermediate images are blurred.
56. The method of claim 54, wherein the intermediate images are insensitive to variations, comprising defocus aberrations, of the body or of the plurality of optical detectors of the imaging system.
57. The method of claim 44, wherein the gesture data is three-space location data representing the gesture.
58. The method of claim 44, wherein the detecting comprises detecting a location of the body.
59. The method of claim 44, wherein the detecting comprises detecting an orientation of the body.
60. The method of claim 44, wherein the detecting comprises detecting motion of the body.
61. The method of claim 44, wherein the detecting comprises identifying the gesture, wherein the identifying comprises identifying a pose and an orientation of a portion of the body.
62. The method of claim 44, wherein the detecting comprises detecting at least one of a first set of appendages and a second set of appendages of the body.
63. The method of claim 44, wherein the detecting comprises dynamically detecting a position of at least one tag.
64. The method of claim 63, wherein the detecting comprises detecting a position of a set of tags coupled to a portion of the body.
65. The method of claim 64, wherein each tag of the set of tags includes a pattern, wherein each pattern of each tag differs from any pattern of any remaining tag of the set of tags.
66. The method of claim 44, wherein the detecting comprises dynamically detecting and locating a marker on the body.
67. The method of claim 66, wherein the detecting comprises detecting a position of a set of markers coupled to a portion of the body.
68. The method of claim 67, wherein the set of markers forms a plurality of patterns on the body.
69. The method of claim 66, wherein the detecting comprises detecting positions of a plurality of appendages of the body using a set of markers coupled to each of the plurality of appendages.
70. The method of claim 44, wherein the translating comprises translating information of the gesture to a gesture notation.
71. The method of claim 70, wherein the gesture notation represents a gesture vocabulary, and the gesture signal comprises communications of the gesture vocabulary.
72. The method of claim 71, wherein the gesture vocabulary represents, in textual form, instantaneous pose states of kinematic linkages of the body.
73. The method of claim 71, wherein the gesture vocabulary represents, in textual form, orientations of kinematic linkages of the body.
74. The method of claim 71, wherein the gesture vocabulary represents, in textual form, combinations of orientations of kinematic linkages of the body.
75. The method of claim 71, wherein the gesture vocabulary includes a string of characters that represents a state of kinematic linkages of the body.
76. The method of claim 75, wherein the kinematic linkage is at least one first appendage of the body.
77. The method of claim 76, comprising assigning each position in the string to a second appendage, the second appendage connected to the first appendage.
78. The method of claim 77, comprising assigning characters of a plurality of characters in the string to each of a plurality of positions of the second appendage.
79. The method of claim 78, wherein the plurality of positions is established relative to a coordinate origin.
80. The method of claim 79, comprising: establishing the coordinate origin using an absolute position and orientation in space; establishing the coordinate origin using a fixed position and orientation relative to the body, irrespective of an overall position and heading of the body; or interactively establishing the coordinate origin in response to an action of the body.
81. The method of claim 78, comprising assigning characters of the plurality of characters in the string to each of a plurality of orientations of the first appendage.
82. The method of claim 76, wherein the detecting comprises detecting when an extrapolated position of the body intersects a virtual space, wherein the virtual space comprises space depicted on a display device coupled to the computer.
83. The method of claim 82, wherein controlling the component comprises controlling a virtual object in the virtual space when the extrapolated position intersects the virtual object.
84. The method of claim 83, wherein controlling the component comprises controlling a position of the virtual object in the virtual space in response to the extrapolated position in the virtual space.
85. The method of claim 83, wherein controlling the component comprises controlling an attitude of the virtual object in the virtual space in response to the gesture.
86. The method of claim 44, comprising scaling the detecting and the controlling to generate coincidence between a virtual space and a physical space, wherein the virtual space comprises space depicted on a display device coupled to the computer, and the physical space comprises space inhabited by the body.
87. The method of claim 86, comprising translating scale, angle, depth, and dimension between the virtual space and the physical space as appropriate to at least one application coupled to the computer.
88. The method of claim 86, comprising controlling at least one virtual object in the virtual space in response to movement of at least one physical object in the physical space.
89. The method of claim 44, wherein the controlling comprises controlling operation of an application hosted on the computer.
90. The method of claim 44, wherein the controlling comprises controlling a component displayed on the computer.
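Claims 70 through 81 describe a gesture vocabulary in which a state of the body is encoded as a character string: each position in the string is assigned to one appendage (e.g. a finger), and the character at that position encodes the appendage's discrete pose. The following is a minimal Python sketch of that idea only; the pose alphabet, finger names, and function names are hypothetical illustrations, not the notation defined in the patent specification.

```python
# Hypothetical pose alphabet: one character per discrete finger pose.
POSE_CHARS = {
    "curled": "^",
    "straight": "|",
    "bent": ">",
}

# Hypothetical fixed ordering of string positions, one per appendage
# (claim 77: each position in the string is assigned to a second appendage).
FINGER_ORDER = ["pinkie", "ring", "middle", "index", "thumb"]

def encode_hand_pose(finger_poses):
    """Map {finger name -> pose name} to a fixed-order character string.

    Each string position corresponds to one appendage; the characters
    come from the (hypothetical) pose alphabet above, so the whole hand
    state collapses to a short string that downstream logic can match on.
    """
    return "".join(POSE_CHARS[finger_poses[f]] for f in FINGER_ORDER)

# A flat open hand: every finger straight.
open_hand = encode_hand_pose({f: "straight" for f in FINGER_ORDER})
assert open_hand == "|||||"

# A pointing gesture: index and thumb straight, the rest curled.
point = encode_hand_pose(
    {"pinkie": "curled", "ring": "curled", "middle": "curled",
     "index": "straight", "thumb": "straight"}
)
assert point == "^^^||"
```

One appeal of such a string encoding is that recognizing a gesture reduces to string comparison or pattern matching, which is cheap compared with matching raw three-space pose data.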
CN200980120542.XA 2008-04-02 2009-04-02 Pose-based control using 3D information extracted within an extended depth of field Expired - Fee Related CN102047203B (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
US4189208P 2008-04-02 2008-04-02
US61/041,892 2008-04-02
US12/109,263 2008-04-24
US12/109,263 US8407725B2 (en) 2007-04-24 2008-04-24 Proteins, pools, and slawx in processing environments
US10525308P 2008-10-14 2008-10-14
US10524308P 2008-10-14 2008-10-14
US61/105,253 2008-10-14
US61/105,243 2008-10-14
PCT/US2009/039285 WO2009124181A2 (en) 2008-04-02 2009-04-02 Gesture based control using three-dimensional information extracted over an extended depth of field

Publications (2)

Publication Number Publication Date
CN102047203A CN102047203A (en) 2011-05-04
CN102047203B true CN102047203B (en) 2016-08-17

Family

ID=41136111

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200980120542.XA Expired - Fee Related CN102047203B (en) 2008-04-02 2009-04-02 Pose-based control using 3D information extracted within an extended depth of field

Country Status (5)

Country Link
EP (1) EP2266016A4 (en)
JP (1) JP5697590B2 (en)
KR (1) KR101550478B1 (en)
CN (1) CN102047203B (en)
WO (1) WO2009124181A2 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011059404A2 (en) * 2009-11-12 2011-05-19 Nanyang Polytechnic Method and system for interactive gesture-based control
WO2012120521A1 (en) * 2011-03-04 2012-09-13 Hewlett-Packard Development Company, L.P. Gestural interaction identification
CN103135756B (en) * 2011-12-02 2016-05-11 深圳泰山体育科技股份有限公司 Generate the method and system of control instruction
JP5917125B2 (en) 2011-12-16 2016-05-11 キヤノン株式会社 Image processing apparatus, image processing method, imaging apparatus, and display apparatus
TWI451344B (en) * 2012-08-27 2014-09-01 Pixart Imaging Inc Gesture recognition system and method
CN104007819B (en) * 2014-05-06 2017-05-24 清华大学 Gesture recognition method and device and Leap Motion system
EP3631533A4 (en) 2017-05-24 2021-03-24 The Trustees of Columbia University in the City of New York WIDE-BAND ACHROMATIC FLAT OPTICAL COMPONENTS BY DIELECTRIC METASURFACES MODIFIED BY DISPERSION
CN107515454B (en) * 2017-08-29 2019-12-20 宁夏巨能机器人股份有限公司 Automatic focal length adjusting device for 3D visual positioning and adjusting method thereof
SG11202001717VA (en) 2017-08-31 2020-03-30 Metalenz Inc Transmissive metasurface lens integration
WO2019147828A1 (en) 2018-01-24 2019-08-01 President And Fellows Of Harvard College Polarization state generation with a metasurface
WO2020010084A1 (en) 2018-07-02 2020-01-09 Metalenz, Inc. Metasurfaces for laser speckle reduction
EP4004608A4 (en) 2019-07-26 2023-08-30 Metalenz, Inc. Aperture-metasurface and hybrid refractive-metasurface imaging systems
CN110609039B (en) * 2019-09-23 2021-09-28 上海御微半导体技术有限公司 Optical detection device and method thereof
US11578968B1 (en) 2019-10-31 2023-02-14 President And Fellows Of Harvard College Compact metalens depth sensors
EP4500265A2 (en) 2022-03-31 2025-02-05 Metalenz, Inc. Polarization sorting metasurface microlens array device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060187196A1 (en) * 2005-02-08 2006-08-24 Underkoffler John S System and method for gesture based control system
US20070139541A1 (en) * 2001-07-06 2007-06-21 Himanshu Amin Imaging system and methodology

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7164117B2 (en) * 1992-05-05 2007-01-16 Automotive Technologies International, Inc. Vehicular restraint system control system and method using multiple optical imagers
US7218448B1 (en) * 1997-03-17 2007-05-15 The Regents Of The University Of Colorado Extended depth of field optical systems
JP2000275582A (en) * 1999-03-24 2000-10-06 Olympus Optical Co Ltd Depth-of-field enlarging system
SE0000850D0 (en) * 2000-03-13 2000-03-13 Pink Solution Ab Recognition arrangement
US7227526B2 (en) * 2000-07-24 2007-06-05 Gesturetek, Inc. Video-based image control system
US6842297B2 (en) 2001-08-31 2005-01-11 Cdm Optics, Inc. Wavefront coding optics
WO2008008084A2 (en) * 2005-09-19 2008-01-17 Cdm Optics, Inc. Task-based imaging systems
JP2008070319A (en) 2006-09-15 2008-03-27 Canon Inc Object measuring apparatus and method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070139541A1 (en) * 2001-07-06 2007-06-21 Himanshu Amin Imaging system and methodology
US20060187196A1 (en) * 2005-02-08 2006-08-24 Underkoffler John S System and method for gesture based control system

Also Published As

Publication number Publication date
EP2266016A2 (en) 2010-12-29
JP5697590B2 (en) 2015-04-08
CN102047203A (en) 2011-05-04
WO2009124181A2 (en) 2009-10-08
EP2266016A4 (en) 2014-10-29
WO2009124181A3 (en) 2009-12-30
JP2011523112A (en) 2011-08-04
KR20100136993A (en) 2010-12-29
KR101550478B1 (en) 2015-09-04

Similar Documents

Publication Publication Date Title
CN102047203B (en) Pose-based control using 3D information extracted within an extended depth of field
US9778751B2 (en) Gesture based control using three-dimensional information extracted over an extended depth of field
US9910497B2 (en) Gestural control of autonomous and semi-autonomous systems
US10061392B2 (en) Control system for navigating a principal dimension of a data space
US9471149B2 (en) Control system for navigating a principal dimension of a data space
US8537112B2 (en) Control system for navigating a principal dimension of a data space
EP2338114B1 (en) Control system for navigating a principal dimension of a data space
CN101536494B (en) Systems and methods for gesture-based control systems
CN102112945B (en) Attitude-Based Control System for Vehicle Interface
WO2010030822A1 (en) Gestural control of autonomous and semi-autonomous systems
CN107450714A (en) Man-machine interaction support test system based on augmented reality and image recognition
WO2012135153A2 (en) Fast fingertip detection for initializing a vision-based hand tracker
Dash et al. Interactions with 3D virtual objects in augmented reality using natural gestures
Li et al. RoCap: A Robotic Data Collection Pipeline for the Pose Estimation of Appearance-Changing Objects
Son Vision based natural assistive technologies with gesture recognition using Kinect
HK1153009B (en) Control system for navigating a principal dimension of a data space
HK1153009A (en) Control system for navigating a principal dimension of a data space

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160817

Termination date: 20210402