
WO2000033253A1 - Viewer for optical flow through a 3d time sequence - Google Patents

Viewer for optical flow through a 3d time sequence

Info

Publication number
WO2000033253A1
WO2000033253A1 PCT/US1999/028063 US9928063W WO0033253A1 WO 2000033253 A1 WO2000033253 A1 WO 2000033253A1 US 9928063 W US9928063 W US 9928063W WO 0033253 A1 WO0033253 A1 WO 0033253A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
image
track
feature point
images
Prior art date
Application number
PCT/US1999/028063
Other languages
French (fr)
Other versions
WO2000033253A8 (en
Inventor
Philip R. Moorby
John S. Robotham
Original Assignee
Synapix, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Synapix, Inc. filed Critical Synapix, Inc.
Priority to AU16340/00A priority Critical patent/AU1634000A/en
Publication of WO2000033253A1 publication Critical patent/WO2000033253A1/en
Publication of WO2000033253A8 publication Critical patent/WO2000033253A8/en

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V10/987 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns with the intervention of an operator
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking


Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Stereoscopic And Panoramic Photography (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

A three-dimensional viewing technique that allows an operator to visualize the result of optical flow analysis through a sequence of images. The technique builds a track representing the movement of a single feature point through the sequence of images, and builds such a track for each feature point, to sub-pixel accuracy as desired. The tracks are displayed in a 3D coordinate system whereby the x and y coordinates correspond to the coordinates of the feature in the image coordinate system and the z coordinate is a number associated with the temporal ordering of each image in the sequence. The resulting track display represents the evolution of the optical flow over time.

Description

VIEWER FOR OPTICAL FLOW THROUGH A 3D TIME SEQUENCE
FIELD OF THE INVENTION
The present invention relates to computer image processing and in particular to a technique for visualizing feature tracks and identifying errors and anomalies therein prior to subsequent processing.
BACKGROUND
An image processing function called feature tracking is the process of selecting features from an initial scene and then tracking these features across a related series of images of the same scene. Each image is typically represented as an array of pixel values, and a feature point in such an image is typically identified as a region of one or more pixels (or sub-pixels).
Feature tracking is the basis for several techniques whereby multiple feature points are simultaneously tracked across related image frames to develop further information about this scene. These include techniques for tracking two-dimensional shapes across frames, for estimating three-dimensional paths of selected feature points, for estimating three-dimensional camera paths, and for recovering estimated three-dimensional scene structure (including estimated depths of object surfaces). The use of feature tracking techniques in these applications can be very powerful, because they transform an image processing problem into a domain where geometric constraints can be applied. Most feature tracking methods are highly sensitive to the initial selection of each feature point. Automated feature point selection is typically done using criteria applied solely to the initial frame (such as choosing an area of high contrast). This selection can easily prove to be a poor choice for tracking in successive frames. Likewise, a manual selection made by a human operator may not be well suited for tracking over multiple frames.
When features are tracked independently, selection sensitivity becomes critical. Even when multiple features can be correlated and tracked as a group, reducing selection sensitivity depends on tracking all the features across multiple image frames while maintaining the correlation between them.
A feature can be "lost" due to imaging artifacts such as noise or transient lighting conditions. These artifacts can make it difficult or impossible to distinguish the feature identified in one frame from its surroundings in another frame. A feature can also be lost when it is visible in one frame but occluded (or partially occluded) in another. Feature occlusion may be due to changing camera orientation, and/or movement of one or more object(s) in the visual scene. A lost feature can reappear in yet another frame, but not be recognized as a continuation of a previously identified feature. This feature might be ignored, and remain lost. It may instead be incorrectly identified and tracked as an entirely new feature, creating a "broken path." A broken path has two (or more) discontinuous segments such that one path ends where the feature was lost, and the next path begins where the feature reappears. A single feature may therefore be erroneously tracked as multiple unrelated and independent features, each with its own unique piece of the broken path.
The above conditions that lead to a lost feature can also contribute to a "bad match." A bad match is a feature identified in one frame that is incorrectly matched to a different feature in another frame. A bad match can be even more troublesome than a lost feature or broken path, since the feature tracking algorithm proceeds as if the feature were being correctly tracked.
SUMMARY OF THE INVENTION
The advantages of feature tracking have been demonstrated in experimental results and in field trials, particularly in applications that derive higher level scene information by automatically tracking and correlating multiple feature points. However, the limitations of feature tracking methods as discussed above reduce their utility in certain practical settings. A tool that would enable a user to visualize the output of feature tracking to identify bad matches or other instances in which movement is being tracked incorrectly could greatly improve the utility of automatic feature tracking. It is also desirable to eliminate erroneous tracks and to correct anomalies in tracks as much as possible prior to their being input to automatic camera and scene modeling algorithms, because their presence in the tracking data causes errors in resulting computations.
Briefly, the present invention is a visualization tool that displays the output of a feature tracking or optical flow algorithm in a type of three-dimensional "spaghetti graph." The spaghetti graph enables a human user to identify and eliminate outliers and other bad track matches from the results of a feature tracking algorithm performed on the original 2D image sequence. The technique involves building a track in three dimensions representing the movement of a single feature through the sequence of images, and such a track can be built for any number of features in the sequence. The display provides a representation of the tracks in a 3D coordinate system where the x and y coordinates are the coordinates of a feature within the image coordinate system, and the z coordinate is a number associated with the temporal ordering of each image frame in the sequence of images. The tracks are preferably marked with an attribute of a selected pixel in the feature in the originating image in order to further allow the user to visually separate the tracks. For example, the marked track may be colored in the same color as the selected pixel in the case of a color image, or set to a corresponding grey scale value in the case of a black and white image.
The result is a three-dimensional display of marked tracks representing the evolution of the optical flow over time. The 3D track representation may be manipulated by rotation, scaling, zooming, viewpoint modification, and other standard 3D viewer tools which permit the user to view a 3D object from various angles on a 2D computer monitor. This permits the user to identify problem areas such as broken paths or lost features indicated by places in the graph where tracks are not smooth, tracks end or begin abruptly, tracks cross one another, or have other anomalies. The graph may therefore be used to evaluate the quality of different feature tracking runs and/or algorithms.
The invention provides further benefits in terms of producing feature tracking outputs which are of greater accuracy by eliminating the very features which cause most errors in computations. For example, once problem areas in the optical flow are identified and/or corrected, the user can rerun feature tracking algorithms or improve their results by excluding problem tracks or outliers from camera path or scene model analysis.
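To make the coordinate convention concrete, the following is a minimal sketch (not taken from the patent text) of how tracked 2D feature positions could be mapped into the 3D track representation described above, with the z value set to the frame index and each track marked with the color of its originating pixel. The names `trajectories`, `first_image`, and `build_tracks` are illustrative assumptions.

```python
# Minimal sketch: map per-feature 2D trajectories into the (x, y, frame)
# "spaghetti graph" coordinate convention and pick a marking color from
# the originating pixel.  All names here are assumptions for illustration.
import numpy as np

def build_tracks(trajectories, first_image):
    """trajectories: list of arrays of shape (F, 2) holding the (x, y)
    position of one feature in each of F frames.
    first_image: H x W x 3 array, used only to pick a marking color.
    Returns a list of (vertices, color) pairs, where vertices has shape
    (F, 3) with columns (x, y, frame_index)."""
    tracks = []
    for xy in trajectories:
        frames = np.arange(len(xy)).reshape(-1, 1)   # temporal ordering -> z axis
        vertices = np.hstack([xy, frames])            # (x, y, f) per frame
        x0, y0 = int(round(xy[0, 0])), int(round(xy[0, 1]))
        color = first_image[y0, x0] / 255.0           # mark with originating pixel color
        tracks.append((vertices, color))
    return tracks
```

A rendering sketch that consumes these (vertices, color) pairs appears in the detailed description below.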
BRIEF DESCRIPTION OF THE DRAWINGS
The file of this patent contains at least one drawing executed in color. Copies of this patent with color drawing(s) will be provided by the Patent and Trademark Office upon request and payment of the necessary fee.
Fig. 1 is a block diagram of an image processing system in which a feature track visualization technique may be used according to the invention.
Fig. 2 is a more detailed view of a sequence of images and a feature point generation process showing their interaction with a feature tracking, scene modeling, and camera modeling process.
Fig. 3 is an exemplary view of a camera, its image plane, and the derivation of feature points, scene structure and camera models.
Fig. 4 is a flow chart of a sequence of steps performed in order to produce a feature track visualization according to the invention.
Fig. 5 is a set of steps that may be performed subsequent to the visualization process of Fig. 4 to identify and remove bad tracks or anomalies from subsequent processing.
Fig. 6 is an exemplary first image from a sequence of images.
Fig. 7 is an exemplary 3D feature track visualization.
Fig. 8 is the same feature track visualization but viewed from a second viewpoint with a higher zoom factor, illustrating a bad track having an anomaly.
Fig. 9 is an even closer view illustrating a broken track.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT
Turning attention now in particular to the drawings, Fig. 1 is a block diagram of the components of a digital image processing system 10 in which a feature track visualization technique according to the invention may be implemented. The system 10 includes a computer workstation 20, a computer monitor 21, and input devices such as a keyboard 22 and mouse or stylus 23. The workstation 20 also includes input/output interfaces 24, storage 25, such as a disk 26 and random access memory 27, as well as one or more processors 28. The workstation 20 may be a computer graphics workstation such as the O2/Octane sold by Silicon Graphics, Inc., a Windows NT-type workstation, or other suitable computer or computers. The computer monitor 21, keyboard 22, mouse or stylus 23, and other input devices are used to interact with various software elements of the system existing in the workstation 20 to cause programs to be run and data to be stored as described below.
The system 10 also includes a number of other hardware elements typical of an image processing system, such as a video monitor 30, audio monitors 31, hardware accelerator 32, and user input devices 33.
Also included are image capture devices, such as a video cassette recorder (VCR), video tape recorder (VTR), and/or digital disk recorder 34 (DDR), cameras 35, and/or film scanner/telecine 36. Sensors 38 may also provide information about the scene and image capture devices.
One aspect of the present invention is concerned with a technique for visualizing an array of feature points derived from a sequence of images provided by one of the image capture devices. As shown in Fig. 2, a sequence 50 of images 51-1, 51-2, ..., 51-F is provided to a feature point generation process 54. For example, the images 51 may be provided at a D1 resolution of 720 by 486 pixels. Each entry in the feature array 58, however, may actually represent a feature selected over the tiled image 51, such as over a 5x5 or a 7x7 pixel tile.
An output of the feature point generation process 54 is a set of arrays 58-1, 58-2, ..., 58-F of feature points, with typically an array 58 for each input image 51.
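As a rough illustration of the tile-based selection just described, the sketch below picks at most one feature point per 7x7 tile using a simple gradient-magnitude score. The patent does not prescribe a particular detector; the scoring function, the `min_score` threshold, and the function name are assumptions.

```python
# Illustrative sketch only: choose at most one feature point per image
# tile (e.g. 7x7 pixels) using a simple gradient-magnitude contrast score.
# The detector and threshold are assumptions, not the patent's method.
import numpy as np

def select_feature_points(image, tile=7, min_score=10.0):
    """image: 2D grayscale array.  Returns an array of (x, y) positions,
    at most one per tile, where the local contrast is strongest."""
    gy, gx = np.gradient(image.astype(float))
    score = gx * gx + gy * gy
    points = []
    h, w = image.shape
    for ty in range(0, h - tile + 1, tile):
        for tx in range(0, w - tile + 1, tile):
            patch = score[ty:ty + tile, tx:tx + tile]
            iy, ix = np.unravel_index(np.argmax(patch), patch.shape)
            if patch[iy, ix] >= min_score:
                points.append((tx + ix, ty + iy))   # (x, y) in image coordinates
    return np.asarray(points)
```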
As a result of creating the feature point arrays 58, a feature track process 61, a scene structure modeling process 62, a camera modeling process 63, or other image processing techniques may be applied to derive further information from the image sequence 50. Feature tracking 61 may, for example, estimate the path or "directional flow" of two-dimensional shapes across the image sequence 50, or estimate three-dimensional paths of selected feature points. The scene structure model 62 may derive information about the relative distances or "depth" of objects in the image sequence 50. The camera modeling processes 63 may estimate one or more camera paths in three dimensions from multiple feature points.
Considering the scene structure modeling 62 and camera modeling 63 more particularly, the sequence 50 of images 51-1, 51-2, ..., 51-F is typically taken from a camera that is moving relative to objects in a scene. Imagine that we locate P feature points 52 in the first image 51-1. Feature points 52 are often selected to be the corners of objects in the images 51, although other selection methods may be used. Each feature point 52 corresponds to a single world point, located at position s_p in some fixed world coordinate system. This point will appear at varying positions in each of the following images 51-2, ..., 51-F, depending on the position and orientation of the camera in that image, and depending upon whether the point moves or remains fixed over time in world coordinates relative to the camera.
The observed image position of point p in frame f is written as the two-vector u_fp containing its image x- and y-coordinates, which is sometimes written as (u_fp, v_fp). These image positions are measured by tracking the feature from frame to frame using known feature tracking techniques.
The camera position and orientation in each frame is described by a rotation matrix R_f and a translation vector t_f representing the transformation from world coordinates to camera coordinates in each frame. It is possible to physically interpret the rows of R_f as giving the orientation of the camera axes in each frame: the first row, i_f, gives the orientation of the camera's x-axis, the second row, j_f, gives the orientation of the camera's y-axis, and the third row, k_f, gives the orientation of the camera's optical axis, which points along the camera's line of sight. The vector t_f indicates the position of the camera in each frame by pointing from the world origin to the camera's focal point. This formulation is illustrated in Fig. 3.
The process of projecting a three-dimensional point onto the image plane in a given frame is referred to as projection. This process models the physical process by which light from a point in the world is focused on the camera's image plane, and mathematical projection models of various degrees of sophistication can be used to compute the expected or predicted image positions P(f,p) as a function of s_p, R_f, and t_f. In fact, this process depends not only on the position of a point and the position and orientation of the camera, but also on the complex lens optics and image digitization characteristics. These may include an orthographic projection model, scaled orthographic projection model, para-perspective projection model, perspective projection model, radial projection model, or other types of models. These models have varying degrees of mathematical sophistication and complexity, and account for the actual physics of image formation to increasingly accurate degrees. One such camera movement and surface mesh modeling algorithm is described in Poelman, C.J., "The Paraperspective and Projective Factorization Methods for Recovering Shape and Motion," Carnegie Mellon University, School of Computer Science Report CMU-CS-95-173, 12 July 1995.
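The following is a hedged sketch of the projection step P(f,p) for the basic pinhole/perspective case only. The focal length `f_len`, the function name `project`, and the convention that a point is first expressed in camera coordinates as R_f (s_p - t_f), with t_f the camera position as stated above, are assumptions; the patent equally contemplates orthographic, para-perspective, radial, and other models.

```python
# Hedged sketch of projection for the simple pinhole case.  The coordinate
# convention and focal length are assumptions made for illustration.
import numpy as np

def project(s_p, R_f, t_f, f_len=1.0):
    """s_p: world point (3,); R_f: 3x3 rotation whose rows are the camera
    x, y and optical axes; t_f: camera position (3,).
    Returns the predicted image position (u, v)."""
    p_cam = R_f @ (s_p - t_f)          # world -> camera coordinates
    x, y, z = p_cam
    return np.array([f_len * x / z,    # perspective division by depth
                     f_len * y / z])
```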
The specific algorithms used to derive a scene structure 62 or camera model 63 are not of particular importance to the present invention. Rather, the present invention is concerned with a technique for developing a visual representation of the arrays of feature points 58 to better permit identification of errors and anomalies therein.
The feature points developed from the image sequence 50 are stored in the feature array 58 as a number of associated image feature entries 60. For example, each entry 60 in the feature array 58 contains at least (1) a grid position (GRID POS) or "(x,y) coordinate" and (2) a flow vector (FLOW) or "path." Path for the feature array 58 is developed by applying a feature tracking algorithm 60 across successive images 51. Consider an example where the image stream 50 contains images of a rotating cube 68 against a uniform dark background. The visual corners 52 of the cube 68 are what is traditionally detected and tracked as feature points. The GRID POS data for each feature point in image 51-1 is thus the (x,y) position of each feature point in the first array 58-1. As the image stream progresses, a second image 51-2 in the sequence has the cube rotated to a different position. As shown, a corresponding movement of the feature points 52 occurs. The grid positions of the feature points are thus stored in a second array 58-2 of the feature array 58.
Therefore, across each image pair, a sub-pixel directional flow vector can be generated representing the movement of each feature point 52. The vectors are generated between the first 51-1 and second image 51-2, the second 51-2 and third 51-3 image, and so on up to the F'th image 51-F.
A corresponding flow vector can thus be derived for each feature point pair which determines the sub-pixel location of the feature point in a next successive image. Data representing the flow vector for each feature point is stored in the PATH entries in feature array 58. A given directional flow vector, for example, associated with the subsequent images 51, may have a different magnitude and direction as the speed and direction of the cube 68 changes.
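One plausible in-memory layout for these entries is sketched below, assuming each feature has one entry per image holding its GRID POS and its flow (PATH) vector to the next image. The field, type, and function names are illustrative assumptions, not taken from the patent.

```python
# Sketch of a possible layout for the feature array 58: one entry per
# feature per image, holding the grid position (GRID POS) and the flow
# vector (FLOW/PATH) to the feature's sub-pixel location in the next image.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FeatureEntry:
    grid_pos: Tuple[float, float]        # (x, y) position in this image
    flow: Optional[Tuple[float, float]]  # displacement to next image, None if lost
    status: str = "ok"                   # e.g. "ok" or "bad" (see Fig. 5 discussion)

def flow_from_positions(pos_this, pos_next):
    """Derive the flow vector for one feature from its positions in two
    successive images."""
    return (pos_next[0] - pos_this[0], pos_next[1] - pos_this[1])
```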
Fig. 4 is a sequence of steps that can now be performed given that the feature array 58 containing sets of feature points and flow vectors for each image is available.
From an idle state 100, a first state 102 is entered in which the feature track algorithm is used to define feature points and paths for each frame as already described.
The following states 104 through state 110 are executed for each feature point array 58.
Likewise, beginning in state 106, a loop is performed for each image, f, in the array. In state 108, a track segment is built in three dimensions for each feature point 52 from its data associated with each image in the feature array 58. In particular, a track segment is built in three dimensions by plotting a line segment beginning at a location (x,y,f) where the x and y coordinates correspond to the relative position of the feature point in its associated image 51, and its location along the z axis is a number, f, associated with the temporal ordering or the "index" of each image 51 in the sequence 50.
Once the start coordinates are known, the line segment is drawn in the direction given by the corresponding path vector.
In state 110, the track segment is actually rendered on the display. In particular, in this state 110, the track is rendered in a color that is the same as the feature point's color in the first frame of the sequence 51.
States 104 through 110 are iterated until a track is displayed representing the movement of a single feature point throughout an entire sequence of images and such a track is built for each feature point in the image. The result is a set of colored tracks representing the evolution of the optical flow over time through the image sequence 51.
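A minimal rendering sketch for the loop of states 104 through 110 and the viewpoint change of state 114 is given below. It consumes the (vertices, color) pairs produced by a routine like the `build_tracks` sketch shown earlier, and uses matplotlib only as an illustrative 3D viewer, since the patent does not name a particular graphics toolkit.

```python
# Rendering sketch, assuming (vertices, color) pairs as built earlier.
# matplotlib stands in for whatever 3D viewer the system actually uses.
import matplotlib.pyplot as plt

def render_tracks(tracks, elev=20, azim=-60):
    """tracks: list of (vertices, color) pairs, where vertices has columns
    (x, y, frame_index).  Draws each track as a polyline colored with the
    feature's first-frame pixel color."""
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    for vertices, color in tracks:
        ax.plot(vertices[:, 0], vertices[:, 1], vertices[:, 2], color=color)
    ax.set_xlabel("image x")
    ax.set_ylabel("image y")
    ax.set_zlabel("frame index")
    ax.view_init(elev=elev, azim=azim)   # change viewpoint (state 114)
    plt.show()
```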
In state 112, the result is then displayed to the user, and the user is permitted in state 114 to change the viewpoint via rotation, zooming, and other standard 3D viewer tools in order to evaluate the quality of the feature tracking algorithm. In particular, the user may assess the quality of the particular feature tracking algorithm implemented to easily identify problem areas such as places in which the tracks are not smooth, tracks begin or end abruptly, tracks cross one another, or have other anomalies. For example, turning attention briefly to Fig. 6, there is shown a view of a scene in which a woman is seated in a room next to a fireplace. Fig. 7 shows one view of a feature track visualization produced from this scene according to the sequence of steps performed in Fig. 4. The sequence of images was taken by panning the camera around the seated woman in the room. The particular feature points can be traced more or less back to their origin points in the first image in the sequence by coordinating the color of the feature points with the colors of various regions in the first image in the sequence.
Fig. 8 is a view of the same set of tracks but taken from a closer viewpoint. Notice that one of the tracks 200 has an anomaly in that it has a sharp peak in a region of otherwise smooth tracks. The user knows this because the camera movement could not possibly have produced such an anomaly for only one feature of the image when other surrounding features in the same portion of the image exhibit much smoother flow. Fig. 9 is an even more detailed viewpoint of a track 210 which is considered to be "bad" in that there is an obvious break or premature end point for the track 210. The process of Fig. 4 may therefore be used to evaluate the performance of a particular feature tracking algorithm 61. The process can additionally be used in applications whereby the user intervenes in automatic scene modeling and camera path algorithms in order to produce higher quality results.
For example, when viewing a three-dimensional flow display such as that of Figs. 7, 8 or 9, the user can identify anomalies and other problem areas in the flow, such as unsmooth tracks, tracks that appear to flow in physically impossible directions, crossing tracks, and interrupted tracks as before. Once such tracks have been identified, the user can alter or remove them entirely from subsequent processing in order to reduce the noise in the input to automatic algorithms and thereby improve their output.
For example, turning attention to Fig. 5, the process may begin from an idle state 100, performing the states 102 through 112 as in Fig. 4. However, at the end of state 112, a state 130 may be entered in which the user identifies a bad track from three-dimensional displays such as the track that was shown in Fig. 9.
In a state 132, this track can be identified as a track which should be removed from further analysis. Thus in this state, for example, an entry is made in the feature array 58 to indicate the status is "bad."
Bad tracks, for example, are often found in outlying areas of the scene, most likely because information on the edges of a particular image 51 typically changes more rapidly than the information in the center of the image. The subsequent image processing algorithm, such as a feature tracking algorithm 61, a camera modeling algorithm 63, or a scene modeling algorithm 62, may then be run without using such a track, with improved results.
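The flagging step might look like the sketch below, which assumes the feature array is held as a dictionary mapping a feature identifier to its per-frame list of `FeatureEntry` objects from the earlier sketch. That layout, and the `status` values, are assumptions rather than anything prescribed by the patent.

```python
# Sketch of states 130-132: flag a user-selected track as "bad" and keep
# only unflagged tracks for the downstream modeling passes.  The dict-of-
# lists layout is an assumption carried over from the earlier sketch.
def mark_track_bad(feature_array, feature_id):
    for entry in feature_array[feature_id]:
        entry.status = "bad"

def usable_tracks(feature_array):
    """Return only the tracks that have not been flagged, for input to the
    camera modeling or scene modeling steps."""
    return {fid: entries for fid, entries in feature_array.items()
            if all(e.status != "bad" for e in entries)}
```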
Similarly, from state 112, the user may enter a state 140 in which an anomaly in a track is identified. In state 142, the system 10 may permit the user to specify a correction to this particular track. This correction is reflected in a modification to the entries in the feature point array such as by modifying the location of an x,y point in the array or visually changing its corresponding path vector with the input device 23.
The corrected tracks are then applied in state 150 to the subsequent feature tracking 61, camera modeling 63, or scene modeling 62. The user thus identifies points in the scene that appear incorrect, such as points whose positions do not correspond to the user's understanding of the scene geometry, and those points can be corrected or excluded before the automatic algorithms are run.
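A sketch of the correction step of states 140 through 150 follows. It reuses the `FeatureEntry` and `flow_from_positions` assumptions from the earlier sketches and simply moves one point and recomputes the flow vectors on either side of it.

```python
# Sketch of states 140-150: apply a user-supplied correction to one entry
# of a track and update the neighbouring flow vectors, so the corrected
# track can be fed back to feature tracking, camera or scene modeling.
def correct_point(entries, frame, new_xy):
    """entries: list of FeatureEntry for one feature, indexed by frame.
    Moves the point in `frame` to `new_xy` and updates the flow vectors
    into and out of that frame."""
    entries[frame].grid_pos = new_xy
    if frame > 0:
        entries[frame - 1].flow = flow_from_positions(entries[frame - 1].grid_pos, new_xy)
    if frame + 1 < len(entries):
        entries[frame].flow = flow_from_positions(new_xy, entries[frame + 1].grid_pos)
```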
It should be understood that the processes described in Fig. 4 and Fig. 5 can then be iterated, as indicated by states 100 through 150, to further refine the results with user input.
EQUIVALENTS
While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described specifically herein. Such equivalents are intended to be encompassed in the scope of the claims.

Claims

CLAIMS
What is claimed is:
1. A method for a visualization of optical flow for a time sequence of images comprising the steps of: forming a feature point array from the image sequence, with entries in the feature point array corresponding to the coordinate positions of feature points in images of the sequence and associated flow vector information; and deriving a flow graph representation in three dimensions from the feature point array wherein coordinate positions along a first pair of coordinate axes correspond to coordinate positions in a source image of the image sequence, and wherein coordinate positions along an orthogonal depth axis of the visualization correspond to an index number of the corresponding image from which the feature point was taken.
2. A method as in claim 1 wherein the flow graph representation for a given feature point is a track comprising a series of line segments illustrating the change in position of the feature point over a corresponding series of images in the image sequence.
3. A method as in claim 2 wherein the track is marked with an attribute of the feature point in one of the images in the series.
4. A method as in claim 3 wherein the attribute is a color.
5. A method as in claim 3 wherein the attribute is a grey scale value.
6. A method as in claim 1 wherein the user is permitted to identify anomalies in the flow graph representation.
7. A method as in claim 6 wherein the anomalies are used to control inputs to a subsequent automatic image processing algorithm.
8. A method as in claim 6 wherein the anomalies include locations in the flow graph representation which end abruptly indicating where a feature track was lost.
9. A method as in claim 8 wherein the lost feature track is excluded from the subsequent image processing algorithm.
10. A method as in claim 8 wherein the user supplies an input indicating how the lost feature track can be recovered by stitching it to another feature track.
PCT/US1999/028063 1998-11-24 1999-11-23 Viewer for optical flow through a 3d time sequence WO2000033253A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU16340/00A AU1634000A (en) 1998-11-24 1999-11-23 Viewer for optical flow through a 3d time sequence

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US10962798P 1998-11-24 1998-11-24
US60/109,627 1998-11-24
US44702199A 1999-11-22 1999-11-22
US09/447,021 1999-11-22

Publications (2)

Publication Number Publication Date
WO2000033253A1 true WO2000033253A1 (en) 2000-06-08
WO2000033253A8 WO2000033253A8 (en) 2001-06-14

Family

ID=26807173

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/028063 WO2000033253A1 (en) 1998-11-24 1999-11-23 Viewer for optical flow through a 3d time sequence

Country Status (2)

Country Link
AU (1) AU1634000A (en)
WO (1) WO2000033253A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003003309A1 (en) * 2001-06-29 2003-01-09 Honeywell International, Inc. Method for monitoring a moving object and system regarding same
EP1202578A3 (en) * 2000-10-30 2003-10-01 Monolith Co., Ltd. Image matching method, and image processing apparatus and method using the same
US8406506B2 (en) 2010-05-18 2013-03-26 Honda Motor Co., Ltd. Fast sub-pixel optical flow estimation
WO2015096509A1 (en) * 2013-12-26 2015-07-02 华中科技大学 Robust estimation method for rotation axis and barycentre of space object based on binocular light stream
EP3179443A4 (en) * 2014-08-05 2017-08-02 Panasonic Corporation Correcting and verifying method, and correcting and verifying device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0551595A2 (en) * 1991-12-17 1993-07-21 Eastman Kodak Company Visualization techniques for temporally acquired sequences of images

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0551595A2 (en) * 1991-12-17 1993-07-21 Eastman Kodak Company Visualization techniques for temporally acquired sequences of images

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHAUDHURY K ET AL: "DETECTING 3D FLOW", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION,US,LOS ALAMITOS, IEEE COMP. SOC. PRESS, vol. CONF. 11, 1994, pages 1073 - 1078, XP000478450, ISBN: 0-8186-5332-9 *
SHAH M ET AL: "MOTION TRAJECTORIES", IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS,US,IEEE INC. NEW YORK, vol. 23, no. 4, 1 July 1993 (1993-07-01), pages 1138 - 1150, XP000418415, ISSN: 0018-9472 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1202578A3 (en) * 2000-10-30 2003-10-01 Monolith Co., Ltd. Image matching method, and image processing apparatus and method using the same
EP1830581A1 (en) * 2000-10-30 2007-09-05 Monolith Co., Ltd. Image matching method and apparatus
WO2003003309A1 (en) * 2001-06-29 2003-01-09 Honeywell International, Inc. Method for monitoring a moving object and system regarding same
US8406506B2 (en) 2010-05-18 2013-03-26 Honda Motor Co., Ltd. Fast sub-pixel optical flow estimation
WO2015096509A1 (en) * 2013-12-26 2015-07-02 华中科技大学 Robust estimation method for rotation axis and barycentre of space object based on binocular light stream
EP3179443A4 (en) * 2014-08-05 2017-08-02 Panasonic Corporation Correcting and verifying method, and correcting and verifying device

Also Published As

Publication number Publication date
WO2000033253A8 (en) 2001-06-14
AU1634000A (en) 2000-06-19

Similar Documents

Publication Publication Date Title
US6192156B1 (en) Feature tracking using a dense feature array
US11616919B2 (en) Three-dimensional stabilized 360-degree composite image capture
US6249285B1 (en) Computer assisted mark-up and parameterization for scene analysis
US6278460B1 (en) Creating a three-dimensional model from two-dimensional images
US6124864A (en) Adaptive modeling and segmentation of visual image streams
US5706416A (en) Method and apparatus for relating and combining multiple images of the same scene or object(s)
US5990900A (en) Two-dimensional to three-dimensional image converting system
Prince et al. Augmented reality camera tracking with homographies
Neumann et al. Augmented reality tracking in natural environments
WO1996036007A1 (en) Object identification in a moving video image
Ramirez et al. Booster: a benchmark for depth from images of specular and transparent surfaces
JPH0773344A (en) Method and apparatus for three- dimensional point in two-dimensional graphic display
US20180322671A1 (en) Method and apparatus for visualizing a ball trajectory
JP2002236909A (en) Image data processing method and modeling device
JPH09245195A (en) Image processing method and apparatus
Kosaka et al. Vision-based motion tracking of rigid objects using prediction of uncertainties
Vacchetti et al. A stable real-time AR framework for training and planning in industrial environments
JP2001101419A (en) Image feature tracking processing method, image feature tracking processing device, three-dimensional data creation method
WO2000033253A1 (en) Viewer for optical flow through a 3d time sequence
Xiang et al. Tsfps: An accurate and flexible 6dof tracking system with fiducial platonic solids
Eskandari et al. Diminished reality in architectural and environmental design: Literature review of techniques, applications, and challenges
Jung et al. A model-based 3-D tracking of rigid objects from a sequence of multiple perspective views
Leubner et al. Computer-vision-based human-computer interaction with a back projection wall using arm gestures
Okuma et al. Real-time camera parameter estimation from images for a mixed reality system
Ruling et al. Research on fast and accurate occlusion detection technology of augmented reality system

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref country code: AU

Ref document number: 2000 16340

Kind code of ref document: A

Format of ref document f/p: F

AK Designated states

Kind code of ref document: A1

Designated state(s): AU CA

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)

Free format text: (EXCEPT JP)

AK Designated states

Kind code of ref document: C1

Designated state(s): AU CA JP

AL Designated countries for regional patents

Kind code of ref document: C1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

CFP Corrected version of a pamphlet front page
CR1 Correction of entry in section i

Free format text: PAT. BUL. 23/2000 UNDER (81) ADD "JP"; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

122 Ep: pct application non-entry in european phase