US20160328860A1 - Occlusion-Robust Visual Object Fingerprinting using Fusion of Multiple Sub-Region Signatures - Google Patents
- Publication number
- US20160328860A1 (application US14/705,473)
- Authority
- US
- United States
- Prior art keywords
- sub
- image frame
- regions
- candidate image
- frames
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G06T7/2006—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/215—Motion-based segmentation
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S13/00—Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
- G01S13/86—Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
- G01S13/865—Combination of radar systems with lidar systems
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S13/00—Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
- G01S13/86—Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
- G01S13/867—Combination of radar systems with cameras
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G06T7/0097—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/174—Segmentation; Edge detection involving the use of two or more images
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/20—Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from infrared radiation only
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/30—Transforming light or analogous information into electric information
- H04N5/33—Transforming infrared radiation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/64—Circuits for processing colour signals
- H04N9/646—Circuits for processing colour signals for image enhancement, e.g. vertical detail restoration, cross-colour elimination, contour correction, chrominance trapping filters
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30232—Surveillance
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30236—Traffic on road, railway or crossing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30248—Vehicle exterior or interior
- G06T2207/30252—Vehicle exterior; Vicinity of vehicle
Definitions
- In one example, a method includes receiving an indication of an object within a sequence of video frames, selecting, from the sequence of video frames, a reference image frame indicative of the object and candidate image frames representative of possible portions of the object, dividing the reference image frame and the candidate image frames into multiple cells, and defining, for the reference image frame and the candidate image frames, a plurality of sub-regions of the multiple cells.
- One or more of the sub-regions include the same cells for overlapping representations and the plurality of sub-regions include multiple sizes.
- the method also includes comparing characteristics of the plurality of sub-regions of the reference image frame to characteristics of the plurality of sub-regions of the candidate image frames and determining similarity measurements, and based on the similarity measurements, tracking the object within the sequence of video frames.
- the processor further tracks the object by comparing characteristics of the plurality of sub-regions of the reference image frame to characteristics of the plurality of sub-regions of the candidate image frames and determining similarity measurements, and based on the similarity measurements, tracking the object within the sequence of video frames.
- FIG. 1 is a block diagram of an example system for object tracking, according to an example embodiment.
- FIG. 2 shows a flowchart of an example method for occlusion-robust visual object fingerprinting using fusion of multiple sub-region signatures, according to an example embodiment.
- FIG. 4A illustrates examples of sub-regions defined for the reference image frame, according to an example embodiment.
- FIG. 6 illustrates the example 3×3 sub-region of the reference image frame 120 that can be rotated around the center cell resulting in a plurality of different combinations of sub-regions, according to an example embodiment.
- a target signature model for robust tracking and reacquisition using multiple overlapped sub-regions of a selected image frame, which may be performed in real-time for onboard processing on the UAV.
- Example methods enable long term persistent target tracking and reacquisition using robust target signatures, which may be occlusion-robust target signatures based on overlapped sub-regions of selected image frames. Further examples may enable matching using sub-region based target signatures so as to disregard unwanted background information in the selected image frames. Using these methods, robust target reacquisition after long term occlusion and reliable target identification under partial occlusion are described.
- a UAV may include an electro-optical (EO) or infrared camera that captures video of ground target(s), and processing is performed to determine distinguishable and consistent target signatures.
- Target loss can occur due to changes of illumination, partial/full occlusions, etc. in the video.
- sub-region matching between reference image frames and newly detected image frames of the target can be used based on statistical characteristics of luminance, chrominance, and respective entropies, to achieve reliable target matching and re-acquire a target lost due to occlusions or tracking failure.
- Target signature matching can be performed using cyclic sub-region matching and median of minimum or minimum of minimums matching between reference and candidate image frames that may have different occlusion patterns to track or reacquire identification of the target.
- Using sub-region matching effectively filters out occluded areas by selecting a variety of sub-regions to be matched between the reference and candidate image frames.
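The source does not define "median of minimum" or "minimum of minimums" precisely. The sketch below shows one possible reading, under the assumption that a matrix of distances between every reference sub-region (rows) and every candidate sub-region (columns) is already available; the function name `fuse_distances` and the `mode` strings are hypothetical.

```python
import numpy as np

def fuse_distances(distance_matrix, mode="median_of_min"):
    """Fuse pairwise sub-region distances into one frame-level score.

    distance_matrix[r][c] is the distance between reference sub-region r
    and candidate sub-region c (lower means more similar).
    """
    # Best (smallest) candidate distance for each reference sub-region.
    per_reference_min = np.asarray(distance_matrix, dtype=float).min(axis=1)
    if mode == "median_of_min":
        return float(np.median(per_reference_min))
    if mode == "min_of_min":
        return float(per_reference_min.min())
    raise ValueError(f"unknown mode: {mode}")

# Third reference sub-region has no good match (e.g., it is occluded in the candidate).
distances = [[5.0, 1.0, 9.0],
             [7.0, 8.0, 2.0],
             [40.0, 30.0, 50.0]]
median_score = fuse_distances(distances, "median_of_min")
min_score = fuse_distances(distances, "min_of_min")
```

Under this reading, a reference sub-region that finds no good partner because of occlusion contributes a large minimum, which neither the median nor the minimum lets dominate the fused score.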
- the system 100 may be entirely within a vehicle or an aircraft, or portions of the system 100 may be on an aircraft (e.g., the sensors) and portions of the system may be elsewhere or located within other computing devices (e.g., the 3D terrain database).
- the IR camera 104 may be a long wave IR camera configured to collect infrared information of an environment of a vehicle or aircraft, and to generate an image using the infrared information. Thus, the IR camera 104 may collect information of the environment of the vehicle and output a sequence of video frames 105 , for example, to the processor 112 . Other types of cameras may be alternatively or additionally included, such as an EO camera.
- the LIDAR 106 can estimate distance to environmental features while scanning through a scene to assemble a “point cloud” indicative of reflective surfaces in the environment.
- Individual points in the point cloud can be determined by transmitting a laser pulse and detecting a returning pulse, if any, reflected from any object in the environment, and then determining a distance to the object according to a time delay between the transmitted pulse and reception of the reflected pulse.
- a laser, or set of lasers can be rapidly and repeatedly scanned across portions of the environment to generate continuous real-time information on distances to reflective objects in the environment. Combining measured distances and orientation of the laser(s) while measuring each distance allows for associating a three-dimensional position with each returning pulse.
- a three-dimensional map of points (e.g., a point cloud) indicative of locations of reflective features in the environment can be generated for the entire scanning zone.
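The time-of-flight and orientation steps described above can be sketched as follows. This is a minimal illustration, assuming an ideal sensor at the origin and a beam direction given as azimuth and elevation angles; the function names and the angle convention are hypothetical and not taken from the source.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def pulse_range_m(round_trip_delay_s):
    """Distance to the reflecting surface; the pulse travels out and back, hence the factor 1/2."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_delay_s / 2.0

def point_from_return(round_trip_delay_s, azimuth_rad, elevation_rad):
    """Combine measured range and beam orientation into one (x, y, z) point of the point cloud."""
    r = pulse_range_m(round_trip_delay_s)
    horizontal = r * math.cos(elevation_rad)  # projection onto the horizontal plane
    x = horizontal * math.cos(azimuth_rad)
    y = horizontal * math.sin(azimuth_rad)
    z = r * math.sin(elevation_rad)
    return (x, y, z)

# A return received 1 microsecond after transmission, straight ahead.
point = point_from_return(1e-6, 0.0, 0.0)
```

Repeating this for every returning pulse while the laser scans yields the three-dimensional map of points.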
- the LIDAR 106 may output point cloud data, or may output images generated using point cloud data, for example.
- the LIDAR can be configured to collect laser point cloud data of the environment of the vehicle.
- the RADAR 108 is an object-detection sensor that uses radio waves to determine range, altitude, direction, or speed of objects in an environment.
- the RADAR may include an antenna that transmits pulses of radio waves or microwaves that bounce off any object in their path. The object returns a portion of the wave's energy to a receiver of the RADAR for estimation or determination of positioning of the object.
- the other sensor(s) 110 may include a variety of sensors included on the vehicle for navigational purposes, such as other imaging cameras, inertial measurement units (IMUs), temperature sensors, SONAR, or any other array of sensors and optical components.
- the sensors 110 may include an inertial navigation system (INS) configured to determine navigation information of the vehicle, a global positioning system (GPS) for determining navigation information as well, or other navigation system.
- the processor 112 may receive inputs from the sensors 102 to track objects over time as seen in the inputs.
- the processor 112 may track objects within a video feed output by the IR camera 104 in real-time while the vehicle is traversing the environment, based on inputs from the IR camera 104 , the LIDAR 106 , the RADAR 108 and the sensors 110 , for example.
- the processor 112 may extract, from the video 105 , a reference image frame 120 indicative of, or including the object and candidate image frames 122 representative of possible portions of the object, divide the reference image frame 120 and the candidate image frames 122 into multiple cells, and compare characteristics of the reference image frame 120 to characteristics of the candidate image frames 122 for determination of similarity measurements.
- the processor 112 may store the reference image frame 120 and the candidate image frames 122 in the data storage 118 .
- the similarity measurements can be used to track the object within the sequence of video frames.
- Terrain images from the 3D terrain database 114 may be overlaid onto the video feed to generate the outputs 116 for storage in the data storage 118 and for display.
- the outputs 116 may include a number of various forms including a video feed that tracks a target object, or data representative of the target object location in the environment over time.
- the outputs 116 can be sent to the display 124 , which may include both multi-function displays (MFD) and head mounted displays (HMD), permitting aircrews to view the outputs.
- the display 124 may include other displays of a vehicle as well.
- the outputs 116 may be displayed on the display 124 to highlight the target object being tracked over time within the sequence of video frames.
- the system 100 may be configured to receive inputs from the sensors 102 that include data representative of moving objects in an environment, and process the inputs to track the objects over time.
- the system 100 may be present on a vehicle (e.g., a UAV) that travels through an environment capturing a video feed of the environment and any moving objects in the environment.
- the IR camera 104 may provide the sequence of video frames 105 of the environment with the moving objects, and the processor 112 may process the sequence of video frames 105 to track the moving objects over time with respect to a location of the object within the sequence of video frames 105 , which may be mapped to a physical geographic location of the object in the environment.
- Persistent target tracking can be performed so as to track the object even when the object is occluded by features of the environment, and thus, the processor 112 may perform target reacquisition from long term occlusions or partial occlusions in real-time.
- distinguishable and consistent target signatures can be used for the system 100 to reacquire and track lost target(s).
- Target loss usually occurs due to sudden changes of illumination, partial/full occlusions, etc.
- the system 100 may perform sub-region matching between reference image frames and newly detected image frames (e.g., portions of image frames) using statistical characteristics of luminance, chrominance, and their entropies, to achieve reliable target matching and reacquisition of targets lost because of occlusions or tracking failure.
- the processor 112 determines a signature for a reference image frame of the target object using multiple overlapped sub-regions of the reference image frame for comparison with signatures of newly detected image frames using cyclic sub-region matching, or median of minimum or minimum of minimums matching, between reference and candidate image frames that may have different occlusion patterns.
- the processing of data may be performed on a computing device separate from the system 100 , or processing may be performed onboard the system (e.g., onboard the UAV) to enhance capabilities for autonomous operations and UAV surveillance.
- FIG. 2 shows a flowchart of an example method 200 for occlusion-robust visual object fingerprinting using fusion of multiple sub-region signatures, according to an example embodiment.
- Method 200 shown in FIG. 2 presents an embodiment of a method that could be used with the system shown in FIG. 1 , for example, and may be performed by a computing device (or components of a computing device) such as a client device or a server or may be performed by components of both a client device and a server.
- Example devices or systems may be used or configured to perform logical functions presented in FIG. 2 .
- components of the devices and/or systems may be configured to perform the functions such that the components are actually configured and structured (with hardware and/or software) to enable such performance.
- Method 200 may include one or more operations, functions, or actions as illustrated by one or more of blocks 202 - 212 . Although the blocks are illustrated in a sequential order, these blocks may also be performed in parallel, and/or in a different order than those described herein. Also, the various blocks may be combined into fewer blocks, divided into additional blocks, and/or removed based upon the desired implementation.
- each block may represent a module, a segment, or a portion of program code, which includes one or more instructions executable by a processor for implementing specific logical functions or steps in the process.
- the program code may be stored on any type of computer readable medium or data storage, for example, such as a storage device including a disk or hard drive.
- the computer readable medium may include non-transitory computer readable medium or memory, for example, such as computer-readable media that stores data for short periods of time like register memory, processor cache and Random Access Memory (RAM).
- the computer readable medium may also include non-transitory media, such as secondary or persistent long term storage, like read only memory (ROM), optical or magnetic disks, compact-disc read only memory (CD-ROM), for example.
- the computer readable media may also be any other volatile or non-volatile storage systems.
- the computer readable medium may be considered a tangible computer readable storage medium, for example.
- each block in FIG. 2 may represent circuitry that is wired to perform the specific logical functions in the process.
- Alternative implementations are included within the scope of the example embodiments of the present disclosure in which functions may be executed out of order from that shown or discussed, including substantially concurrent or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art.
- the method 200 includes receiving an indication of an object within the sequence of video frames 105 .
- the sequence of video frames 105 may be output by the camera 104 and received by a computing device or the processor 112 . It may be desired to track an object within the sequence of video frames.
- Visual object tracking can thus be performed to track a ground target/object within a video sequence, and once an object for tracking is chosen or determined, the object can be followed within the video sequence.
- a specific object for tracking can be determined by a user selecting or designating the object or by other manners resulting in receipt of an input or indication indicating the object.
- the method 200 may include detecting a moving object in the sequence of video frames as the object for tracking.
- Moving object detection may be performed in a number of ways, such as by frame-by-frame comparison to determine differences between frames and drawing bounding boxes around areas that have differences. Areas without differences (or differences less than a threshold) may be determined to be background (e.g., portions of frames that include little or no movement). Areas with differences above a threshold likely include moving objects, and such areas can be identified and noted as including objects of interest for tracking.
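A minimal sketch of the frame-by-frame comparison described above, assuming two already-stabilized grayscale frames stored as NumPy `uint8` arrays. The function name, the default threshold of 25, and the choice to return a single bounding box around all changed pixels are illustrative assumptions; a practical detector would typically separate changed areas into individual boxes.

```python
import numpy as np

def changed_region_bbox(prev_frame, next_frame, threshold=25):
    """Return (top, left, bottom, right) of the area whose pixel difference exceeds
    the threshold, or None if the whole frame is classified as background."""
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(next_frame.astype(np.int32) - prev_frame.astype(np.int32))
    moving = diff > threshold
    if not moving.any():
        return None  # differences below threshold everywhere: background only
    rows = np.flatnonzero(moving.any(axis=1))
    cols = np.flatnonzero(moving.any(axis=0))
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])

prev_frame = np.zeros((20, 30), dtype=np.uint8)
next_frame = prev_frame.copy()
next_frame[5:9, 10:16] = 200  # a bright object appears in this area
bbox = changed_region_bbox(prev_frame, next_frame)
```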
- moving object detection can be performed with a moving object detection method that takes into account jittering/vibration when the videos contain image motion due to platform motion.
- the video images can be stabilized frame by frame so that stationary backgrounds remain fixed in the image.
- the videos are stabilized by registering image frames to a certain global coordinate system, so that the scene appears stable with respect to a ground plane and other environmental structures fixed in the image, and independently moving objects such as ground vehicles appear as moving objects in the video.
- Feature correspondence matching can be used to compare sets of features and match key points from one image frame to others that have similar features.
- a set of matching points from two images can be generated, and processed using minimum Euclidean distances, for example, resulting in matching features that are indicative of moving objects in the video.
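Matching key points by minimum Euclidean distance can be sketched as below. The descriptor arrays, the function name, and the optional `max_distance` cutoff are assumptions for illustration; the source does not say which feature descriptor is used.

```python
import numpy as np

def match_by_min_distance(desc_a, desc_b, max_distance=np.inf):
    """For each descriptor in desc_a, find the desc_b descriptor at minimum
    Euclidean distance. Returns a list of (index_in_a, index_in_b) pairs."""
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    # Pairwise distance matrix of shape (len(desc_a), len(desc_b)).
    diffs = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    nearest = dist.argmin(axis=1)
    return [(i, int(j)) for i, j in enumerate(nearest) if dist[i, j] <= max_distance]

features_frame_1 = [[0.0, 0.0], [10.0, 10.0]]
features_frame_2 = [[9.0, 9.0], [1.0, 0.0], [50.0, 50.0]]
matches = match_by_min_distance(features_frame_1, features_frame_2)
```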
- the method 200 includes selecting, from the sequence of video frames 105 , a reference image frame, frame 120 for example, indicative of the object and one or more candidate image frames, frames 122 for example, representative of possible portions of the object.
- a reference image frame of the object in the video is selected or extracted as a target signature for tracking or target reacquisition.
- a target signature for example, is a representation of an appearance and shape of a target of interest (e.g., vehicle, pedestrian) in an image frame to be used for matching/comparison with other signatures collected.
- the reference image frame can be manually selected/extracted/identified from the video, or using the moving object detection methods, an image frame can be extracted that includes an ideal representation of the object (e.g., an image frame that illustrates the object with little or no occlusions).
- Candidate image frames may be representative of possible portions of the object, such as video frames that include portions of the object occluded by another object.
- Candidate image frames can also be identified within the video frames using the moving object detection methods where feature comparisons between frames indicate matches of at least some features so that the candidate image frames contain at least a portion of the object.
- the method 200 includes dividing the reference image frame 120 and the candidate image frames 122 into multiple cells.
- a cell may be a smaller portion of the image frame.
- Each cell contains a certain number of pixels representing partial appearance information of the object.
- the size of a cell and a number of cells can vary.
- FIG. 3A illustrates the example reference image frame 120 , showing an object 302 (e.g., a car), that has been extracted from a video or the sequence of video frames 105 .
- the reference image frame 120 is a portion of video or the sequence of video frames 105 (shown in FIG. 1 ).
- the object 302 is a car.
- FIG. 3B illustrates the reference image frame 120 divided into a number of cells 304 .
- the number of cells 304 is 25 cells.
- the size of the reference image frame 120 is approximately 25×40 pixels, and by dividing the reference image frame 120 into 25 (5×5) cells, each cell size becomes 5×8 pixels.
- Each cell represents part of the reference image frame 120 .
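The cell division in this example can be sketched with NumPy slicing. The array orientation (25 rows by 40 columns) is an assumption, since the source gives only "25×40 pixels", and the function name is hypothetical.

```python
import numpy as np

def split_into_cells(frame, grid_rows=5, grid_cols=5):
    """Split an image frame into a grid_rows x grid_cols grid of equally sized cells."""
    height, width = frame.shape[:2]
    if height % grid_rows or width % grid_cols:
        raise ValueError("frame size must be divisible by the grid size")
    cell_h, cell_w = height // grid_rows, width // grid_cols
    return [
        [frame[r * cell_h:(r + 1) * cell_h, c * cell_w:(c + 1) * cell_w]
         for c in range(grid_cols)]
        for r in range(grid_rows)
    ]

frame = np.arange(25 * 40).reshape(25, 40)  # stand-in for a 25x40 image frame
cells = split_into_cells(frame)
```

Each entry `cells[r][c]` is a 5×8 view into the original frame, so no pixel data is copied.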
- FIG. 3C illustrates the example candidate image frame 122 showing the object 302 that has been extracted from the sequence of video frames 105 .
- the candidate image frame 122 illustrates the object 302 occluded by an occlusion 312 (e.g., the car driving under a bridge).
- FIG. 3D illustrates the candidate image frame 122 divided into a number of cells 304 . In this example, the number of cells 304 is 25 cells.
- the method 200 includes defining, for the reference image frame 120 and the candidate image frames 122 , a plurality of sub-regions 402 , 404 , 406 , and 408 (as shown in FIG. 4A ) for the reference image frame 120 and a plurality of sub-regions 410 , 412 , 414 , and 416 (as shown in FIG. 4B ) for the candidate image frame 122 , and each sub-region 402 - 408 and 410 - 416 includes multiple cells 304 .
- One or more of the sub-regions 402 - 408 and 410 - 416 include the same cells for overlapping cells 306 or overlapping representations and the plurality of sub-regions include multiple sizes.
- the candidate image frames 122 are compared to the reference image frame 120 to track the object 302 throughout the video.
- Due to various conditions such as a location (center, left, or right) and size of the object/target within the image frame, viewpoints (view angles) toward the target, or existence of occlusions and clutter, cell-to-cell matching correspondence among different image frames of the same target may not be guaranteed.
- multiple sub-regions are assigned in overlapped and multiple-sized ways.
- An example purpose of overlapping is to include the same features in many sub-regions, and an example purpose of multiple-sizes is to consider that an effective number of cells in a sub-region varies due to background inclusion or partial occlusion in the image frame.
- FIG. 4A illustrates examples of sub-regions that may be defined for the reference image frame 120 .
- the top row shows the reference image frame 120 with example sub-regions 402 and 404 of 3×3 cells and the bottom row shows example sub-regions 406 and 408 of 4×4 cells.
- sub-regions can be generated from a left-top corner of the image frame down to a right-bottom corner by shifting the 3×3 bounding box over and down one cell at a time.
- the overlapping cells 306 are both included in the sub-region 402 and in the sub-region 404 .
- Using a 3×3 sub-region size may result in 9 different sub-regions and using a 4×4 sub-region size may result in 4 different sub-regions for a 25 cell image frame.
- This way, sub-regions are generated to be overlapping and of multiple sizes. Not all possible different sub-regions of overlapping and possible sub-regions of multiple sizes need to be generated. More overlapping and size variation can result in more robust matching.
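The sub-region counts quoted for a 25-cell frame follow from sliding a square window one cell at a time over the 5×5 grid. A short sketch, with a hypothetical function name and a tuple layout chosen only for illustration:

```python
def enumerate_sub_regions(grid_size=5, window_sizes=(3, 4)):
    """List every square sub-region obtained by shifting a window one cell at a time.

    Each entry is (window_size, top_row, left_col, set_of_cell_coordinates).
    """
    regions = []
    for size in window_sizes:
        for top in range(grid_size - size + 1):
            for left in range(grid_size - size + 1):
                cells = frozenset(
                    (top + dr, left + dc) for dr in range(size) for dc in range(size)
                )
                regions.append((size, top, left, cells))
    return regions

regions = enumerate_sub_regions()
count_3x3 = sum(1 for size, *_ in regions if size == 3)
count_4x4 = sum(1 for size, *_ in regions if size == 4)
# Cells shared by the first two 3x3 sub-regions (one-cell shift to the right).
shared = regions[0][3] & regions[1][3]
```

Here `count_3x3` is 9 and `count_4x4` is 4, matching the text, and the two neighbouring 3×3 sub-regions share six cells, which is the overlap the method relies on.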
- FIG. 4B illustrates examples of sub-regions that may be defined for the candidate image frame 122 .
- the candidate image frame 122 shows the object 302 occluded by portions of environment (e.g., car drives under bridge and an occlusion 312 exists).
- the top row shows the candidate image frame 122 with example sub-regions 410 and 412 of 3×3 cells and the bottom row shows example sub-regions 414 and 416 of 4×4 cells.
- sub-regions can be generated from a left-top corner of the image frame down to a right-bottom corner by shifting the 3×3 bounding box over and down one cell at a time. For example, as shown in FIG.
- the overlapping cells 308 are both included in the sub-region 410 and the sub-region 412 .
- Using a 3×3 sub-region size may result in 9 different sub-regions and using a 4×4 sub-region size may result in 4 different sub-regions for a 25 cell image frame. This way, sub-regions are generated to be overlapping and of multiple sizes. Not all possible different sub-regions of overlapping and possible sub-regions of multiple sizes need to be generated. More overlapping and size variation can result in more robust matching.
- both of the reference image frame 120 and the candidate image frames 122 are divided into multiple cells 304 .
- Multiple cells 304 may then be grouped together to form sub-regions, such as for example, the sub-regions 402 , 404 , 406 , and 408 shown in FIG. 4A and sub-regions 410 , 412 , 414 , and 416 shown in FIG. 4B .
- the method 200 includes comparing characteristics of the plurality of sub-regions of the reference image frame 120 to characteristics of the plurality of sub-regions of the candidate image frames 122 and determining similarity measurements.
- a fingerprint signature is calculated by extracting unique features of the sub-regions for comparison to determine if the object (or portion of object) is present in both the reference and candidate image frames.
- An example fingerprint signature vector f contains the following information (in YCbCr color space) of pixels in an image frame: Luminance mean value L_mean, Red chrominance mean value Cr_mean, Blue chrominance mean value Cb_mean, Luminance entropy L_ent, Red chrominance entropy Cr_ent, and/or Blue chrominance entropy Cb_ent.
- its covariance matrix, C, can be estimated such that each sub-region has a fingerprint pair, {f, C}.
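A fingerprint pair could be computed along the following lines. This is a sketch under several assumptions that the source does not confirm: pixels are 8-bit with channels ordered Y, Cb, Cr; entropy is the Shannon entropy of the 256-bin channel histogram; f is the mean of per-cell feature vectors; C is the covariance of those per-cell vectors; and a small `ridge` term is added to keep C invertible. All function names are hypothetical.

```python
import numpy as np

def channel_entropy(values):
    """Shannon entropy (in bits) of an 8-bit channel's histogram."""
    counts = np.bincount(values.ravel(), minlength=256)
    p = counts[counts > 0] / values.size
    return float(-(p * np.log2(p)).sum())

def cell_features(cell_ycbcr):
    """Six-element vector [L_mean, Cr_mean, Cb_mean, L_ent, Cr_ent, Cb_ent] for one cell."""
    y, cb, cr = (cell_ycbcr[..., k] for k in range(3))  # assumed channel order
    return np.array([y.mean(), cr.mean(), cb.mean(),
                     channel_entropy(y), channel_entropy(cr), channel_entropy(cb)])

def sub_region_fingerprint(cells, ridge=1e-3):
    """Return the pair (f, C) for a sub-region given as a list of YCbCr cells."""
    feats = np.stack([cell_features(c) for c in cells])
    f = feats.mean(axis=0)
    # Assumption: covariance across the cells of the sub-region, regularized.
    C = np.cov(feats, rowvar=False) + ridge * np.eye(feats.shape[1])
    return f, C

rng = np.random.default_rng(0)
example_cells = [rng.integers(0, 256, size=(5, 8, 3), dtype=np.uint8) for _ in range(9)]
f, C = sub_region_fingerprint(example_cells)
```

The nine random cells stand in for a 3×3 sub-region of 5×8-pixel cells; real input would come from the cell division of the reference or candidate image frame.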
- FIGS. 5A-5B illustrate example sub-region comparisons.
- the reference image frame 120 is shown with the sub-region 406 represented by the rectangle that includes a portion of the object 302 .
- FIG. 5B is the candidate image frame 122 that includes the sub-region 414 represented by the rectangle.
- the candidate image frame 122 shows the object 302 occluded by portions of environment (e.g., car drives under bridge and the occlusion 312 exists).
- the example in FIGS. 5A-5B shows that the sub-region 406 matches the sub-region 414 . However, for other sub-regions of the candidate image frame 122 in FIG. 5B , no match may have been determined due to the occlusion 312 .
- sub-region matching can identify matches when different overlapping and multiple-sized sub-regions of the reference image frame 120 are used for comparison with sub-regions that can be generated from the candidate image frame 122 .
- sub-region matching is performed in a manner to consider a number of occlusion patterns of the object within the given candidate image frame.
- Sub-region comparisons may include comparing respective fingerprint signatures of the reference image frame 120 to respective fingerprint signatures of the candidate image frames 122 , and determining similarity measurements based on a Kullback-Leibler Distance (KLD).
- the KLD similarity measurement may be used as an indication of a match, and a lower/shorter distance indicates a better match.
- A match may be determined when the distance satisfies a threshold distance.
- FIG. 6 illustrates the example 3×3 sub-region 402 of the reference image frame 120, whose sub-regions can be rotated around a center cell 310, resulting in a plurality of different combinations of sub-regions.
- the same may be performed for the candidate image frames 122 resulting in a plurality of different combinations of pairs of sub-regions between the reference image frame 120 and the candidate image frames 122 , and similarity measurements for the combinations of sub-region pairs are determined.
- a given candidate image frame with a minimum of similarity measurements may be determined as a best match for tracking purposes.
- Cyclic matching may be useful for candidate image frames where a target itself rotates or turns, or when a sensing platform (on the UAV) changes viewpoints.
- Sub-region matching is performed by taking the rotation effects into consideration.
- The number of sub-regions is fixed at nine (e.g., one center sub-region and eight rotating sub-regions around the center sub-region, as partially illustrated in FIG. 4).
- cycle sub-region matching can show more robust target matching for cases of partial occlusion and rotation effects being present.
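- The KLD similarity measurement for each signature pair (e.g., the mean vector of luminance/chrominance/entropies and the corresponding covariance matrix) is determined.
- For the KLD value between an image frame $i$ and an image frame $j$, the following equation is used:
- $$\mathrm{KLD}_{i \to j} = \log\left(\frac{\det(C_i)}{\det(C_j)}\right) + \operatorname{trace}\left((C_i)^{-1} C_j\right) + (f_i - f_j)\,(C_i)^{-1}\,(f_i - f_j)^{T} \qquad \text{Equation [1]}$$
- $\mathrm{KLD}_{j \to i}$ is also calculated, and an average between $\mathrm{KLD}_{j \to i}$ and $\mathrm{KLD}_{i \to j}$ is used.
- The signature pair $\{f_T, C_T\}$ for the reference image frame and $\{f_K, C_K\}$ for the $K$-th candidate image frame are compared by calculating each sub-KLD, $\mathrm{KLD}_{T,K}^{i,j}$, between the $i$-th sub-region of the reference image frame and the $j$-th sub-region of the candidate image frame:
- $$\mathrm{KLD}_{T,K}^{i,j} = 0.5\left(\mathrm{KLD}_{T \to K}^{i \to j} + \mathrm{KLD}_{K \to T}^{j \to i}\right) \qquad \text{Equation [2]}$$
- $$\mathrm{KLD}_{T \to K}^{i \to j} = \log\left(\frac{\det(C_T^{i})}{\det(C_K^{j})}\right) + \operatorname{trace}\left((C_T^{i})^{-1} C_K^{j}\right) + (f_T^{i} - f_K^{j})\,(C_T^{i})^{-1}\,(f_T^{i} - f_K^{j})^{T} \qquad \text{Equation [3]}$$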
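The symmetric distance of Equations [1] and [2] can be sketched without external libraries. The code below follows the form of Equation [1] as printed, so it is a sketch of that expression rather than a reference implementation of the textbook Kullback-Leibler divergence. The helper names are hypothetical, and a small Gauss-Jordan routine stands in for a linear algebra library.

```python
import math


def inverse_and_det(m):
    """Gauss-Jordan inverse and determinant of a small square matrix."""
    n = len(m)
    # Augment m with the identity matrix: [m | I].
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        p = a[col][col]
        det *= p
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a], det


def kld_one_way(f_i, c_i, f_j, c_j):
    """One-directional distance in the form of Equation [1]."""
    inv_i, det_i = inverse_and_det(c_i)
    _, det_j = inverse_and_det(c_j)
    n = len(f_i)
    trace = sum(inv_i[r][k] * c_j[k][r] for r in range(n) for k in range(n))
    d = [x - y for x, y in zip(f_i, f_j)]
    quad = sum(d[r] * inv_i[r][k] * d[k] for r in range(n) for k in range(n))
    return math.log(det_i / det_j) + trace + quad


def kld_symmetric(f_i, c_i, f_j, c_j):
    """Average of both directions, in the form of Equation [2]."""
    return 0.5 * (kld_one_way(f_i, c_i, f_j, c_j) + kld_one_way(f_j, c_j, f_i, c_i))


identity = [[1.0, 0.0], [0.0, 1.0]]
same = kld_symmetric([0.0, 0.0], identity, [0.0, 0.0], identity)
shifted = kld_symmetric([0.0, 0.0], identity, [1.0, 0.0], identity)
```

In this form the two log-determinant terms cancel in the average, and identical fingerprints score the dimension of f rather than zero, so the values are useful for ranking candidates rather than as absolute distances.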
- KLDs are calculated by fixing the center cell and rotating the other cells in one direction, as in FIG. 6, resulting in the following KLD measurements:
- KLD T min k ( ⁇ i ⁇ j ⁇ KLD T,K i,k ⁇ ) Equation [5]
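One plausible reading of the cyclic matching step is sketched below: the center sub-region stays fixed, the eight surrounding sub-regions of the candidate are shifted around the ring by k positions, the pairwise distances are summed for each shift, and the smallest total is kept. The summation over pairs, the function name, and the use of scalar stand-ins for signatures are all assumptions made for illustration.

```python
def cyclic_min_distance(ref_ring, cand_ring, ref_center, cand_center, dist):
    """Minimum total distance over all rotations of the candidate ring.

    ref_ring / cand_ring: signatures of the eight outer sub-regions, listed
    in the same rotational order. dist: any pairwise distance function,
    for example a symmetric KLD. Returns (best_total, best_shift).
    """
    n = len(ref_ring)
    center_cost = dist(ref_center, cand_center)  # the center is never rotated
    best_total, best_shift = None, None
    for k in range(n):
        total = center_cost + sum(
            dist(ref_ring[i], cand_ring[(i + k) % n]) for i in range(n)
        )
        if best_total is None or total < best_total:
            best_total, best_shift = total, k
    return best_total, best_shift


# Scalars stand in for fingerprint pairs; the candidate ring is the
# reference ring rotated by three positions.
ref_ring = [0, 1, 2, 3, 4, 5, 6, 7]
cand_ring = [5, 6, 7, 0, 1, 2, 3, 4]
result = cyclic_min_distance(ref_ring, cand_ring, 10, 10, lambda a, b: abs(a - b))
print(result)  # (0, 3)
```

Searching over shifts is what lets a rotated target, or a changed viewpoint, still line up with the reference.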
- Matching can be performed by determining a median of minimums or a minimum of minimums of the similarity measurements of the candidate image frame 122.
- $\mathrm{KLD}_{T,K}^{i,j}$ is obtained with the sub-region $j$ in the image frame $K$.
- $\mathrm{KLD}_{T,K}^{i,j}$ is estimated.
- A median of the $\mathrm{KLD}_{T,K}^{i}$'s is determined.
- A final KLD value for image frame $T$ with candidate $K$ is as follows:
- $$\mathrm{KLD}_{T,K} = \operatorname{median}_{i}\left(\min_{j}\left(\{\mathrm{KLD}_{T,K}^{i,j}\}\right)\right) \qquad \text{Equation [6]}$$
- The image frame that has the minimum value among the $\mathrm{KLD}_{T,K}$'s is then chosen as the best match.
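- Alternatively, a minimum of the $\mathrm{KLD}_{T,K}^{i}$'s may be used in place of the median of the $\mathrm{KLD}_{T,K}^{i}$'s:
- $$\mathrm{KLD}_{T,K} = \min_{i}\left(\min_{j}\left(\{\mathrm{KLD}_{T,K}^{i,j}\}\right)\right) \qquad \text{Equation [7]}$$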
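The two fusion rules of Equations [6] and [7], together with the choice of the best candidate, can be sketched as follows. The table layout (`kld[i][j]` for reference sub-region i and candidate sub-region j), the function names, and the numeric values are hypothetical and only illustrate the arithmetic.

```python
from statistics import median


def fuse(kld, mode="median_of_min"):
    """Fuse a table kld[i][j] into one score for a candidate frame.

    For each reference sub-region i, the best-matching candidate
    sub-region j is taken first (the inner minimum); the per-i minima
    are then combined by a median (Equation [6]) or a minimum (Equation [7]).
    """
    row_minima = [min(row) for row in kld]
    if mode == "median_of_min":
        return median(row_minima)
    if mode == "min_of_min":
        return min(row_minima)
    raise ValueError(f"unknown mode: {mode}")


def best_candidate(tables, mode="median_of_min"):
    """Return the candidate key with the lowest fused score, plus all scores."""
    scores = {key: fuse(table, mode) for key, table in tables.items()}
    return min(scores, key=scores.get), scores


# Made-up distance tables for two candidate frames.
tables = {
    "A": [[5, 9, 7], [1, 8, 6], [4, 4, 9]],
    "B": [[2, 9, 9], [9, 3, 9], [9, 9, 8]],
}
print(best_candidate(tables, "median_of_min")[0])  # B
print(best_candidate(tables, "min_of_min")[0])     # A
```

The example shows that the two rules can disagree: the minimum of minimums rewards a single very good sub-region match, while the median of minimums asks for at least half of the reference sub-regions to match reasonably well.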
- the method 200 includes based on the similarity measurements, tracking the object 302 within the sequence of video frames 105 .
- Tracking the object 302 within the sequence of video frames 105 includes determining matches between the candidate image frames 122 and the reference image frame 120 , and based on mis-matches between the candidate image frames 122 and the reference image frame 120 within a portion of the sequence of video frames 105 target reacquisition within a subsequent portion of the sequence of video frames 105 is performed.
- Examples of the method 200 may include storing a reference image frame's fingerprint pairs $\{f_T^{i}, C_T^{i}\}$ after the object 302 is selected, detecting moving target candidates, assigning each detected object to a candidate image frame, and, for each image frame, dividing it into cells (e.g., 5×5 cells in one image frame) and assigning sub-regions (e.g., 3×3 cells or 4×4 cells).
- For each pair between a $j$-th sub-region of a $K$-th candidate image frame and an $i$-th sub-region of the reference image frame $T$, $\mathrm{KLD}_{T,K}^{i,j}$ is calculated, and the candidate with the minimum cyclic, median-of-minimum, or minimum-of-minimum KLD with respect to the reference image frame is selected to track the object between frames of a video.
- Example tests were performed, and test results under random occlusion rates were compared for four matching methods: (1) an entire image frame method (uses the entire area of the extracted image frame and only one KLD value), (2) the cyclic sub-region method (uses sub-regions in a cyclic way), (3) the median of the minimum of the overlapped multiple sub-region method (uses multiple overlapped sub-regions and selects the median of the minimum KLD values), and (4) the minimum of the minimum of the overlapped multiple sub-region method (uses multiple overlapped sub-regions and selects the minimum of the minimum KLD values).
- FIG. 7 is an example target classification accuracy graph comparing the entire image frame method and the sub-region method (e.g., the minimum of the minimum method was used in this test). Over the whole range of occlusion rates (0% through 50%), the sub-region method provided better performance. Up to 25% occlusion, the sub-region method shows little decrease in accuracy; beyond 33%, the accuracy of both methods decreases, because 33% occlusion of the image frame can correspond to more than 50% occlusion of the target itself in some examples. If occlusion is more than 50%, target matching becomes difficult.
- FIG. 8 illustrates a schematic drawing of an example computing device 800 .
- the computing device 800 in FIG. 8 may represent devices shown in FIG. 1 including the processors, the system, or any of the blocks conceptually illustrating computing components, or the computing device 800 may represent the system in FIG. 1 in general. In some examples, some components illustrated in FIG. 8 may be distributed across multiple computing devices. However, for the sake of example, the components are shown and described as part of one example device 800 .
- the computing device 800 may be or include a mobile device, desktop computer, email/messaging device, tablet computer, or similar device that may be configured to perform the functions described herein.
- the computing device 800 may include an interface 802 , a wireless communication component 804 , sensor(s) 806 , data storage 808 , and a processor 810 . Components illustrated in FIG. 8 may be linked together by a communication link 812 .
- the computing device 800 may also include hardware to enable communication within the computing device 800 and between the computing device 800 and another computing device (not shown), such as a server entity.
- the hardware may include transmitters, receivers, and antennas, for example.
- the interface 802 may be configured to allow the computing device 800 to communicate with another computing device (not shown), such as a server.
- the interface 802 may be configured to receive input data from one or more computing devices, and may also be configured to send output data to the one or more computing devices.
- the interface 802 may also maintain and manage records of data received and sent by the computing device 800 .
- the interface 802 may also include a receiver and transmitter to receive and send data.
- the interface 802 may also include a user-interface, such as a keyboard, microphone, touchscreen, etc., to receive inputs as well.
- the wireless communication component 804 may be a communication interface that is configured to facilitate wireless data communication for the computing device 800 according to one or more wireless communication standards.
- the wireless communication component 804 may include a Wi-Fi communication component that is configured to facilitate wireless data communication according to one or more IEEE 802.11 standards.
- the wireless communication component 804 may include a Bluetooth communication component that is configured to facilitate wireless data communication according to one or more Bluetooth standards. Other examples are also possible.
- the sensor 806 may include one or more sensors, or may represent one or more sensors included within the computing device 800 .
- Example sensors include an accelerometer, gyroscope, pedometer, light sensors, microphone, camera, or other location and/or context-aware sensors.
- the data storage 808 may store program logic 814 that can be accessed and executed by the processor 810 .
- the data storage 808 may also store collected sensor data or image data 816 .
Abstract
Description
- The present disclosure relates generally to target or object tracking, such as by manned or unmanned aerial vehicles, in example environments that may cause occlusion or partial occlusion of the object within a sequence of video frames.
- In unmanned aerial vehicle (UAV) surveillance and target tracking operations, persistent and robust target tracking/re-acquisition/re-identification is needed. However, in urban environments, target loss situations are often confronted due to partial or total occlusion by buildings, bridges, or other landmarks. Existing techniques for reacquisition of a target may analyze a motion of a target on a road, for example, and try to reacquire a target location using an assumption of linear or close to linear target trajectories. Other existing techniques may perform vehicle fingerprinting using line segment features of the tracked vehicles, by determining an orientation of the vehicle (e.g., by aligning a collection of line features from the vehicle into a rectangular cuboid) and estimating matches using a likelihood method for line segments.
- Existing techniques may not be applicable in all operations. For example, trajectory matching may not apply to objects that have dynamic trajectories or trajectories that do not follow roads or landmarks. Further, clear image quality and large target sizes may be required in order to extract a sufficient number of line features from vehicles; in practice, however, it can be difficult to acquire clear and large target images at all times from UAVs.
- In one example, a method is described. The method includes receiving an indication of an object within a sequence of video frames, selecting, from the sequence of video frames, a reference image frame indicative of the object and candidate image frames representative of possible portions of the object, dividing the reference image frame and the candidate image frames into multiple cells, and defining, for the reference image frame and the candidate image frames, a plurality of sub-regions of the multiple cells. One or more of the sub-regions include the same cells for overlapping representations and the plurality of sub-regions include multiple sizes. The method also includes comparing characteristics of the plurality of sub-regions of the reference image frame to characteristics of the plurality of sub-regions of the candidate image frames and determining similarity measurements, and based on the similarity measurements, tracking the object within the sequence of video frames.
- In another example, a non-transitory computer readable medium having stored thereon instructions that, upon execution by a computing device, cause the computing device to perform functions is described. The functions comprise receiving an indication of an object within a sequence of video frames, selecting, from the sequence of video frames, a reference image frame indicative of the object and candidate image frames representative of possible portions of the object, dividing the reference image frame and the candidate image frames into multiple cells, and defining, for the reference image frame and the candidate image frames, a plurality of sub-regions of the multiple cells. One or more of the sub-regions include the same cells for overlapping representations and the plurality of sub-regions include multiple sizes. The functions also comprise comparing characteristics of the plurality of sub-regions of the reference image frame to characteristics of the plurality of sub-regions of the candidate image frames and determining similarity measurements, and based on the similarity measurements, tracking the object within the sequence of video frames.
- In still another example, a system is described comprising a camera to collect information of an environment of a vehicle and to output a sequence of video frames, and a processor to track an object within the sequence of video frames by determining, from the sequence of video frames, a reference image frame indicative of the object and candidate image frames representative of possible portions of the object, dividing the reference image frame and the candidate image frames into multiple cells, and defining, for the reference image frame and the candidate image frames, a plurality of sub-regions of the multiple cells. One or more of the sub-regions include the same cells for overlapping representations and the plurality of sub-regions include multiple sizes. The processor further tracks the object by comparing characteristics of the plurality of sub-regions of the reference image frame to characteristics of the plurality of sub-regions of the candidate image frames and determining similarity measurements, and based on the similarity measurements, tracking the object within the sequence of video frames.
- The features, functions, and advantages that have been discussed can be achieved independently in various embodiments or may be combined in yet other embodiments, further details of which can be seen with reference to the following description and drawings.
- The novel features believed characteristic of the illustrative embodiments are set forth in the appended claims. The illustrative embodiments, however, as well as a preferred mode of use, further objectives and descriptions thereof, will best be understood by reference to the following detailed description of an illustrative embodiment of the present disclosure when read in conjunction with the accompanying drawings, wherein:
- FIG. 1 is a block diagram of an example system for object tracking, according to an example embodiment.
- FIG. 2 shows a flowchart of an example method for occlusion-robust visual object fingerprinting using fusion of multiple sub-region signatures, according to an example embodiment.
- FIG. 3A illustrates the example reference image frame of FIG. 1 that has been extracted from a video frame, according to an example embodiment.
- FIG. 3B illustrates the reference image frame of FIG. 1 divided into a number of cells, according to an example embodiment.
- FIG. 3C illustrates the example candidate image frame of FIG. 1 that has been extracted from a video frame, according to an example embodiment.
- FIG. 3D illustrates the candidate image frame of FIG. 1 divided into a number of cells, according to an example embodiment.
- FIG. 4A illustrates examples of sub-regions defined for the reference image frame, according to an example embodiment.
- FIG. 4B illustrates examples of sub-regions defined for the candidate image frame, according to an example embodiment.
- FIG. 5A illustrates the reference image frame shown with a sub-region represented by the rectangle that includes a portion of the object, according to an example embodiment.
- FIG. 5B illustrates the candidate image frame shown with a sub-region represented by the rectangle that includes a portion of the object, according to an example embodiment.
- FIG. 6 illustrates the example 3×3 sub-region of the reference image frame 120 that can be rotated around the center cell resulting in a plurality of different combinations of sub-regions, according to an example embodiment.
- FIG. 7 is an example target classification accuracy graph between the entire image frame method and the sub-region method, according to an example embodiment.
- FIG. 8 illustrates a schematic drawing of an example computing device, according to an example embodiment.
- Disclosed embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all of the disclosed embodiments are shown. Indeed, several different embodiments may be described and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are described so that this disclosure will be thorough and complete and will fully convey the scope of the disclosure to those skilled in the art.
- In some instances, unmanned aerial vehicle (UAV) surveillance and target tracking missions require persistent and robust target tracking/re-acquisition/re-identification. However, in urban environments, target loss situations may occur due to partial or total occlusion by buildings, bridges, or other landmarks. Targets may be tracked by analyzing a motion context of the target using an assumption of linear or close to linear target trajectories, or using line segment features of the tracked objects. However, such tracking may be based on simple motion trajectories, such as on highways, may not consider dynamic motions, and often does not take occlusions into account.
- Within many environments, persistent target or object tracking, such as by manned vehicles or UAVs, may require target reacquisition due to occlusion or partial occlusion of the object within a sequence of image frames. As used herein, an image frame is defined as a single image in a sequence of image frames or video, and an image includes a digital two-dimensional image comprising pixels organized into rows and columns. Each pixel may have a value representing a color and/or brightness for that pixel. Further, a sequence of image frames includes two or more images generated in a consecutive order with respect to time.
- Within examples herein, a target signature model is described for robust tracking and reacquisition using multiple overlapped sub-regions of a selected image frame, which may be performed in real-time for onboard processing on the UAV. Example methods enable long term persistent target tracking and reacquisition using robust target signatures, which may be occlusion-robust target signatures based on overlapped sub-regions of selected image frames. Further examples may enable matching using sub-region based target signatures so as to disregard unwanted background information in the selected image frames. Using these methods, robust target reacquisition due to long term occlusion and reliable target identification under partial occlusion are described.
- Within an example, a UAV may include an electro-optical (EO) or infrared camera that captures video of ground target(s), and processing is performed to determine distinguishable and consistent target signatures. Target loss can occur due to changes of illumination, partial/full occlusions, etc. in the video. To lower probabilities of target loss within a tracking system, sub-region matching between reference image frames and newly detected image frames of the target can be used based on statistical characteristics of luminance, chrominance, and respective entropies, to achieve reliable target matching and re-acquire a target lost due to occlusions or tracking failure. Target signature matching can be performed using cyclic sub-region matching and median of minimum or minimum of minimums matching between reference and candidate image frames that may have different occlusion patterns to track or reacquire identification of the target. Using sub-region matching effectively filters out occluded areas by selecting a variety of sub-regions to be matched between the reference and candidate image frames.
- Referring now to the figures, FIG. 1 is a block diagram of an example system 100 for object tracking. The system 100 includes sensors 102, such as an infrared (IR) camera (or EO camera) 104, a LIDAR (light detection and ranging) 106, a RADAR (radio detection and ranging) 108, and possibly other sensors 110 that are in communication with a processor 112. The system 100 further includes a three-dimensional (3D) terrain database 114 also in communication with the processor 112. The processor 112 may receive inputs from the sensors 102 and the 3D terrain database 114, and process the inputs to generate outputs 116 that are stored in data storage 118. The data storage 118 may store a sequence of image frames 105 that include a reference image frame 120 representative of a frame of a video that includes an exemplar object, and one or more candidate image frames 122 of the video that are identified as possibly including portions of the object. The system 100 may further include a display 124 in communication with the data storage 118 and/or the processor 112 to receive and display the outputs 116.
- The system 100 may be entirely within a vehicle or an aircraft, or portions of the system 100 may be on an aircraft (e.g., such as the sensors) and portions of the system may be elsewhere or located within other computing devices (e.g., such as the 3D terrain database).
- The IR camera 104 may be a long wave IR camera configured to collect infrared information of an environment of a vehicle or aircraft, and to generate an image using the infrared information. Thus, the IR camera 104 may collect information of the environment of the vehicle and output a sequence of video frames 105, for example, to the processor 112. Other types of cameras may be alternatively or additionally included, such as an EO camera.
- The LIDAR 106 can estimate distance to environmental features while scanning through a scene to assemble a "point cloud" indicative of reflective surfaces in the environment. Individual points in the point cloud can be determined by transmitting a laser pulse and detecting a returning pulse, if any, reflected from any object in the environment, and then determining a distance to the object according to a time delay between the transmitted pulse and reception of the reflected pulse. A laser, or set of lasers, can be rapidly and repeatedly scanned across portions of the environment to generate continuous real-time information on distances to reflective objects in the environment. Combining measured distances and orientation of the laser(s) while measuring each distance allows for associating a three-dimensional position with each returning pulse. In this way, a three-dimensional map of points (e.g., a point cloud) indicative of locations of reflective features in the environment can be generated for the entire scanning zone. The LIDAR 106 may output point cloud data, or may output images generated using point cloud data, for example. Thus, the LIDAR can be configured to collect laser point cloud data of the environment of the vehicle.
- The RADAR 108 is an object-detection sensor that uses radio waves to determine range, altitude, direction, or speed of objects in an environment. For example, the RADAR may include an antenna that transmits pulses of radio waves or microwaves that bounce off any object in their path. The object returns a portion of the wave's energy to a receiver of the RADAR for estimation or determination of positioning of the object.
- The other sensor(s) 110 may include a variety of sensors included on the vehicle for navigational purposes, such as other imaging cameras, inertial measurement units (IMUs), temperature sensors, SONAR, or any other array of sensors and optical components. In some examples, the sensors 110 may include an inertial navigation system (INS) configured to determine navigation information of the vehicle, a global positioning system (GPS) for determining navigation information as well, or other navigation system.
- The 3D terrain database 114 may store terrain images captured by a camera on the vehicle to generate visual representations of the environment of the vehicle.
- The processor 112 may receive inputs from the sensors 102 to track objects over time as seen in the inputs. Thus, the processor 112 may track objects within a video feed output by the IR camera 104 in real-time while the vehicle is traversing the environment, based on inputs from the IR camera 104, the LIDAR 106, the RADAR 108 and the sensors 110, for example. To do so, in one example, the processor 112 may extract, from the video 105, a reference image frame 120 indicative of, or including, the object and candidate image frames 122 representative of possible portions of the object, divide the reference image frame 120 and the candidate image frames 122 into multiple cells, and compare characteristics of the reference image frame 120 to characteristics of the candidate image frames 122 for determination of similarity measurements. The processor 112 may store the reference image frame 120 and the candidate image frames 122 in the data storage 118. The similarity measurements can be used to track the object within the sequence of video frames.
- Terrain images from the 3D terrain database 114 may be overlaid onto the video feed to generate the outputs 116 for storage in the data storage 118 and for display.
- The outputs 116 may take a number of forms, including a video feed that tracks a target object, or data representative of the target object location in the environment over time. The outputs 116 can be sent to the display 124, which may include both multi-function displays (MFD) and head mounted displays (HMD), permitting aircrews to view the outputs. The display 124 may include other displays of a vehicle as well. As an example, the outputs 116 may be displayed on the display 124 to highlight the target object being tracked over time within the sequence of video frames.
- The system 100 may be configured to receive inputs from the sensors 102 that include data representative of moving objects in an environment, and process the inputs to track the objects over time. As an example, the system 100 may be present on a vehicle (e.g., a UAV) that travels through an environment capturing a video feed of the environment and any moving objects in the environment. The IR camera 104 may provide the sequence of video frames 105 of the environment with the moving objects, and the processor 112 may process the sequence of video frames 105 to track the moving objects over time with respect to a location of the object within the sequence of video frames 105, which may be mapped to a physical geographic location of the object in the environment.
- Persistent target tracking can be performed so as to track the object even when the object is occluded by features of the environment, and thus, the processor 112 may perform target reacquisition from long term occlusions or partial occlusions in real-time. As an example, when a UAV system with an EO or IR camera tracks ground target(s), distinguishable and consistent target signatures can be used for the system 100 to reacquire and track lost target(s). Target loss usually occurs due to sudden changes of illumination, partial/full occlusions, etc. The system 100 may perform sub-region matching between reference image frames and newly detected image frames (e.g., portions of image frames) using statistical characteristics of luminance, chrominance, and their entropies, to achieve reliable target matching and reacquisition of targets lost because of occlusions or tracking failure. In other examples, to track the object, the processor 112 determines a signature for a reference image frame of the target object using multiple overlapped sub-regions of the reference image frame for comparison with signatures of newly detected image frames using cyclic sub-region matching, or median/minimum of minimums between reference and candidate image frames that may have different occlusion patterns.
- The processing of data may be performed on a computing device separate from the system 100, or processing may be performed onboard the system (e.g., onboard the UAV) to enhance capabilities for autonomous operations and UAV surveillance.
- FIG. 2 shows a flowchart of an example method 200 for occlusion-robust visual object fingerprinting using fusion of multiple sub-region signatures, according to an example embodiment. Method 200 shown in FIG. 2 presents an embodiment of a method that could be used with the system shown in FIG. 1, for example, and may be performed by a computing device (or components of a computing device) such as a client device or a server or may be performed by components of both a client device and a server. Example devices or systems may be used or configured to perform logical functions presented in FIG. 2. In some instances, components of the devices and/or systems may be configured to perform the functions such that the components are actually configured and structured (with hardware and/or software) to enable such performance. In other examples, components of the devices and/or systems may be arranged to be adapted to, capable of, or suited for performing the functions, such as when operated in a specific manner. Method 200 may include one or more operations, functions, or actions as illustrated by one or more of blocks 202-212. Although the blocks are illustrated in a sequential order, these blocks may also be performed in parallel, and/or in a different order than those described herein. Also, the various blocks may be combined into fewer blocks, divided into additional blocks, and/or removed based upon the desired implementation.
- It should be understood that for this and other processes and methods disclosed herein, flowcharts show functionality and operation of one possible implementation of present embodiments. In this regard, each block may represent a module, a segment, or a portion of program code, which includes one or more instructions executable by a processor for implementing specific logical functions or steps in the process. The program code may be stored on any type of computer readable medium or data storage, for example, such as a storage device including a disk or hard drive. The computer readable medium may include non-transitory computer readable medium or memory, for example, such as computer-readable media that stores data for short periods of time like register memory, processor cache and Random Access Memory (RAM). The computer readable medium may also include non-transitory media, such as secondary or persistent long term storage, like read only memory (ROM), optical or magnetic disks, compact-disc read only memory (CD-ROM), for example. The computer readable media may also be any other volatile or non-volatile storage systems. The computer readable medium may be considered a tangible computer readable storage medium, for example.
In addition, each block in FIG. 2 may represent circuitry that is wired to perform the specific logical functions in the process. Alternative implementations are included within the scope of the example embodiments of the present disclosure in which functions may be executed out of order from that shown or discussed, including substantially concurrent or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art.
At block 202, the method 200 includes receiving an indication of an object within the sequence of video frames 105. The sequence of video frames 105 may be output by the camera 104 and received by a computing device or the processor 112. It may be desired to track an object within the sequence of video frames. Visual object tracking can thus be performed to track a ground target/object within a video sequence, and once an object for tracking is chosen or determined, the object can be followed within the video sequence. A specific object for tracking can be determined by a user selecting or designating the object, or in other ways that result in receipt of an input or indication identifying the object.
In some examples, rather than manually selecting an object in a scene or frame for tracking, the
method 200 may include detecting a moving object in the sequence of video frames as the object for tracking. Moving object detection may be performed in a number of ways, such as by frame-by-frame comparison to determine differences between frames and drawing bounding boxes around areas that have differences. Areas without differences (or with differences less than a threshold) may be determined to be background (e.g., portions of frames that include little or no movement). Areas with differences above a threshold likely include moving objects, and such areas can be identified and noted as including objects of interest for tracking.
In another example, moving object detection can be performed with a method that takes into account jittering/vibration when the videos contain image motion due to platform motion. To detect salient or independently moving ground objects in these videos, the video images can be stabilized frame by frame so that stationary backgrounds remain fixed in the image. The video is stabilized by registering image frames to a common global coordinate system, so that the scene appears stable with respect to a ground plane and other environmental structures fixed in the image, and independently moving objects such as ground vehicles appear as moving objects in the video. Feature correspondence matching can be used to compare sets of features and match key points from one image frame to others that have similar features. A set of matching points from two images can be generated and processed using minimum Euclidean distances, for example, resulting in matching features that are indicative of moving objects in the video.
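The frame-by-frame comparison described above can be sketched as follows. This is a minimal illustration, not the detector used in the embodiments: it assumes grayscale frames held as NumPy arrays, draws a single bounding box around all changed pixels, and uses a hypothetical function name (`detect_motion_box`) and a hypothetical default threshold.

```python
import numpy as np


def detect_motion_box(prev_frame, curr_frame, diff_threshold=25):
    """Return (top, left, bottom, right) around changed pixels, or None.

    Assumption: both frames are 2-D grayscale arrays of the same shape.
    Pixels whose absolute difference is at or below diff_threshold are
    treated as background.
    """
    # Use a signed type so that uint8 subtraction does not wrap around.
    diff = np.abs(curr_frame.astype(np.int32) - prev_frame.astype(np.int32))
    moving = diff > diff_threshold
    if not moving.any():
        return None  # nothing exceeded the threshold: all background
    rows = np.flatnonzero(moving.any(axis=1))
    cols = np.flatnonzero(moving.any(axis=0))
    # Bottom/right are exclusive, so the box can be used directly as a slice.
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1
```

A practical detector would typically label connected regions separately and stabilize the frames first; the single box here only illustrates the thresholding step.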
At block 204, the method 200 includes selecting, from the sequence of video frames 105, a reference image frame (frame 120, for example) indicative of the object and one or more candidate image frames (frames 122, for example) representative of possible portions of the object. Once an object is designated, using, for example, the moving object detection method described above, a reference image frame of the object in the video is selected or extracted as a target signature for tracking or target reacquisition. A target signature, for example, is a representation of an appearance and shape of a target of interest (e.g., vehicle, pedestrian) in an image frame to be used for matching/comparison with other signatures collected. The reference image frame can be manually selected/extracted/identified from the video, or, using the moving object detection methods, an image frame can be extracted that includes an ideal representation of the object (e.g., an image frame that illustrates the object with little or no occlusion).
Candidate image frames may be representative of possible portions of the object, such as video frames that include portions of the object occluded by another object. Candidate image frames can also be identified within the video frames using the moving object detection methods, where feature comparisons between frames indicate matches of at least some features so that the candidate image frames contain at least a portion of the object.
At block 206, the method 200 includes dividing the reference image frame 120 and the candidate image frames 122 into multiple cells. A cell may be a smaller portion of the image frame. Each cell contains a certain number of pixels representing partial appearance information of the object. Depending on the image frame size, the size of a cell and the number of cells can vary.
FIG. 3A illustrates the example reference image frame 120, showing an object 302 (e.g., a car), that has been extracted from a video or the sequence of video frames 105. The reference image frame 120 is a portion of video or the sequence of video frames 105 (shown in FIG. 1). In this example, the object 302 is a car. FIG. 3B illustrates the reference image frame 120 divided into a number of cells 304. In this example, the number of cells 304 is 25 cells. In this example, the size of the reference image frame 120 is approximately 25×40 pixels, and by dividing the reference image frame 120 into 25 (5×5) cells, each cell size becomes 5×8 pixels. Each cell represents part of the reference image frame 120.
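The cell division can be illustrated with a short sketch. It assumes the frame is a NumPy array of 25 rows by 40 columns (the source gives only "25×40 pixels", so the orientation is an assumption) and that the frame dimensions divide evenly by the grid; `split_into_cells` is a hypothetical helper name.

```python
import numpy as np


def split_into_cells(frame, grid_rows=5, grid_cols=5):
    """Split an image frame into a grid_rows x grid_cols grid of cells.

    Returns a nested list where cells[r][c] is a view of the frame.
    Assumption: frame height and width are exact multiples of the grid.
    """
    height, width = frame.shape[:2]
    if height % grid_rows or width % grid_cols:
        raise ValueError("frame size must be a multiple of the grid size")
    cell_h, cell_w = height // grid_rows, width // grid_cols
    return [
        [
            frame[r * cell_h:(r + 1) * cell_h, c * cell_w:(c + 1) * cell_w]
            for c in range(grid_cols)
        ]
        for r in range(grid_rows)
    ]


# A 25 x 40 frame split 5 x 5 gives 25 cells of 5 x 8 pixels each.
example_frame = np.arange(25 * 40).reshape(25, 40)
example_cells = split_into_cells(example_frame)
```

Frames whose size is not an exact multiple of the grid would need cropping or resizing first; the source does not say which is used.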
FIG. 3C illustrates the example candidate image frame 122 showing the object 302 that has been extracted from the sequence of video frames 105. The candidate image frame 122 illustrates the object 302 occluded by an occlusion 312 (e.g., the car driving under a bridge). FIG. 3B illustrates the candidate image frame 122 divided into a number of cells 304. In this example, the number of cells 304 is 25 cells.
Referring back to FIG. 2, at block 208, the method 200 includes defining a plurality of sub-regions 402, 404, 406, and 408 (as shown in FIG. 4A) for the reference image frame 120 and a plurality of sub-regions 410, 412, 414, and 416 (as shown in FIG. 4B) for the candidate image frame 122. Each sub-region 402-408 and 410-416 includes multiple cells 304. One or more of the sub-regions 402-408 and 410-416 include the same cells (overlapping cells 306), giving overlapping representations, and the plurality of sub-regions includes multiple sizes.
Using the method 200, the candidate image frames 122 are compared to the reference image frame 120 to track the object 302 throughout the video. However, depending on various conditions such as the location (center, left, or right) and size of the object/target within the image frame, viewpoints (view angles) toward the target, or the existence of occlusions and clutter, cell-to-cell matching correspondence among different image frames of the same target may not be guaranteed. To make target signature matching robust, multiple sub-regions are assigned in overlapped and multiple-sized ways. An example purpose of overlapping is to include the same features in many sub-regions, and an example purpose of multiple sizes is to account for the fact that the effective number of cells in a sub-region varies due to background inclusion or partial occlusion in the image frame.
FIG. 4A illustrates examples of sub-regions that may be defined for the reference image frame 120. The top row shows the reference image frame 120 with example sub-regions 402 and 404 of 3×3 cells, and the bottom row shows example sub-regions 406 and 408 of 4×4 cells. To create the various overlapping sub-regions, after a size is chosen, sub-regions can be generated from the top-left corner of the image frame down to the bottom-right corner by shifting the 3×3 bounding box over and down one cell at a time. For example, as shown in FIG. 4A, the overlapping cells 306 are included in both the sub-region 402 and the sub-region 404. Using a 3×3 sub-region size may result in 9 different sub-regions, and using a 4×4 sub-region size may result in 4 different sub-regions, for a 25 cell image frame. This way, sub-regions are generated to be overlapping and of multiple sizes. Not all possible overlapping sub-regions and not all possible sub-region sizes need to be generated. More overlapping and size variation can result in more robust matching.
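The sub-region counts quoted above (9 for 3×3 and 4 for 4×4 on a 5×5 grid) follow from sliding a window one cell at a time. A minimal sketch, using hypothetical helper names and representing each sub-region as a `(top, left, size)` tuple in cell coordinates:

```python
def enumerate_sub_regions(grid_size=5, window_sizes=(3, 4)):
    """List every square window of each size on a grid_size x grid_size grid.

    Windows are shifted one cell at a time from the top-left corner to the
    bottom-right corner, so neighbouring windows overlap.
    """
    regions = []
    for size in window_sizes:
        for top in range(grid_size - size + 1):
            for left in range(grid_size - size + 1):
                regions.append((top, left, size))
    return regions


def cells_in(region):
    """Return the set of (row, col) cell indices covered by a sub-region."""
    top, left, size = region
    return {(r, c) for r in range(top, top + size)
            for c in range(left, left + size)}


regions = enumerate_sub_regions()
# Two horizontally adjacent 3x3 windows share a 3x2 block of cells.
shared = cells_in((0, 0, 3)) & cells_in((0, 1, 3))
```

In general a window of side s on an n×n grid yields (n - s + 1)² positions, which gives 9 and 4 here, 13 sub-regions in total.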
FIG. 4B illustrates examples of sub-regions that may be defined for the candidate image frame 122. The candidate image frame 122 shows the object 302 occluded by portions of the environment (e.g., the car drives under a bridge and an occlusion 312 exists). The top row shows the candidate image frame 122 with example sub-regions 410 and 412 of 3×3 cells, and the bottom row shows example sub-regions 414 and 416 of 4×4 cells. To create the various overlapping sub-regions, after a size is chosen, sub-regions can be generated from the top-left corner of the image frame down to the bottom-right corner by shifting the 3×3 bounding box over and down one cell at a time. For example, as shown in FIG. 4B, the overlapping cells 308 are included in both the sub-region 410 and the sub-region 412. Using a 3×3 sub-region size may result in 9 different sub-regions, and using a 4×4 sub-region size may result in 4 different sub-regions, for a 25 cell image frame. This way, sub-regions are generated to be overlapping and of multiple sizes. Not all possible overlapping sub-regions and not all possible sub-region sizes need to be generated. More overlapping and size variation can result in more robust matching.
Thus, both the reference image frame 120 and the candidate image frames 122 are divided into multiple cells 304. Multiple cells 304 may then be grouped together to form sub-regions, such as, for example, the sub-regions 402, 404, 406, and 408 shown in FIG. 4A and the sub-regions 410, 412, 414, and 416 shown in FIG. 4B.
Referring back to FIG. 2, at block 210, the method 200 includes comparing characteristics of the plurality of sub-regions of the reference image frame 120 to characteristics of the plurality of sub-regions of the candidate image frames 122 and determining similarity measurements.
As one example, for each sub-region, a fingerprint signature is calculated by extracting unique features of the sub-region for comparison, to determine if the object (or a portion of the object) is present in both the reference and candidate image frames. An example fingerprint signature vector $f$ contains the following information (in YCbCr color space) about pixels in an image frame: luminance mean value $L_{\mathrm{mean}}$, red chrominance mean value $Cr_{\mathrm{mean}}$, blue chrominance mean value $Cb_{\mathrm{mean}}$, luminance entropy $L_{\mathrm{ent}}$, red chrominance entropy $Cr_{\mathrm{ent}}$, and/or blue chrominance entropy $Cb_{\mathrm{ent}}$. Alternatively or additionally, besides the mean-entropy vector, its covariance matrix, $C$, can be estimated such that each sub-region has a fingerprint pair, $\{f, C\}$.
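One way to compute a fingerprint pair of this kind is sketched below. The source does not say how the covariance matrix is estimated, so this sketch makes a labeled assumption: a six-element mean-entropy vector is computed for every cell, and the sub-region's vector and covariance are the sample mean and sample covariance across its cells. The channel order (Y, Cb, Cr), the 256-bin histogram entropy, the small ridge term, and all function names are likewise assumptions for illustration.

```python
import numpy as np


def channel_entropy(channel, bins=256):
    """Shannon entropy (in bits) of an 8-bit channel's intensity histogram."""
    hist = np.bincount(channel.ravel(), minlength=bins).astype(float)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())


def cell_features(cell_ycbcr):
    """Six-element vector: means of Y, Cr, Cb, then entropies of Y, Cr, Cb.

    Assumption: cell_ycbcr is a uint8 array of shape (h, w, 3) whose last
    axis is ordered (Y, Cb, Cr).
    """
    y, cb, cr = (cell_ycbcr[..., k] for k in range(3))
    return np.array([
        y.mean(), cr.mean(), cb.mean(),
        channel_entropy(y), channel_entropy(cr), channel_entropy(cb),
    ])


def sub_region_fingerprint(cells, ridge=1e-6):
    """Return (f, C) for a sub-region given as a list of cell arrays.

    Assumption: f is the mean of the per-cell vectors and C their sample
    covariance. A small ridge keeps C invertible when cells look alike.
    """
    feats = np.stack([cell_features(c) for c in cells])  # (n_cells, 6)
    f = feats.mean(axis=0)
    C = np.cov(feats, rowvar=False) + ridge * np.eye(feats.shape[1])
    return f, C
```

With a 3×3 sub-region this estimates a 6×6 covariance from nine samples, which is why some regularization is included in the sketch.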
FIGS. 5A-5B illustrate example sub-region comparisons. In FIG. 5A, the reference image frame 120 is shown with the sub-region 406 represented by the rectangle that includes a portion of the object 302. FIG. 5B is the candidate image frame 122 that includes the sub-region 414 represented by the rectangle. The candidate image frame 122 shows the object 302 occluded by portions of the environment (e.g., the car drives under a bridge and the occlusion 312 exists). The example in FIGS. 5A-5B shows that the sub-region 406 matches the sub-region 414. However, for other sub-regions of the candidate image frame 122 in FIG. 5B, no match may have been determined due to the occlusion 312. In this example, sub-region matching can identify matches when different overlapping and multiple-sized sub-regions of the reference image frame 120 are used for comparison with sub-regions that can be generated from the candidate image frame 122. Thus, sub-region matching is performed in a manner that considers a number of occlusion patterns of the object within the given candidate image frame.
Referring back to FIG. 2, after comparing sub-regions 406 and 414, similarity measurements are determined. Sub-region comparisons may include comparing respective fingerprint signatures of the reference image frame 120 to respective fingerprint signatures of the candidate image frames 122, and determining similarity measurements based on a Kullback-Leibler Distance (KLD). The KLD similarity measurement may be used as an indication of a match, and a lower/shorter distance indicates a better match. A match may be determined when a threshold distance is satisfied.
In some examples, for a given comparison of sub-regions of the reference image frame 120 to the candidate image frame 122, matching can be performed in a cyclic manner. FIG. 6 illustrates the example 3×3 sub-region 402 of the reference image frame 120 that can be rotated around a center cell 310, so as to rotate sub-regions around the center cell 310, resulting in a plurality of different combinations of sub-regions. The same may be performed for the candidate image frames 122, resulting in a plurality of different combinations of pairs of sub-regions between the reference image frame 120 and the candidate image frames 122, and similarity measurements for the combinations of sub-region pairs are determined. A given candidate image frame with a minimum of the similarity measurements may be determined to be the best match for tracking purposes.
Cyclic matching may be useful for candidate image frames where a target itself rotates or turns, or when a sensing platform (on the UAV) changes viewpoints. Here, sub-region matching is performed by taking the rotation effects into consideration. In an example in which the number of sub-regions is fixed at nine (e.g., one center sub-region and eight rotating sub-regions around the center sub-region, as partially illustrated in FIG. 4), then using cyclic matching, there will be eight different combinations of sub-region pairs between two different image frames.
Since a signature value for each sub-region represents a local signature and it is matched to a corresponding signature in the other image frame (but with any possible rotation), cyclic sub-region matching can show more robust target matching when partial occlusion and rotation effects are present.
- The KLD similarity measurement for each signature pair (e.g., mean vector of luminance/chrominance/entropies and their corresponding covariance matrices) is determined. As an example, for the KLD value between an image frame i and an image frame j, the following equation is used:
-
where $f_i$ and $f_j$ are the mean-entropy vectors of the image frame $i$ and the image frame $j$, respectively, and $C_i$ and $C_j$ are the corresponding covariance matrices. Since KLD is not symmetric, $\mathrm{KLD}_{j|i}$ is also calculated, and the average of $\mathrm{KLD}_{j|i}$ and $\mathrm{KLD}_{i|j}$ is determined.
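Equation [1] itself does not appear in this text. As a hedged illustration only, the sketch below uses the textbook closed-form Kullback-Leibler divergence between two multivariate Gaussians with means $f_i$, $f_j$ and covariances $C_i$, $C_j$, then averages the two directions as described above. The exact form, scaling, and direction convention of the source's Equation [1] may differ; the function names are hypothetical.

```python
import numpy as np


def gaussian_kld(f_i, C_i, f_j, C_j):
    """Textbook KL divergence KL( N(f_i, C_i) || N(f_j, C_j) ).

    Assumption: this standard Gaussian form stands in for the source's
    Equation [1], which is not reproduced in the text.
    """
    d = f_i.shape[0]
    C_j_inv = np.linalg.inv(C_j)
    diff = f_j - f_i
    # slogdet is numerically safer than log(det(...)) for small determinants.
    _, logdet_i = np.linalg.slogdet(C_i)
    _, logdet_j = np.linalg.slogdet(C_j)
    return 0.5 * (np.trace(C_j_inv @ C_i) + diff @ C_j_inv @ diff
                  - d + logdet_j - logdet_i)


def symmetric_kld(f_i, C_i, f_j, C_j):
    """Average of the two directed divergences, since KLD is not symmetric."""
    return 0.5 * (gaussian_kld(f_i, C_i, f_j, C_j)
                  + gaussian_kld(f_j, C_j, f_i, C_i))
```

Identical signatures give a distance of zero, and the distance grows as the means separate or the covariances diverge, which matches the "lower distance indicates a better match" convention used in the description.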
So, the signature pair $\{f_T, C_T\}$ for the reference image frame and $\{f_K, C_K\}$ for the Kth candidate image frame are compared by calculating each sub-KLD, $\mathrm{KLD}_{T,K}^{i,j}$, between the ith sub-signature pair, $\{f_T^{i}, C_T^{i}\}$, and the jth sub-signature pair, $\{f_K^{j}, C_K^{j}\}$, as shown below:

$$\mathrm{KLD}_{T,K}^{i,j} = 0.5\left(\mathrm{KLD}_{T|K}^{i|j} + \mathrm{KLD}_{K|T}^{j|i}\right) \qquad \text{Equation [2]}$$

where:
-
- As mentioned earlier, for each comparison, there are eight different combinations and those KLDs are calculated by fixing the center cell and rotating the other cells in one direction, as in
FIG. 7 , resulting in the following KLD measurements: -
- An initial clause of the KLD measurements {KLdist(FP0 j, FP0 0)} is always the same, and remaining portions of the KLD measurements are due to the cyclic rotation to compare all different orientations.
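The cyclic comparison can be sketched under one reading of the description above: index 0 is the center sub-region, indices 1 to 8 are the surrounding sub-regions in rotation order, the center-to-center term is common to every orientation, and the score is the smallest summed KLD over the eight cyclic shifts. The equations for these measurements are not reproduced in this text, so the indexing scheme and the function name `cyclic_match_score` are assumptions.

```python
import numpy as np


def cyclic_match_score(D):
    """Best summed distance over the eight cyclic alignments.

    D is a 9 x 9 array where D[i, j] is the KLD between reference
    sub-region i and candidate sub-region j. Index 0 is the center;
    indices 1..8 are the ring, listed in rotation order (assumption).
    Returns (best_total, best_shift).
    """
    D = np.asarray(D, dtype=float)
    if D.shape != (9, 9):
        raise ValueError("expected a 9 x 9 distance matrix")
    ring = np.arange(1, 9)
    totals = []
    for shift in range(8):
        # Pair ring sub-region m with the candidate ring sub-region that
        # lies `shift` positions further around the center.
        rotated = 1 + (ring - 1 + shift) % 8
        totals.append(D[0, 0] + D[ring, rotated].sum())
    best_shift = int(np.argmin(totals))
    return float(totals[best_shift]), best_shift
```

The returned shift indicates which rotation of the candidate best aligns with the reference, which is how rotation of the target or a viewpoint change would show up in this scheme.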
Finally, a resulting KLD for a best candidate image frame “T” is as follows:

$$\mathrm{KLD}_{T} = \min_{k}\Big(\sum_{i}\sum_{j}\big\{\mathrm{KLD}_{T,K}^{i,k}\big\}\Big) \qquad \text{Equation [5]}$$

In other examples, for a given comparison of sub-regions of the reference image frame 120 to the candidate image frame 122, matching can be performed by determining a median of minimums or a minimum of minimums of the similarity measurements of the candidate image frame 122. For median of minimum matching, the one sub-region match that presents the best fit is effectively chosen. For each sub-region i in the image frame T, $\mathrm{KLD}_{T,K}^{i,j}$ is obtained with the sub-region j in the image frame K. To obtain the best match for the sub-region i in T, the minimum of $\mathrm{KLD}_{T,K}^{i,j}$ over j is taken, giving $\mathrm{KLD}_{T,K}^{i}$. Then the median of the $\mathrm{KLD}_{T,K}^{i}$ values over i is determined. The final KLD value for image frame T with candidate K is then as follows:

$$\mathrm{KLD}_{T,K} = \operatorname{median}_{i}\Big(\min_{j}\big(\big\{\mathrm{KLD}_{T,K}^{i,j}\big\}\big)\Big) \qquad \text{Equation [6]}$$

The image frame that has the minimum value among the $\mathrm{KLD}_{T,K}$ values is then chosen as the best match.
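The median-of-minimum fusion, and the minimum-of-minimum alternative mentioned alongside it, can be sketched as reductions over a matrix of sub-region distances. The function names, the `mode` parameter, and the dictionary of candidates are hypothetical; `D[i][j]` is assumed to hold the KLD between reference sub-region i and candidate sub-region j.

```python
import numpy as np


def fuse_sub_region_klds(D, mode="median"):
    """Fuse a matrix of sub-region KLDs into one score for a candidate.

    For each reference sub-region i, keep its best (minimum) distance over
    all candidate sub-regions j, then take the median or the minimum of
    those per-row values.
    """
    D = np.asarray(D, dtype=float)
    best_per_ref = D.min(axis=1)  # min over j for each i
    if mode == "median":
        return float(np.median(best_per_ref))
    if mode == "min":
        return float(best_per_ref.min())
    raise ValueError("mode must be 'median' or 'min'")


def best_candidate(D_by_candidate, mode="median"):
    """Pick the candidate whose fused KLD is smallest."""
    scores = {name: fuse_sub_region_klds(D, mode)
              for name, D in D_by_candidate.items()}
    return min(scores, key=scores.get), scores


# Candidate "A" matches moderately well everywhere; candidate "B" has one
# very good sub-region match and poor matches elsewhere.
candidates = {
    "A": [[5, 1, 9], [7, 8, 2], [6, 4, 3]],
    "B": [[9, 9, 9], [0.5, 9, 9], [9, 9, 9]],
}
```

In this toy data the median variant prefers "A" while the minimum variant prefers "B", which illustrates the design trade-off: the minimum rewards a single clean sub-region, whereas the median asks for agreement across at least half of the sub-regions.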
A minimum of minimum matching can be used when large amounts of occlusion are expected. Therefore, unless only small amounts of partial occlusion are expected, the minimum of the minimum method is used, taking the minimum of the $\mathrm{KLD}_{T,K}^{i}$ values. Choosing the minimum will likely provide a higher chance of being free from partial occlusions compared to choosing the median of the $\mathrm{KLD}_{T,K}^{i}$ values. In this example, a final $\mathrm{KLD}_{T,K}$ is estimated as follows:

$$\mathrm{KLD}_{T,K} = \min_{i}\Big(\min_{j}\big(\big\{\mathrm{KLD}_{T,K}^{i,j}\big\}\big)\Big) \qquad \text{Equation [7]}$$

Referring back to FIG. 2, at block 212, the method 200 includes tracking the object 302 within the sequence of video frames 105 based on the similarity measurements. Tracking the object 302 within the sequence of video frames 105 includes determining matches between the candidate image frames 122 and the reference image frame 120 and, based on mismatches between the candidate image frames 122 and the reference image frame 120 within a portion of the sequence of video frames 105, performing target reacquisition within a subsequent portion of the sequence of video frames 105.
In summary, examples of the method 200 may include storing a reference image frame's fingerprint pairs $\{f_T^{i}, C_T^{i}\}$ after the object 302 is selected, detecting moving target candidates, assigning each detected object to a candidate image frame, and, for each image frame, dividing it into cells (e.g., 5×5 cells in one image frame) and assigning sub-regions (e.g., 3×3 cells or 4×4 cells). For each pair between a jth sub-region of a Kth candidate image frame and an ith sub-region of the reference image frame T, $\mathrm{KLD}_{T,K}^{i,j}$ is calculated, and the candidate with the minimum cyclic, median-of-minimum, or minimum-of-minimum KLD with respect to the reference image frame is determined, to track the object between frames of a video.
Example tests were performed, and test results with random occlusion rates were compared across four matching methods: (1) an entire image frame method (uses the entire area of the extracted image frame and only one KLD value), (2) the cyclic sub-region method (uses sub-regions in a cyclic way), (3) the median of the minimum of the overlapped multiple sub-region method (uses multiple overlapped sub-regions and selects the median of the minimum KLD values), and (4) the minimum of the minimum of the overlapped multiple sub-region method (uses multiple overlapped sub-regions and selects the minimum of the minimum KLD values).
In the example tests, sixty vehicle image frames of four different vehicles were selected and artificially occluded by background image frames. Partial occlusion rates were randomly selected between 15% and 25%, and the occlusion was applied to a random portion of the image frame. Tables 1-4 below present test results. In the tables, “Occ V#” is the occluded vehicle type and “V#” is an original vehicle image frame in the same category. The original image frame itself was not compared with its own occluded image frame. “Background” indicates non-vehicle image frames. As shown in the tables, all sub-region based methods outperformed the previous entire image frame method, and the minimum of the minimum method was shown to be the most accurate in this test.
-
TABLE 1 The entire image frame method (Average correctness: 81.67%)
| | V1 | V2 | V3 | V4 | Background |
|---|---|---|---|---|---|
| Occ V1 | 0.786 | 0.071 | 0 | 0 | 0.143 |
| Occ V2 | 0.077 | 0.769 | 0 | 0 | 0.154 |
| Occ V3 | 0 | 0 | 0.375 | 0.625 | 0 |
| Occ V4 | 0 | 0 | 0 | 1.000 | 0 |

TABLE 2 The cyclic subregion method (Average correctness: 91.67%)
| | V1 | V2 | V3 | V4 | Background |
|---|---|---|---|---|---|
| Occ V1 | 0.929 | 0 | 0 | 0.071 | 0 |
| Occ V2 | 0.077 | 0.769 | 0 | 0.154 | 0 |
| Occ V3 | 0 | 0 | 0.875 | 0.125 | 0 |
| Occ V4 | 0 | 0 | 0 | 1.000 | 0 |

TABLE 3 The median of the minimum method (Average correctness: 86.67%)
| | V1 | V2 | V3 | V4 | Background |
|---|---|---|---|---|---|
| Occ V1 | 0.786 | 0.071 | 0 | 0 | 0.143 |
| Occ V2 | 0 | 0.692 | 0.077 | 0.154 | 0.077 |
| Occ V3 | 0 | 0 | 0.875 | 0.125 | 0 |
| Occ V4 | 0 | 0 | 0 | 1.000 | 0 |

TABLE 4 The minimum of the minimum method (Average correctness: 96.67%)
| | V1 | V2 | V3 | V4 | Background |
|---|---|---|---|---|---|
| Occ V1 | 1.000 | 0 | 0 | 0 | 0 |
| Occ V2 | 0 | 0.846 | 0 | 0.077 | 0.077 |
| Occ V3 | 0 | 0 | 1.000 | 0 | 0 |
| Occ V4 | 0 | 0 | 0 | 1.000 | 0 |

Similar tests were performed for a comparison test with fixed occlusion rates. In this test, 2641 vehicle image frames with 46 different vehicles were used. Occlusion was applied using 215 background image frames, and occlusion rates were selected as 0%, 12.5%, 25%, 33%, and 50% for each test. Though occlusion rates are fixed for each test, occlusion locations were randomly assigned. Sizes of sub-regions were selected as 45% and 65% of each entire image frame.
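The average correctness figures quoted with Tables 1-4 can be cross-checked with a short calculation. The per-class image counts below are not stated in the source: 14, 13, and 8 are inferred from the table fractions (for example 0.786 ≈ 11/14, 0.769 ≈ 10/13, 0.375 = 3/8), and 25 is what remains of the sixty test images. Treat them as assumptions.

```python
# Diagonal (correctly classified) rates for Occ V1..V4, read from Tables 1-4.
DIAGONALS = {
    "entire image frame": [0.786, 0.769, 0.375, 1.000],
    "cyclic sub-region": [0.929, 0.769, 0.875, 1.000],
    "median of minimum": [0.786, 0.692, 0.875, 1.000],
    "minimum of minimum": [1.000, 0.846, 1.000, 1.000],
}

# Assumed images per class (inferred from the fractions, not given in the
# source); they sum to the sixty test images.
ROW_COUNTS = [14, 13, 8, 25]


def average_correctness(diagonal, counts):
    """Percentage of all test images classified correctly."""
    correct = sum(round(rate * n) for rate, n in zip(diagonal, counts))
    return 100.0 * correct / sum(counts)


results = {name: round(average_correctness(diag, ROW_COUNTS), 2)
           for name, diag in DIAGONALS.items()}
```

Under these inferred counts the four methods classify 49, 55, 52, and 58 of the 60 images correctly, which reproduces the quoted 81.67%, 91.67%, 86.67%, and 96.67%.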
-
FIG. 7 is an example target classification accuracy graph comparing the entire image frame method and the sub-region method (the minimum of the minimum method was used in this test). Over the whole range (0% through 50%), the sub-region method provided better performance. Up to 25% occlusion, the sub-region method shows little decrease in accuracy; beyond 33%, the accuracy of both methods decreases because 33% occlusion of the image frame can amount to more than 50% occlusion of the target itself in some examples. If occlusion is more than 50%, target matching becomes difficult.
As mentioned, portions of any of the methods described herein (e.g., the method 200) may be performed by a computing device (or components of a computing device), as well as by components of elements shown in FIG. 1. FIG. 8 illustrates a schematic drawing of an example computing device 800. The computing device 800 in FIG. 8 may represent devices shown in FIG. 1 including the processors, the system, or any of the blocks conceptually illustrating computing components, or the computing device 800 may represent the system in FIG. 1 in general. In some examples, some components illustrated in FIG. 8 may be distributed across multiple computing devices. However, for the sake of example, the components are shown and described as part of one example device 800. The computing device 800 may be or include a mobile device, desktop computer, email/messaging device, tablet computer, or similar device that may be configured to perform the functions described herein.
The computing device 800 may include an interface 802, a wireless communication component 804, sensor(s) 806, data storage 808, and a processor 810. Components illustrated in FIG. 8 may be linked together by a communication link 812. The computing device 800 may also include hardware to enable communication within the computing device 800 and between the computing device 800 and another computing device (not shown), such as a server entity. The hardware may include transmitters, receivers, and antennas, for example.
The interface 802 may be configured to allow the computing device 800 to communicate with another computing device (not shown), such as a server. Thus, the interface 802 may be configured to receive input data from one or more computing devices, and may also be configured to send output data to the one or more computing devices. In some examples, the interface 802 may also maintain and manage records of data received and sent by the computing device 800. The interface 802 may also include a receiver and transmitter to receive and send data. In other examples, the interface 802 may also include a user interface, such as a keyboard, microphone, touchscreen, etc., to receive inputs as well.
The wireless communication component 804 may be a communication interface that is configured to facilitate wireless data communication for the computing device 800 according to one or more wireless communication standards. For example, the wireless communication component 804 may include a Wi-Fi communication component that is configured to facilitate wireless data communication according to one or more IEEE 802.11 standards. As another example, the wireless communication component 804 may include a Bluetooth communication component that is configured to facilitate wireless data communication according to one or more Bluetooth standards. Other examples are also possible.
The sensor 806 may include one or more sensors, or may represent one or more sensors included within the computing device 800. Example sensors include an accelerometer, gyroscope, pedometer, light sensors, microphone, camera, or other location and/or context-aware sensors.
The data storage 808 may store program logic 814 that can be accessed and executed by the processor 810. The data storage 808 may also store collected sensor data or image data 816.
The description of the different advantageous arrangements has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. Further, different advantageous embodiments may describe different advantages as compared to other advantageous embodiments. The embodiment or embodiments selected are chosen and described in order to best explain the principles of the embodiments, the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/705,473 US9483839B1 (en) | 2015-05-06 | 2015-05-06 | Occlusion-robust visual object fingerprinting using fusion of multiple sub-region signatures |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US9483839B1 US9483839B1 (en) | 2016-11-01 |
| US20160328860A1 true US20160328860A1 (en) | 2016-11-10 |
Family
ID=57189434
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US14/705,473 Active US9483839B1 (en) | 2015-05-06 | 2015-05-06 | Occlusion-robust visual object fingerprinting using fusion of multiple sub-region signatures |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US9483839B1 (en) |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2008070012A2 (en) * | 2006-12-01 | 2008-06-12 | Thomson Licensing | Estimating a location of an object in an image |
| US9147260B2 (en) * | 2010-12-20 | 2015-09-29 | International Business Machines Corporation | Detection and tracking of moving objects |
| US8811670B2 (en) | 2012-09-28 | 2014-08-19 | The Boeing Company | Method and system for using fingerprints to track moving objects in video |
| EP2849425A1 (en) * | 2013-09-16 | 2015-03-18 | Thomson Licensing | Color video processing system and method, and corresponding computer program |
| WO2015117072A1 (en) * | 2014-01-31 | 2015-08-06 | The Charles Stark Draper Laboratory, Inc. | Systems and methods for detecting and tracking objects in a video stream |
| US9158971B2 (en) * | 2014-03-03 | 2015-10-13 | Xerox Corporation | Self-learning object detectors for unlabeled videos using multi-task learning |
- 2015-05-06: US application US14/705,473, published as US9483839B1; status: Active
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108629350A (en) * | 2017-03-15 | 2018-10-09 | 华为技术有限公司 | Method and device for identifying similarity relations between pictures |
| KR20190108018A (en) * | 2018-03-13 | 2019-09-23 | 재단법인대구경북과학기술원 | An object detecting apparatus and method using fusion sensor |
| KR102090487B1 (en) * | 2018-03-13 | 2020-03-18 | 재단법인대구경북과학기술원 | An object detecting apparatus and method using fusion sensor |
| CN110794397A (en) * | 2019-10-18 | 2020-02-14 | 北京全路通信信号研究设计院集团有限公司 | Target detection method and system based on camera and radar |
| DE102021001022A1 (en) | 2021-02-25 | 2022-08-25 | Ziehm Imaging Gmbh | Method and device for registering two sets of medical image data, taking scene changes into account |
| DE102021001022B4 (en) | 2021-02-25 | 2023-02-02 | Ziehm Imaging Gmbh | Method and device for registering two sets of medical image data, taking scene changes into account |
| WO2025094276A1 (en) * | 2023-10-31 | 2025-05-08 | 日本電気株式会社 | Sensing data processing device, sensor system, sensing data processing method, and recording medium |
Also Published As
| Publication number | Publication date |
|---|---|
| US9483839B1 (en) | 2016-11-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9483839B1 (en) | | Occlusion-robust visual object fingerprinting using fusion of multiple sub-region signatures |
| US11967023B2 (en) | | Method and system for use in colourisation of a point cloud |
| TWI695181B (en) | | Methods and systems for color point cloud generation |
| US9576375B1 (en) | | Methods and systems for detecting moving objects in a sequence of image frames produced by sensors with inconsistent gain, offset, and dead pixels |
| JP7082545B2 (en) | | Information processing methods, information processing equipment and programs |
| CN107272021B (en) | | Object detection using radar and visually defined image detection areas |
| US20200401617A1 (en) | | Visual positioning system |
| US10297084B2 (en) | | Identification of relative distance of objects in images |
| EP2713308B1 (en) | | Method and system for using fingerprints to track moving objects in video |
| JP7343054B2 (en) | | Location estimation method, location estimation device, and location estimation program |
| US11430199B2 (en) | | Feature recognition assisted super-resolution method |
| Wang et al. | | Automated road sign inventory system based on stereo vision and tracking |
| CN114898314A (en) | | Target detection method, device and equipment for driving scene and storage medium |
| Sorial et al. | | Towards a real time obstacle detection system for unmanned surface vehicles |
| CN114384486B (en) | | Data processing method and device |
| Sakai et al. | | Large-scale 3D outdoor mapping and on-line localization using 3D-2D matching |
| CN115994934B (en) | | Data time alignment method and device and domain controller |
| GB2520243A (en) | | Image processor |
| KR20220062709A (en) | | System for detecting disaster situation by clustering of spatial information based an image of a mobile device and method therefor |
| Tilon et al. | | Vehicle tracking and speed estimation from unmanned aerial vehicles using segmentation-initialised trackers |
| US20210304518A1 (en) | | Method and system for generating an environment model for positioning |
| Ao et al. | | Detecting tiny moving vehicles in satellite videos |
| Sambolek et al. | | Determining the geolocation of a person detected in an image taken with a drone |
| Fehlmann et al. | | Application of detection and recognition algorithms to persistent wide area surveillance |
| Machado | | Vehicle speed estimation based on license plate detection |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: THE BOEING COMPANY, ILLINOIS. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: KWON, HYUKSEONG; OWECHKO, YURI; KIM, KYUNGNAM; SIGNING DATES FROM 20150429 TO 20150506; REEL/FRAME: 035578/0525 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 8 |