CoTracker is a learning-based point tracking system that jointly follows many user-specified points across a video, rather than tracking each point independently. By reasoning about all tracks together, it can maintain temporal consistency, handle mutual occlusions, and reduce identity swaps when trajectories cross. The model takes sparse point queries, each seeded on a chosen frame, and predicts their sub-pixel locations and a visibility score on every frame of the video, producing long, coherent trajectories.

Its transformer-style architecture aggregates information both along time and across points, allowing it to recover tracks even after brief disappearances. The repository ships with inference scripts, pretrained weights, and simple interfaces to seed points, run tracking, and export trajectories for downstream tasks. Typical uses include correspondence building, motion analysis, dynamic SLAM priors, video editing masks, and evaluation of geometric consistency in real scenes.
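A minimal sketch of seeding query points and running tracking, assuming the torch.hub entry point advertised by the repository; the entry-point name (`cotracker2` here), the input layout, and the output shapes are assumptions that may differ between releases, so check the repo's own examples:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Video tensor of shape (B, T, C, H, W); a random clip stands in here
# for a real decoded RGB video with values in [0, 255].
video = torch.randint(0, 256, (1, 48, 3, 384, 512), device=device).float()

# Each query is (frame_index, x, y): the point to track and the frame
# on which it is seeded. Shape (B, N, 3).
queries = torch.tensor(
    [[[0.0, 100.0, 150.0],    # point seeded on frame 0
      [0.0, 200.0, 300.0],
      [10.0, 250.0, 120.0]]], # point seeded later, on frame 10
    device=device,
)

# Entry-point name is an assumption; it depends on the released version.
model = torch.hub.load("facebookresearch/co-tracker", "cotracker2").to(device)

with torch.no_grad():
    # Expected (per the repository's examples): per-frame sub-pixel
    # (x, y) locations for every query plus a visibility score.
    pred_tracks, pred_visibility = model(video, queries=queries)
```

Real videos can be decoded with any reader (e.g. `torchvision.io.read_video`) and arranged into the same `(B, T, C, H, W)` layout before being passed to the model.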
Features
- Joint tracking of many points with visibility confidence
- Transformer-based spatiotemporal reasoning over trajectories
- Robustness to occlusions, re-appearances, and large motions
- Easy seeding of points and export of long-range tracks (see the export sketch after this list)
- Pretrained weights and CLI/Python APIs for quick inference
- Utilities for visualization, evaluation, and downstream integration
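
As an illustration of exporting long-range tracks for downstream use, the helper below persists predicted trajectories and visibility scores with plain NumPy; `save_tracks` and `load_tracks` are hypothetical names for this sketch, not part of the repository's API:

```python
import numpy as np

def save_tracks(path, pred_tracks, pred_visibility):
    """Store trajectories as a compressed .npz archive.

    pred_tracks:     (T, N, 2) array of per-frame (x, y) locations.
    pred_visibility: (T, N) array of per-frame visibility scores.
    """
    tracks = np.asarray(pred_tracks, dtype=np.float32)
    vis = np.asarray(pred_visibility, dtype=np.float32)
    np.savez_compressed(path, tracks=tracks, visibility=vis)

def load_tracks(path):
    """Reload trajectories and visibility for downstream tools."""
    data = np.load(path)
    return data["tracks"], data["visibility"]
```

Downstream consumers (correspondence building, motion analysis, SLAM priors) can then threshold the visibility scores to keep only the frames where each point is reliably observed.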