MediaPipe offers open-source, cross-platform, customizable machine learning (ML) solutions for live and streaming media.

MediaPipe Face Mesh is a face geometry solution that estimates 468 3D face landmarks in real time, even on mobile devices. It uses ML to infer the 3D surface geometry from a single camera input, with no need for a dedicated depth sensor. By combining lightweight model architectures with GPU acceleration throughout the pipeline, the solution delivers the real-time performance that is critical for live experiences.

Human pose estimation from video plays a critical role in applications such as quantifying physical exercises, sign language recognition, and full-body gesture control. For example, it can form the basis for yoga, dance, and fitness applications. It can also enable the overlay of digital content and information on top of the physical world in augmented reality.
Features
- High-fidelity human body pose tracking, inferring up to 33 3D full-body landmarks from RGB video frames
- 21 hand landmarks in 3D with multi-hand support, based on high-performance palm detection and hand landmark models
- 468 face landmarks in 3D with multi-face support
- Provides segmentation masks for prominent humans in the scene
- Ultra-lightweight face detector with 6 landmarks and multi-face support
- Detection and tracking of objects in video in a single pipeline
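
To illustrate how 3D pose landmarks could be used to quantify a physical exercise, here is a minimal sketch that computes the angle at a joint from three landmark positions. It uses only the Python standard library and does not call MediaPipe itself. The `Landmark` type, the `joint_angle` helper, and the coordinates are all hypothetical and made up for illustration; in practice the three points would come from whatever pose solution produces the landmarks, and they may need rescaling to a common unit first, depending on how that solution scales each axis.

```python
import math
from typing import NamedTuple


class Landmark(NamedTuple):
    """A 3D point standing in for one pose landmark (hypothetical type)."""
    x: float
    y: float
    z: float


def joint_angle(a: Landmark, b: Landmark, c: Landmark) -> float:
    """Return the angle at b, in degrees, between segments b->a and b->c."""
    ba = (a.x - b.x, a.y - b.y, a.z - b.z)
    bc = (c.x - b.x, c.y - b.y, c.z - b.z)
    dot = sum(p * q for p, q in zip(ba, bc))
    norm = math.hypot(*ba) * math.hypot(*bc)
    if norm == 0.0:
        raise ValueError("landmarks coincide; angle is undefined")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Made-up coordinates for a shoulder, elbow, and wrist.
shoulder = Landmark(0.0, 1.0, 0.0)
elbow = Landmark(0.0, 0.0, 0.0)
wrist = Landmark(1.0, 0.0, 0.0)

elbow_angle = joint_angle(shoulder, elbow, wrist)
print(f"Elbow angle: {elbow_angle:.1f} degrees")
```

A fitness application might track such an angle across frames, for example counting a repetition each time the elbow angle passes below and then above chosen thresholds. The thresholds would be an application-level design choice, not something the landmarks themselves provide.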