This code provides an implementation of coupled 3D Convolutional Neural Networks for audio-visual matching. The input pipeline must be prepared by the user. Lip reading is one specific application of this work. Audio-visual recognition (AVR) has been considered as a solution for speech recognition tasks when the audio is corrupted, and as a visual recognition method for speaker verification in multi-speaker scenarios. AVR systems leverage the information extracted from one modality to improve recognition in the other modality by filling in the missing information. The essential problem is to find the correspondence between the audio and visual streams, which is the goal of this work. We propose a coupled 3D Convolutional Neural Network (CNN) architecture that maps both modalities into a representation space and evaluates the correspondence of the audio-visual streams using the learned multimodal features.
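As a rough illustration of what "evaluating the correspondence" in a shared representation space can look like, the sketch below compares one audio embedding with one visual embedding by Euclidean distance and scores the pair with a contrastive-style loss. The distance measure, the loss form, the margin value, and the function names `pair_distance` and `contrastive_loss` are all assumptions made for this example; the description above does not state which metric or loss the project uses.

```python
import numpy as np


def pair_distance(audio_emb, visual_emb):
    """Euclidean distance between two embedding vectors of equal length.

    Assumption: both networks emit fixed-size vectors in the same space.
    """
    a = np.asarray(audio_emb, dtype=np.float64)
    v = np.asarray(visual_emb, dtype=np.float64)
    if a.shape != v.shape:
        raise ValueError("embeddings must have the same shape")
    return float(np.linalg.norm(a - v))


def contrastive_loss(distance, is_match, margin=1.0):
    """Contrastive-style loss for one audio-visual pair (illustrative only).

    Matching pairs are penalized for being far apart; non-matching pairs
    are penalized only when they are closer than `margin`.
    """
    if is_match:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


if __name__ == "__main__":
    d = pair_distance([0.0, 0.0], [3.0, 4.0])
    print(d, contrastive_loss(d, is_match=True), contrastive_loss(d, is_match=False))
```

Under this sketch, a small distance suggests that the audio and video segments belong together, and a threshold on the distance would turn it into a match or non-match decision.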

Features

  • The proposed architecture incorporates both spatial and temporal information
  • The input pipeline must be provided by the user
  • For lip tracking, the desired video must be fed as the input
  • The lip tracking script extracts the lip motion by saving the mouth area of each frame, and creates an output video with a rectangle drawn around the mouth area
  • In the visual section, the videos are post-processed to have an equal frame rate of 30 frames per second
  • The proposed architecture uses two non-identical ConvNets, which take a pair of speech and video streams as input
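The frame rate normalization mentioned in the list can be sketched as a nearest-earlier-frame resampling: for each output frame at 30 frames per second, pick the index of the source frame that covers the same timestamp. This is only one possible approach and is not taken from the project's code; the function name `resample_indices` and its parameters are hypothetical.

```python
def resample_indices(n_frames, src_fps, dst_fps=30.0):
    """Return source-frame indices that approximate a clip at dst_fps.

    Hypothetical helper: frames are repeated when src_fps < dst_fps and
    dropped when src_fps > dst_fps. The clip duration is preserved.
    """
    if n_frames < 0 or src_fps <= 0 or dst_fps <= 0:
        raise ValueError("n_frames must be >= 0 and frame rates must be > 0")
    duration = n_frames / src_fps               # clip length in seconds
    n_out = int(round(duration * dst_fps))      # number of output frames
    # Map each output frame to the source frame shown at the same time,
    # clamping so rounding never points past the last source frame.
    return [min(n_frames - 1, int(i * src_fps / dst_fps)) for i in range(n_out)]


if __name__ == "__main__":
    # A one-second clip recorded at 60 fps keeps every second frame.
    print(resample_indices(60, 60.0)[:5])
    # A one-second clip recorded at 15 fps shows every frame twice.
    print(resample_indices(15, 15.0)[:6])
```

The returned indices can then be used to select frames from whatever decoded frame sequence the user's input pipeline provides.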

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Machine Learning Software, Python Speech Recognition Software

Registered

2022-08-11