This repository provides an implementation of coupled 3D Convolutional Neural Networks for audio-visual matching; lip-reading is one specific application of this work. Audio-visual recognition (AVR) has been considered as a solution for speech recognition tasks when the audio is corrupted, as well as a visual recognition method for speaker verification in multi-speaker scenarios. AVR systems leverage information extracted from one modality to improve recognition in the other by supplying the missing information. The essential problem, and the goal of this work, is to find the correspondence between the audio and visual streams. We propose a coupled 3D Convolutional Neural Network (CNN) architecture that maps both modalities into a shared representation space, where the correspondence of audio-visual streams is evaluated using the learned multimodal features.
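The pairing idea can be sketched as two 3D ConvNets, one per modality, trained with a contrastive loss so that genuine audio-visual pairs land close together in the shared representation space. The layer sizes, input shapes, and embedding dimension below are illustrative assumptions rather than the exact configuration used here, and for brevity both branches share one architecture (the actual two ConvNets are non-identical):

```python
# Minimal PyTorch sketch of a coupled 3D CNN for audio-visual matching.
# All shapes and layer sizes are illustrative assumptions, not the
# exact architecture of this repository.
import torch
import torch.nn as nn
import torch.nn.functional as F

def branch(in_channels):
    # A tiny 3D ConvNet: (batch, channels, time, height, width) -> embedding
    return nn.Sequential(
        nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool3d(2),
        nn.Conv3d(16, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool3d(1),   # global pooling over time and space
        nn.Flatten(),
        nn.Linear(32, 64),         # 64-D shared representation space
    )

visual_net = branch(in_channels=1)  # stacked grayscale mouth crops
audio_net = branch(in_channels=1)   # stacked speech-feature maps

def contrastive_loss(v, a, label, margin=1.0):
    # label = 1 for a genuine audio-visual pair, 0 for a mismatched pair
    d = F.pairwise_distance(v, a)
    return (label * d.pow(2) +
            (1 - label) * F.relu(margin - d).pow(2)).mean()

# Toy forward pass: 9 video frames of 60x60 mouth crops, and an
# audio volume shaped into the same 5-D layout (shapes assumed).
video = torch.randn(4, 1, 9, 60, 60)
audio = torch.randn(4, 1, 15, 40, 3)
label = torch.randint(0, 2, (4,)).float()
loss = contrastive_loss(visual_net(video), audio_net(audio), label)
```

At evaluation time, the distance between the two embeddings serves as the correspondence score for an audio-visual pair.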
Features
- The proposed architecture incorporates both spatial and temporal information
- The input pipeline must be provided by the user
- For lip tracking, the desired video must be fed as input
- Running the lip-tracking script extracts the lip motions by saving the mouth area of each frame and creates an output video with a rectangle drawn around the mouth area (see the first sketch after this list)
- In the visual section, the videos are post-processed to a uniform frame rate of 30 frames per second (see the ffmpeg sketch after this list)
- The proposed architecture utilizes two non-identical ConvNets that take a pair of speech and video streams as input
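As a rough illustration of the lip-tracking step, the sketch below crops the mouth region from each frame using dlib's 68-point facial landmarks (points 48-67 outline the mouth) and draws a rectangle around it. The landmark model path and filenames are placeholders, and this is a generic sketch rather than the repository's own script:

```python
# Sketch of per-frame mouth extraction with OpenCV + dlib landmarks.
# The predictor path is a placeholder; the standard
# shape_predictor_68_face_landmarks.dat model must be downloaded separately.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

cap = cv2.VideoCapture("input.mp4")
writer = None
mouth_crops = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for face in detector(gray):
        shape = predictor(gray, face)
        # Landmarks 48-67 outline the mouth region
        xs = [shape.part(i).x for i in range(48, 68)]
        ys = [shape.part(i).y for i in range(48, 68)]
        x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
        mouth_crops.append(frame[y0:y1, x0:x1].copy())  # save the mouth area
        cv2.rectangle(frame, (x0, y0), (x1, y1), (0, 255, 0), 2)
    if writer is None:
        h, w = frame.shape[:2]
        writer = cv2.VideoWriter("mouth_tracked.mp4",
                                 cv2.VideoWriter_fourcc(*"mp4v"), 30, (w, h))
    writer.write(frame)
cap.release()
if writer is not None:
    writer.release()
```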
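The 30 fps normalization mentioned above can be performed ahead of time with ffmpeg, whose `-r` output option sets the frame rate. A minimal sketch, with placeholder filenames:

```python
# Re-encode a video to a fixed 30 fps using ffmpeg (paths are placeholders).
import subprocess

subprocess.run(
    ["ffmpeg", "-i", "raw_video.mp4", "-r", "30", "video_30fps.mp4"],
    check=True,  # raise if ffmpeg exits with an error
)
```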