

Phoneme and viseme based approach for lip synchronization

Mahavidyalaya, 2014

Document ID: 12955103996753465433
Author: Mahavidyalaya B
Publication year: 2014
Publication venue: Int. J. Signal Process., Image Process. Pattern Recogn.

Snippet

Phoneme-to-viseme mapping has broad application in visual speech recognition, lip synchronization, talking-head systems, movies, news reading, and the film industry. A great deal of work has been done on detecting and recognizing the various components of the face. Apart …
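
The snippet describes driving lip movement by mapping phonemes (units of sound) to visemes (visually distinguishable mouth shapes), typically a many-to-one lookup since several phonemes share the same mouth shape. The sketch below is a minimal illustration of that idea; the ARPAbet-style phoneme symbols, the viseme group labels, and the timed-phoneme input format are illustrative assumptions, not the mapping defined in the paper.

```python
# Minimal sketch of phoneme-to-viseme mapping for lip synchronization.
# The phoneme symbols and viseme groups below are illustrative assumptions,
# not the paper's mapping.

PHONEME_TO_VISEME = {
    # bilabial closure
    "P": "p", "B": "p", "M": "p",
    # labiodental
    "F": "f", "V": "f",
    # rounded vowels
    "OW": "o", "UW": "o",
    # open vowels
    "AA": "a", "AE": "a", "AH": "a",
    # silence / pause
    "SIL": "sil",
}

def phonemes_to_visemes(timed_phonemes):
    """Map (start, end, phoneme) triples to the viseme a talking head would render."""
    return [
        (start, end, PHONEME_TO_VISEME.get(phoneme, "neutral"))
        for start, end, phoneme in timed_phonemes
    ]

if __name__ == "__main__":
    # Timed phonemes for the word "map": (start_sec, end_sec, phoneme)
    timed = [(0.00, 0.08, "M"), (0.08, 0.20, "AE"), (0.20, 0.30, "P")]
    for start, end, viseme in phonemes_to_visemes(timed):
        print(f"{start:.2f}-{end:.2f}s -> viseme '{viseme}'")
```

In practice the viseme sequence would then be interpolated into keyframes of a face model, which is why the mapping keeps the phoneme timing rather than just the labels.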

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
    • G10L2021/105 Synthesis of the lips movements from speech, e.g. for talking heads
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221 Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G06K9/00268 Feature extraction; Face representation
    • G06K9/00281 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/205 3D [Three Dimensional] animation driven by audio data
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems

Similar Documents

Czyzewski et al. An audio-visual corpus for multimodal automatic speech recognition
Cao et al. Expressive speech-driven facial animation
US10460732B2 (en) System and method to insert visual subtitles in videos
Chen Audiovisual speech processing
Hong et al. Real-time speech-driven face animation with expressions using neural networks
US7133535B2 (en) System and method for real time lip synchronization
Chen et al. Audio-visual integration in multimodal communication
Liu et al. Video-audio driven real-time facial animation
CN112581569B (en) Adaptive emotion expression speaker facial animation generation method and electronic device
Zhang et al. Text2video: Text-driven talking-head video synthesis with personalized phoneme-pose dictionary
JP2007507784A (en) Audio-visual content composition system and method
WO2021023869A1 (en) Audio-driven speech animation using recurrent neural network
Liew et al. Visual Speech Recognition: Lip Segmentation and Mapping
KR20230095432A (en) Text description-based character animation synthesis system
Shimba et al. Talking heads synthesis from audio with deep neural networks
Theobald et al. Near-videorealistic synthetic talking faces: Implementation and evaluation
CN117115310A (en) Digital face generation method and system based on audio and image
Mahavidyalaya Phoneme and viseme based approach for lip synchronization
Sui et al. A 3D audio-visual corpus for speech recognition
Ding et al. Lip animation synthesis: a unified framework for speaking and laughing virtual agent.
Asadiabadi et al. Multimodal speech driven facial shape animation using deep neural networks
Serra et al. A proposal for a visual speech animation system for European Portuguese
Kolivand et al. Realistic lip syncing for virtual character using common viseme set
Zorić et al. Real-time language independent lip synchronization method using a genetic algorithm
Sáenz Mouthing recognition with OpenPose in sign language