Sakti et al., 2012 - Google Patents
Distributed speech translation technologies for multiparty multilingual communication
- Document ID
- 14114975326145082913
- Author
- Sakti S
- Paul M
- Finch A
- Hu X
- Ni J
- Kimura N
- Matsuda S
- Hori C
- Ashikari Y
- Kawai H
- Kashioka H
- Sumita E
- Nakamura S
- Publication year
- 2012
- Publication venue
- ACM Transactions on Speech and Language Processing (TSLP)
Snippet
Developing a multilingual speech translation system requires efforts in constructing automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS) components for all possible source and target languages. If the numerous ASR, MT …
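The snippet describes the architecture problem the paper addresses: a full multilingual speech translation system needs an ASR component per source language, a TTS component per target language, and an MT component per language pair, which the authors handle with distributed, service-based components. As a rough illustration only (not the authors' implementation), the following Python sketch wires per-language ASR and TTS components and per-language-pair MT components into one pipeline; the class names, language codes, and in-process stubs are assumptions made for this example.

```python
# Minimal sketch of an ASR -> MT -> TTS speech translation pipeline with
# per-language and per-language-pair components, in the spirit of the
# distributed architecture the snippet describes. All names and stub
# behaviours are illustrative assumptions, not the paper's actual system.

from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Utterance:
    language: str   # e.g. "ja", "en"
    audio: bytes    # raw or encoded speech payload


class ASRService:
    """Speech recognizer for a single source language."""
    def __init__(self, language: str):
        self.language = language

    def recognize(self, audio: bytes) -> str:
        # A real deployment would forward `audio` to a remote ASR server;
        # here we return a placeholder transcript so the sketch runs.
        return f"<transcript of {len(audio)} bytes of {self.language} speech>"


class MTService:
    """Text translator for a single (source, target) language pair."""
    def __init__(self, source: str, target: str):
        self.source, self.target = source, target

    def translate(self, text: str) -> str:
        return f"<{self.target} translation of: {text}>"


class TTSService:
    """Speech synthesizer for a single target language."""
    def __init__(self, language: str):
        self.language = language

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")  # placeholder "waveform"


class SpeechTranslationPipeline:
    """Routes an utterance through ASR -> MT -> TTS for one language pair."""
    def __init__(self):
        self.asr: Dict[str, ASRService] = {}
        self.mt: Dict[Tuple[str, str], MTService] = {}
        self.tts: Dict[str, TTSService] = {}

    def register_languages(self, languages):
        # One ASR and one TTS engine per language, one MT engine per pair.
        for lang in languages:
            self.asr[lang] = ASRService(lang)
            self.tts[lang] = TTSService(lang)
        for src in languages:
            for tgt in languages:
                if src != tgt:
                    self.mt[(src, tgt)] = MTService(src, tgt)

    def translate_speech(self, utt: Utterance, target: str) -> bytes:
        text = self.asr[utt.language].recognize(utt.audio)
        translated = self.mt[(utt.language, target)].translate(text)
        return self.tts[target].synthesize(translated)


if __name__ == "__main__":
    pipeline = SpeechTranslationPipeline()
    pipeline.register_languages(["en", "ja", "zh"])  # 3 ASR, 3 TTS, 6 MT engines
    out = pipeline.translate_speech(Utterance("ja", b"\x00" * 1600), target="en")
    print(out.decode("utf-8"))
```

Registering components per language and per pair makes the combinatorial cost explicit: N languages need N ASR and N TTS engines but N*(N-1) MT engines, which is why the snippet stresses the effort of covering all possible source and target languages.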
Classifications
- G10L15/18—Speech classification or search using natural language modelling
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
- G06F17/2872—Rule based translation
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
- G06F17/2705—Parsing
- G10L21/013—Adapting to target pitch (changing voice quality, e.g. pitch or formants)
- G10L15/07—Adaptation to the speaker
- G10L15/26—Speech to text systems
- G10L13/00—Speech synthesis; Text to speech systems
- G10L19/04—Speech or audio coding or decoding using predictive techniques
- G10L21/10—Transformation of speech into visible information
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L17/00—Speaker identification or verification
- G06F17/3043—Translation of natural language queries to structured queries
Similar Documents
Publication | Title
---|---
Renduchintala et al. | Multi-modal data augmentation for end-to-end ASR
Waibel | Interactive translation of conversational speech
Sangeetha et al. | Speech translation system for English to Dravidian languages
Abushariah et al. | Phonetically rich and balanced text and speech corpora for Arabic language
Yun et al. | Multilingual speech-to-speech translation system for mobile consumer devices
Sakti et al. | A-STAR: Toward translating Asian spoken languages
Granell et al. | Multimodality, interactivity, and crowdsourcing for document transcription
Veisi et al. | Jira: a Central Kurdish speech recognition system, designing and building speech corpus and pronunciation lexicon
López-Ludeña et al. | LSESpeak: A spoken language generator for Deaf people
San Segundo et al. | A Spanish speech to sign language translation system for assisting deaf-mute people
Mhaskar et al. | Vakta-setu: A speech-to-speech machine translation service in select Indic languages
Prasad et al. | BBN TransTalk: Robust multilingual two-way speech-to-speech translation for mobile platforms
Sakti et al. | Distributed speech translation technologies for multiparty multilingual communication
Shimizu et al. | NICT/ATR Chinese-Japanese-English speech-to-speech translation system
Matsuda et al. | Development of the “VoiceTra” multi-lingual speech translation system
Sarim et al. | Direct speech to speech translation: A review
Stallard et al. | The BBN TransTalk speech-to-speech translation system
Chen et al. | Mismatched crowdsourcing based language perception for under-resourced languages
Marasek et al. | Spoken language translation for Polish
Stuker et al. | Research opportunities in automatic speech-to-speech translation
Dua et al. | An amalgamation of integrated features with DeepSpeech2 architecture and improved spell corrector for improving Gujarati language ASR system
Naidoo et al. | Developing an automatic speech-to-speech translator mobile application using AWS
Narayanan et al. | Speech recognition engineering issues in speech to speech translation system design for low resource languages and domains
Sakti et al. | The Asian network-based speech-to-speech translation system
Shimizu et al. | Development of client-server speech translation system on a multi-lingual speech communication platform