Campbell, 2005 - Google Patents

Developments in corpus-based speech synthesis: Approaching natural conversational speech

Document ID: 2244993565313410942
Author: Campbell N
Publication year: 2005
Publication venue: IEICE transactions on information and systems

Snippet

This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit-selection for concatenative waveform synthesis seven years ago, and now extends this work …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis

Similar Documents

Publication	Publication Date	Title
Zhang et al.	2023	SpeechGPT: Empowering large language models with intrinsic cross-modal conversational abilities
Campbell	2005	Developments in corpus-based speech synthesis: Approaching natural conversational speech
Gibbon et al.	2012	Handbook of multimodal and spoken dialogue systems: Resources, terminology and product evaluation
Reddy et al.	2023	Speech-to-text and text-to-speech recognition using deep learning
Tucker et al.	2016	Why we need to investigate casual speech to truly understand language production, processing and the mental lexicon
Yi	1998	Natural-sounding speech synthesis using variable-length units
Haselow	2017	Spontaneous spoken English: An integrated approach to the emergent grammar of speech
Adell et al.	2012	Production of filled pauses in concatenative speech synthesis based on the underlying fluent sentence
Campbell et al.	2005	No laughing matter.
Campbell	2006	Conversational speech synthesis and the need for some laughter
López-Ludeña et al.	2013	LSESpeak: A spoken language generator for Deaf people
Hamad et al.	2011	Arabic text-to-speech synthesizer
Ifeanyi et al.	2014	Text–To–Speech Synthesis (TTS)
Wu et al.	2009	Modeling the expressivity of input text semantics for Chinese text-to-speech synthesis in a spoken dialog system
JP3706112B2 (en)	2005-10-12	Speech synthesizer and computer program
Trouvain et al.	2019	Speech synthesis: text-to-speech conversion and artificial voices
Jaiswal et al.	2021	Concatenative text-to-speech synthesis system for communication recognition
Kamble et al.	2015	Audio Visual Speech Synthesis and Speech Recognition for Hindi Language
Mamatov et al.	2023	Formation of a Speech Database in the Karakalpak Language for Speech Synthesis Systems
Pan et al.	2002	Designing a speech corpus for instance-based spoken language generation
Mihajlik et al.	2024	Is spoken Hungarian low-resource?: a quantitative survey of Hungarian speech data sets
Campbell	2004	Extra-semantic protocols; input requirements for the synthesis of dialogue speech
Farrugia	2005	Text-to-speech technologies for mobile telephony services
Kaveri et al.	2015	A novel approach for Hindi text description to speech and expressive speech synthesis
Romito	2017	A training program for expert forensic transcribers