Developments in corpus-based speech synthesis: Approaching natural conversational speech
Campbell, 2005
- Document ID
- 2244993565313410942
- Author
- Campbell N
- Publication year
- 2005
- Publication venue
- IEICE transactions on information and systems
Snippet
This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit-selection for concatenative waveform synthesis seven years ago, and now extends this work …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | | SpeechGPT: Empowering large language models with intrinsic cross-modal conversational abilities |
Campbell | | Developments in corpus-based speech synthesis: Approaching natural conversational speech |
Gibbon et al. | | Handbook of multimodal and spoken dialogue systems: Resources, terminology and product evaluation |
Reddy et al. | | Speech-to-text and text-to-speech recognition using deep learning |
Tucker et al. | | Why we need to investigate casual speech to truly understand language production, processing and the mental lexicon |
Yi | | Natural-sounding speech synthesis using variable-length units |
Haselow | | Spontaneous spoken English: An integrated approach to the emergent grammar of speech |
Adell et al. | | Production of filled pauses in concatenative speech synthesis based on the underlying fluent sentence |
Campbell et al. | | No laughing matter. |
Campbell | | Conversational speech synthesis and the need for some laughter |
López-Ludeña et al. | | LSESpeak: A spoken language generator for Deaf people |
Hamad et al. | | Arabic text-to-speech synthesizer |
Ifeanyi et al. | | Text–To–Speech Synthesis (TTS) |
Wu et al. | | Modeling the expressivity of input text semantics for Chinese text-to-speech synthesis in a spoken dialog system |
JP3706112B2 (en) | | Speech synthesizer and computer program |
Trouvain et al. | | Speech synthesis: text-to-speech conversion and artificial voices |
Jaiswal et al. | | Concatenative text-to-speech synthesis system for communication recognition |
Kamble et al. | | Audio Visual Speech Synthesis and Speech Recognition for Hindi Language |
Mamatov et al. | | Formation of a Speech Database in the Karakalpak Language for Speech Synthesis Systems |
Pan et al. | | Designing a speech corpus for instance-based spoken language generation |
Mihajlik et al. | | Is spoken Hungarian low-resource?: a quantitative survey of Hungarian speech data sets |
Campbell | | Extra-semantic protocols; input requirements for the synthesis of dialogue speech |
Farrugia | | Text-to-speech technologies for mobile telephony services |
Kaveri et al. | | A novel approach for hindi text description to speech and expressive speech synthesis |
Romito | | A training program for expert forensic transcribers |