Neubarth et al., 2013 - Google Patents

A hybrid approach to statistical machine translation between standard and dialectal varieties

Document ID
5579828802185136483
Author
Neubarth F
Haddow B
Huerta A
Trost H
Publication year
2013
Publication venue
Language and Technology Conference

Snippet

Using statistical machine translation (SMT) for dialectal varieties usually suffers from data sparsity, but combining word-level and character-level models can yield good results even with small training data by exploiting the relative proximity between the two varieties. In this …
Continue reading at www.research.ed.ac.uk (PDF)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2809 Data driven translation
    • G06F17/2827 Example based machine translation; Alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2705 Parsing
    • G06F17/2715 Statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2765 Recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2705 Parsing
    • G06F17/271 Syntactic parsing, e.g. based on context-free grammar [CFG], unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/289 Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2872 Rule based translation
    • G06F17/2881 Natural language generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/21 Text processing
    • G06F17/22 Manipulating or registering by use of codes, e.g. in sequence of text characters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2863 Processing of non-latin text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/06 Foreign languages
    • G09B19/08 Printed or written appliances, e.g. text books, bilingual letter assemblies, charts

Similar Documents

Publication, Title
US9460080B2 (en) Modifying a tokenizer based on pseudo data for natural language processing
US20140163951A1 (en) Hybrid adaptation of named entity recognition
Scannell Statistical unicodification of African languages
Hajdik et al. Neural text generation from rich semantic representations
Alsohybe et al. Machine-translation history and evolution: Survey for Arabic-English translations
Lindén et al. Hfst—a system for creating nlp tools
Rubino et al. Extremely low-resource neural machine translation for Asian languages
Menacer et al. Machine translation on a parallel code-switched corpus
Vāravs et al. Restoring punctuation and capitalization using transformer models
Biadgligne et al. Parallel corpora preparation for English-Amharic machine translation
Anastasopoulos Computational tools for endangered language documentation
Žagar et al. Cross-lingual transfer of abstractive summarizer to less-resource language
Bensalah et al. Arabic machine translation based on the combination of word embedding techniques
Kubis et al. Open challenge for correcting errors of speech recognition systems
Pichel et al. A methodology to measure the diachronic language distance between three languages based on perplexity
Bonilla Spoken Spanish PoS tagging: gold standard dataset
Sherif et al. Bootstrapping a stochastic transducer for Arabic-English transliteration extraction
Chennoufi et al. Impact of morphological analysis and a large training corpus on the performances of Arabic diacritization
Gamal et al. Survey of arabic machine translation, methodologies, progress, and challenges
Stahlberg et al. Pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment
Amri et al. Amazigh POS tagging using TreeTagger: a language independant model
Neubarth et al. A hybrid approach to statistical machine translation between standard and dialectal varieties
Winiwarter Learning transfer rules for machine translation from parallel corpora
Singh et al. Urdu to Punjabi machine translation: an incremental training approach
Popli et al. Multilingual query-by-example spoken term detection in Indian languages