Koehn et al., 2003 - Google Patents
Empirical methods for compound splittingKoehn et al., 2003
View PDF- Document ID
- 17025322200315328041
- Author
- Koehn P
- Knight K
- Publication year
- Publication venue
- arXiv preprint cs/0302032
External Links
Snippet
Compounded words are a challenge for NLP applications such as machine translation (MT). We introduce methods to learn splitting rules from monolingual and parallel corpora. We evaluate them against a gold standard and measure their impact on performance of …
- 150000001875 compounds 0 title description 33
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2845—Using very large corpora, e.g. the world wide web [WWW]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2836—Machine assisted translation, e.g. translation memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
- G06F17/2881—Natural language generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G06F17/271—Syntactic parsing, e.g. based on context-free grammar [CFG], unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G06F17/2715—Statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/274—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2795—Thesaurus; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2785—Semantic analysis
- G06F17/279—Discourse representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/3066—Query translation
- G06F17/30669—Translation of the query language, e.g. Chinese to English
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Koehn et al. | Empirical methods for compound splitting | |
US7711545B2 (en) | Empirical methods for splitting compound words with application to machine translation | |
US8548794B2 (en) | Statistical noun phrase translation | |
Vickrey et al. | Word-sense disambiguation for machine translation | |
Virpioja et al. | Morphology-aware statistical machine translation based on morphs induced in an unsupervised manner | |
Farhath et al. | Integration of bilingual lists for domain-specific statistical machine translation for sinhala-tamil | |
US20100057437A1 (en) | Machine-translation apparatus using multi-stage verbal-phrase patterns, methods for applying and extracting multi-stage verbal-phrase patterns | |
Schlippe et al. | Diacritization as a machine translation and as a sequence labeling problem | |
Crego et al. | Using shallow syntax information to improve word alignment and reordering for SMT | |
Awadalla et al. | An integrated approach for Arabic-English named entity translation | |
De Gispert et al. | On the impact of morphology in English to Spanish statistical MT | |
Ebeling et al. | CHAPTER ONE A CROSS-LINGUISTIC COMPARISON OF RECURRENT WORD COMBINATIONS IN A COMPARABLE CORPUS OF ENGLISH AND NORWEGIAN FICTION | |
Matusov et al. | The RWTH machine translation system | |
Popović et al. | Augmenting a small parallel text with morpho-syntactic language | |
Gascó i Mora et al. | Part-of-speech tagging based on machine translation techniques | |
Hasan et al. | Reranking translation hypotheses using structural properties | |
Zhang et al. | Adapting an Example-Based Translation System to Chinese | |
Imamura et al. | Bilingual corpus cleaning focusing on translation literality. | |
Ellouze et al. | Word Alignment Applied on English-Arabic Parallel Corpus. | |
Mitamura et al. | Diagnostics for interactive controlled language checking | |
Gough et al. | Controlled generation in example-based machine translation | |
Ji et al. | Collaborative entity extraction and translation | |
Giménez et al. | Low-cost enrichment of spanish wordnet with automatically translated glosses: combining general and specialized models | |
Ney et al. | The RWTH system for statistical translation of spoken dialogues | |
Carpuat | Toward using morphology in French-English phrase-based SMT |