Farhath et al., 2018 - Google Patents
Integration of bilingual lists for domain-specific statistical machine translation for Sinhala-Tamil
- Document ID
- 17264314013381962167
- Author
- Farhath F
- Ranathunga S
- Jayasena S
- Dias G
- Publication year
- 2018
- Publication venue
- 2018 Moratuwa Engineering Research Conference (MERCon)
Snippet
Availability of quality parallel data is a major requirement to build a reasonably well performing statistical machine translation (SMT) system. Thus, developing a decent SMT system for a low-resourced language pair like Sinhala and Tamil that does not have a large …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/3066—Query translation
- G06F17/30669—Translation of the query language, e.g. Chinese to English
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2845—Using very large corpora, e.g. the world wide web [WWW]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/30675—Query execution
- G06F17/30684—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
- G06F17/2881—Natural language generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2217—Character encodings
- G06F17/2223—Handling non-latin characters, e.g. kana-to-kanji conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2795—Thesaurus; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2785—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/274—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Hearne et al. | | Statistical machine translation: a guide for linguists and translators |
Koehn et al. | | Empirical methods for compound splitting |
Farhath et al. | | Integration of bilingual lists for domain-specific statistical machine translation for Sinhala-Tamil |
Birch et al. | | Edinburgh SLT and MT system description for the IWSLT 2014 evaluation |
Haque et al. | | TermFinder: log-likelihood comparison and phrase-based statistical machine translation models for bilingual terminology extraction |
Ranathunga et al. | | Si-Ta: Machine translation of Sinhala and Tamil official documents |
Abdurakhmonova et al. | | Linguistic functionality of Uzbek Electron Corpus: uzbekcorpus.uz |
Aasha et al. | | Machine translation from English to Malayalam using transfer approach |
Barreiro et al. | | Linguistic evaluation of support verb constructions by OpenLogos and Google Translate |
Ferreira et al. | | Surface realization shared task 2018 (SR18): The Tilburg University approach |
Rabbani et al. | | A new verb based approach for English to Bangla machine translation |
Vandeghinste et al. | | Parse and corpus-based machine translation |
Tennage et al. | | Handling rare word problem using synthetic training data for Sinhala and Tamil neural machine translation |
Farhath et al. | | Improving domain-specific SMT for low-resourced languages using data from different domains |
Li et al. | | Uzbek-English and Turkish-English morpheme alignment corpora |
Musleh et al. | | Enabling medical translation for low-resource languages |
Yashothara et al. | | Improving Phrase-Based Statistical Machine Translation with Preprocessing Techniques |
Haque et al. | | Ruslan Mitkov, Johanna Monti, Gloria Corpas Pastor, and Violeta Seretan (eds): Multiword units in machine translation and translation technology: Current Issues in Linguistic Theory, Volume 341, John Benjamins Publishing Company, Amsterdam & Philadelphia, 2018, ix + 259 pp, ISBN 978-90-272-0060-0 (HB), ISBN 978-90-272-6420-6 (e-book) |
Mohaghegh et al. | | Improved language modeling for English-Persian statistical machine translation |
Costa-Jussa et al. | | A large Spanish-Catalan parallel corpus release for machine translation |
Dubey et al. | | Generation of bilingual dictionaries using structural properties |
Akhtar et al. | | An unsupervised approach for mapping between vector spaces |
Ji et al. | | Phonetic name matching for cross-lingual spoken sentence retrieval |
Way et al. | | Multiword units in machine translation and translation technology--Ruslan Mitkov, Johanna Monti, Gloria Corpas Pastor, and Violeta Seretan (eds), Book Review. |
Alubaidi | | Hybrid Arabic-English Machine Translation to Solve Reordering and Ambiguity Problems |