A data-driven approach to checking and correcting spelling errors in Sinhala
Wasala et al., 2010
- Document ID
- 5779254823475863555
- Author
- Wasala A
- Weerasinghe R
- Pushpananda R
- Liyanage C
- Jayalatharachchi E
- Publication year
- 2010
- Publication venue
- Int. J. Adv. ICT Emerg. Reg
Snippet
In this paper we describe the construction of a spell checker for Sinhala, the majority language of Sri Lanka. Due to its morphological richness, the language is difficult to completely enumerate in a lexicon. The approach described is based on n-gram statistics …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2217—Character encodings
- G06F17/2223—Handling non-latin characters, e.g. kana-to-kanji conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/274—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G06F17/271—Syntactic parsing, e.g. based on context-free grammar [CFG], unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/2775—Phrasal analysis, e.g. finite state techniques, chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G06F17/2715—Statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2795—Thesaurus; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/273—Orthographic correction, e.g. spelling checkers, vowelisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
- G06F17/2881—Natural language generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2755—Morphological analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/3066—Query translation
- G06F17/30669—Translation of the query language, e.g. Chinese to English
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
Similar Documents
Publication | Title |
---|---|
US7302640B2 (en) | Language input architecture for converting one text form to another text form with tolerance to spelling, typographical, and conversion errors |
US7165019B1 (en) | Language input architecture for converting one text form to another text form with modeless entry |
US20070021956A1 (en) | Method and apparatus for generating ideographic representations of letter based names |
US11386269B2 (en) | Fault-tolerant information extraction |
Singh et al. | Systematic review of spell-checkers for highly inflectional languages |
Khan et al. | A light weight stemmer for Urdu language: a scarce resourced language |
Mosavi Miangah | FarsiSpell: A spell-checking system for Persian using a large monolingual corpus |
Bugert et al. | Generalizing cross-document event coreference resolution across multiple corpora |
Mon et al. | SymSpell4Burmese: Symmetric delete spelling correction algorithm (SymSpell) for Burmese spelling checking |
Sorokin | Spelling correction for morphologically rich language: a case study of Russian |
Liyanapathirana et al. | Sinspell: A comprehensive spelling checker for Sinhala |
Tufiş et al. | DIAC+: A professional diacritics recovering system |
Moreno-Sandoval et al. | Morpho-syntactic tagging of the Spanish C-ORAL-ROM corpus: methodology, tools and evaluation |
Mahar et al. | Rule based part of speech tagging of Sindhi language |
Wu et al. | Integrating dictionary and web n-grams for Chinese spell checking |
Pankam et al. | Two-stage Thai Misspelling Correction based on Pre-trained Language Models |
Jayalatharachchi et al. | Data-driven spell checking: the synergy of two algorithms for spelling error detection and correction |
Vasiu et al. | Enhancing tokenization by embedding Romanian language specific morphology |
Sharma et al. | Word prediction system for text entry in Hindi |
Azmi et al. | Light diacritic restoration to disambiguate homographs in modern Arabic texts |
Wasala et al. | A data-driven approach to checking and correcting spelling errors in Sinhala |
Kapočiūtė-Dzikienė et al. | Character-based machine learning vs. language modeling for diacritics restoration |
Rajendran et al. | Text processing for developing unrestricted Tamil text to speech synthesis system |
Wasala et al. | An open-source data driven spell checker for Sinhala |
Uchimoto et al. | Morphological analysis of a large spontaneous speech corpus in Japanese |